Improper new ArrayList causes a CPU spike


Source: juejin.cn/post/7139202066362138654

  • Foreword

  • The scene

  • Normal JVM monitoring curve

  • JVM monitoring curve during the incident

  • Detailed analysis

  • Conclusion

Yesterday, CPU usage in one of our production containers suddenly surged. It was my first time troubleshooting this kind of problem, so I'm writing it down~

Foreword

Here is what happened. I was writing a document on Friday when I suddenly received a production alert: CPU usage had gone above 90%. I checked the container in our platform's monitoring system, and the JVM monitoring showed that one pod had run 61 young GCs and one full GC within two hours. That is very serious and unusual. Since I had never investigated this kind of problem before, I had to look things up online (on Baidu) as I went. Still, I took a few lessons from the whole process, so I'm sharing it with you~

The scene

First, here is a normal GC monitoring curve (for confidentiality, I redrew it myself based on the platform's monitoring):

Normal JVM monitoring curve

[Figure 1: normal JVM monitoring curve]

JVM monitoring curve during the incident

[Figure 2: JVM monitoring curve during the incident]

As you can see, under normal circumstances the system runs very few GCs (the exact number depends on business load and JVM memory allocation). In Figure 2, however, there are a large number of abnormal GCs, including a full GC, so I started analyzing right away.
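For reference, per-collector GC counts like the ones plotted above can also be read from inside a JVM through the standard java.lang.management API. This is only an illustrative sketch: it assumes nothing about how our monitoring platform actually collects its data, and the class name GcCounts is hypothetical. Collector names depend on the garbage collector in use (with G1, for example, the beans include "G1 Young Generation" and "G1 Old Generation").

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: samples GC counts via the standard management API.
public class GcCounts {
    /** Snapshot of collector name -> total collections since JVM start (-1 if undefined for that collector). */
    static Map<String, Long> snapshot() {
        Map<String, Long> counts = new LinkedHashMap<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            counts.put(gc.getName(), gc.getCollectionCount());
        }
        return counts;
    }

    public static void main(String[] args) throws InterruptedException {
        Map<String, Long> before = snapshot();
        Thread.sleep(1_000); // short demo interval; a dashboard would compare over minutes or hours
        Map<String, Long> after = snapshot();
        // Print how many collections each collector ran during the interval.
        after.forEach((name, count) ->
                System.out.println(name + ": " + (count - before.getOrDefault(name, 0L)) + " collections"));
    }
}
```

Comparing two snapshots taken two hours apart would give numbers of the same kind as "61 young GCs and one full GC".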

Detailed analysis

First, the abnormal GC occurred on only one pod (the system runs multiple pods). Find that pod in the monitoring system and enter it to investigate. Stay calm while troubleshooting.

  1. After entering the pod, run top to check each Linux process's resource usage. (These screenshots were taken afterwards, so the usage shown is low; just follow the steps.)

  2. Analyze the resource usage in light of the situation.

[Screenshot: top]

At that time, my process with PID 1 was at 130% CPU (on a multi-core machine, top's %CPU for a single process can exceed 100%), so I concluded the Java application was the problem. Press Ctrl+C to exit top and continue.

  3. Run top -H -p pid to see which threads are actually using the most CPU, where pid is the ID of the process with the highest resource usage from the previous step.

[Screenshot: top -H -p pid]

  4. This shows resource usage per thread. In this view, the PID column is the thread ID, which we will call the tid.

[Screenshot: per-thread usage with tids]
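If you can run code inside the JVM, a rough in-process counterpart to top -H is ThreadMXBean. This is a swapped-in technique, not what I used here, and it differs in two ways: it ranks threads by CPU time accumulated since each thread started rather than by current usage, and its IDs are Java thread IDs, not the OS tid that top shows. The class name BusiestThread is hypothetical.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

// Hypothetical helper: finds the thread with the most accumulated CPU time.
// Note: cumulative CPU time, not top's instantaneous %CPU; Java thread IDs, not OS tids.
public class BusiestThread {
    /** Returns the name of the live thread with the highest CPU time so far, or null if unsupported. */
    static String busiestThreadName() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        if (!mx.isThreadCpuTimeSupported()) {
            return null;
        }
        if (!mx.isThreadCpuTimeEnabled()) {
            mx.setThreadCpuTimeEnabled(true);
        }
        long bestTime = -1;
        String bestName = null;
        for (long id : mx.getAllThreadIds()) {
            long cpuNanos = mx.getThreadCpuTime(id); // -1 if the thread has already died
            if (cpuNanos > bestTime) {
                ThreadInfo info = mx.getThreadInfo(id); // null if the thread died in the meantime
                if (info != null) {
                    bestTime = cpuNanos;
                    bestName = info.getThreadName();
                }
            }
        }
        return bestName;
    }

    public static void main(String[] args) {
        System.out.println("Busiest thread so far: " + busiestThreadName());
    }
}
```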

  5. As I recall, the tid at the time was 746 (the screenshots above are from me repeating the steps for this write-up). Run printf "%x\n" 746 to convert the tid to hexadecimal, which gives 2ea.

[Screenshot: tid converted to hexadecimal]

This conversion is needed because jstack prints each thread's native ID in hexadecimal, in the nid field (e.g. nid=0x2ea).
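The same conversion in Java, as a tiny helper (the class name TidToNid is hypothetical). Long.toHexString produces the same lowercase digits as printf "%x", which is the form the nid field uses.

```java
// Hypothetical helper: converts between top's decimal tid and jstack's hex nid.
public class TidToNid {
    /** Formats an OS thread id the way jstack shows it in the nid field, e.g. 746 -> "0x2ea". */
    static String toNid(long tid) {
        return "0x" + Long.toHexString(tid);
    }

    /** Parses a nid such as "0x2ea" (or "2ea") back to the decimal tid shown by top -H. */
    static long fromNid(String nid) {
        String hex = nid.startsWith("0x") ? nid.substring(2) : nid;
        return Long.parseLong(hex, 16);
    }

    public static void main(String[] args) {
        System.out.println(toNid(746));       // prints 0x2ea
        System.out.println(fromNid("0x2ea")); // prints 746
    }
}
```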

  6. Run jstack pid | grep 2ea > gc.stack


[Screenshot: jstack]

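A quick explanation: jstack is one of the monitoring and troubleshooting tools that ship with the JDK. It takes a snapshot of every thread in the JVM at the current moment, so we can use it to view the thread stacks of a given Java process. Here the stack output is piped through grep to pick out the 2ea thread, and the result is written to a file named gc.stack (the name is arbitrary). Note that a plain grep keeps only the matching lines; to also keep the stack frames below the thread header, add trailing context with grep -A (for example grep -A 30 nid=0x2ea), or redirect the full jstack output to the file and search it afterwards.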
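Pulling one thread's whole block (header plus stack frames) out of a saved jstack dump can also be done with a small Java helper. It is a sketch that assumes the usual HotSpot layout, where each thread block starts with a header line containing nid=0x... and ends at a blank line; the class name JstackFilter is hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper; assumes HotSpot-style jstack output (blank line between thread blocks).
public class JstackFilter {
    /** Returns the thread block whose header contains nid=<nid>, up to (not including) the next blank line. */
    static List<String> threadBlock(List<String> dump, String nid) {
        // Word boundaries stop "nid=0x2ea" from also matching "nid=0x2eab".
        Pattern header = Pattern.compile("\\bnid=" + Pattern.quote(nid) + "\\b");
        List<String> block = new ArrayList<>();
        boolean inBlock = false;
        for (String line : dump) {
            if (!inBlock && header.matcher(line).find()) {
                inBlock = true;
            }
            if (inBlock) {
                if (line.isBlank()) {
                    break; // a blank line separates thread blocks
                }
                block.add(line);
            }
        }
        return block;
    }

    public static void main(String[] args) throws IOException {
        // Usage: java JstackFilter.java full-jstack-output.txt 0x2ea
        List<String> dump = Files.readAllLines(Path.of(args[0]));
        threadBlock(dump, args[1]).forEach(System.out::println);
    }
}
```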

  7. At first I ran cat gc.stack, but the output was long and hard to read inside the container, so I decided to download the file and browse it locally. Because the company restricts access to individual machines, I had to go through the jump host: first pick an idle machine (call it a) via the jump host, copy the file to a, and then download it from a to my local machine (my local machine can reach the jump host). To serve the file, run python -m SimpleHTTPServer 8080 in the directory that contains it. Most Linux distributions ship with Python, and this command starts a simple HTTP server that other machines can access; on Python 3 the equivalent is python3 -m http.server 8080.

[Screenshot: starting the HTTP server]

Then log in to the jump host and download the file with curl: curl -o gc.stack http://<ip address>:8080/gc.stack

For this demo, I replaced the IP address in the screenshot with a fake one.

[Screenshot: curl]

Then use the same method to download the file from the jump host to your local machine. Afterwards, remember to shut down the HTTP server you started with Python (Ctrl+C).
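If Python or curl happens to be missing but a JDK (11 or later) is available, both sides of this transfer can be sketched in plain Java. This is an alternative, not what I did: serve uses the JDK's built-in com.sun.net.httpserver package to expose a single file, and download uses java.net.http.HttpClient in the spirit of curl -o. The class name, port, and paths are hypothetical. Like the Python server, it has no authentication, so stop it as soon as the download is done.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper: a one-file HTTP server plus a curl -o style download.
public class FileTransfer {
    /** Serves a single file at http://<host>:<port>/<file name>; port 0 picks any free port. */
    static HttpServer serve(Path file, int port) throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress(port), 0);
        server.createContext("/" + file.getFileName(), exchange -> {
            byte[] body = Files.readAllBytes(file);
            // A length of -1 means "no body"; 0 would mean chunked encoding.
            exchange.sendResponseHeaders(200, body.length == 0 ? -1 : body.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(body);
            }
        });
        server.start();
        return server;
    }

    /** Downloads url into target (overwriting it); returns the HTTP status code. */
    static int download(String url, Path target) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(URI.create(url)).GET().build();
        // The body is written to target even when the status is not 200.
        HttpResponse<Path> response = client.send(request, HttpResponse.BodyHandlers.ofFile(
                target, StandardOpenOption.CREATE, StandardOpenOption.WRITE, StandardOpenOption.TRUNCATE_EXISTING));
        return response.statusCode();
    }

    public static void main(String[] args) throws Exception {
        if (args.length == 2 && args[0].equals("serve")) {
            // On the pod: java FileTransfer.java serve gc.stack   (Ctrl+C to stop)
            serve(Path.of(args[1]), 8080);
        } else if (args.length == 3 && args[0].equals("get")) {
            // On the jump host: java FileTransfer.java get http://<ip address>:8080/gc.stack gc.stack
            System.out.println("HTTP " + download(args[1], Path.of(args[2])));
        } else {
            System.err.println("usage: serve <file> | get <url> <target>");
        }
    }
}
```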

  8. With the file on your local machine, open it in an editor, search for 2ea, and locate the stack trace whose nid is 0x2ea.

[Screenshot: stack trace with nid 0x2ea]

Then, using the line numbers in the stack trace, find the corresponding impl class and analyze the code.

  9. It turned out that the asynchronous Excel export reused the shared list-query interface. That interface pages results in batches of at most 200 rows, while a single export, depending on the user's data permissions, can cover anywhere from tens of thousands to hundreds of thousands of rows.

[Screenshot: Excel export code]
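To put numbers on that: with at most 200 rows per page, a 100,000-row export means 500 calls to the list interface. The sketch below shows one common way to keep memory bounded in such an export, handing each page to the writer before fetching the next so that only one page is referenced at a time. It illustrates the idea under assumptions and is not necessarily how our fix works: PageSource, fetchPage, and the Long rows are hypothetical stand-ins for the real query interface and row type.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical sketch of a paged export; PageSource stands in for the real list-query interface.
public class PagedExport {
    static final int PAGE_SIZE = 200; // the list interface's maximum page size, per the article

    /** Hypothetical stand-in for the shared list query: returns rows [offset, offset + limit). */
    interface PageSource {
        List<Long> fetchPage(int offset, int limit);
    }

    /** Number of queries needed to read total rows with the given page size (ceiling division). */
    static int pageCount(int total, int pageSize) {
        return (total + pageSize - 1) / pageSize;
    }

    /** Streams every page to the sink; only the current page is referenced. Returns rows written. */
    static int export(PageSource source, Consumer<Long> sink) {
        int offset = 0;
        int written = 0;
        while (true) {
            List<Long> page = source.fetchPage(offset, PAGE_SIZE);
            for (Long row : page) {
                sink.accept(row); // e.g. append one row to the Excel writer
                written++;
            }
            if (page.size() < PAGE_SIZE) {
                break; // a short or empty page means there is nothing left
            }
            offset += PAGE_SIZE;
        }
        return written;
    }

    public static void main(String[] args) {
        int total = 100_000;
        System.out.println(pageCount(total, PAGE_SIZE)); // prints 500
        PageSource fake = (off, lim) -> {
            List<Long> rows = new ArrayList<>();
            for (long i = off; i < Math.min(total, off + lim); i++) {
                rows.add(i);
            }
            return rows;
        };
        System.out.println(export(fake, row -> { })); // prints 100000
    }
}
```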

The check logic in the export method uses nested loops, and given the business data volumes it is easy to see how many iterations that adds up to. In Java, every new ArrayList creates a new List object (perhaps that needn't be spelled out (; 1_1)), and all the lists generated this way stay reachable until the whole method returns. The result was repeated GCs and pod restarts, which also affected the other pods. We fixed the code and shipped an emergency release, and the problem was solved~
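The original code isn't shown, so here is a hypothetical reconstruction of the pattern described: a nested loop that allocates a new ArrayList for every outer row and keeps each one reachable in the result until the method returns, next to a variant that indexes the inner data once in a HashMap. The Row and Owner records (Java 16+) and both method names are made up for illustration, and the HashMap version is a common alternative, not necessarily the fix we shipped. Both return equal results; the slow version does rows × owners comparisons and allocates one list per row, while the fast one makes a single pass over each input and shares one list per owner ID.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical reconstruction; Row, Owner, slow and fast are illustrative names only.
public class NestedLoopDemo {
    record Row(long id, long ownerId) {}
    record Owner(long id, String name) {}

    /** Anti-pattern: rows x owners comparisons, plus one new ArrayList per row, all retained in result. */
    static List<List<String>> slow(List<Row> rows, List<Owner> owners) {
        List<List<String>> result = new ArrayList<>();
        for (Row row : rows) {
            List<String> names = new ArrayList<>(); // allocated for every row, kept until the method returns
            for (Owner owner : owners) {
                if (owner.id() == row.ownerId()) {
                    names.add(owner.name());
                }
            }
            result.add(names);
        }
        return result;
    }

    /** Alternative: group owners by id once, then one hash lookup per row. */
    static List<List<String>> fast(List<Row> rows, List<Owner> owners) {
        Map<Long, List<String>> namesById = new HashMap<>();
        for (Owner owner : owners) {
            namesById.computeIfAbsent(owner.id(), id -> new ArrayList<>()).add(owner.name());
        }
        List<List<String>> result = new ArrayList<>(rows.size());
        for (Row row : rows) {
            // Rows with the same owner share one list, so callers must treat it as read-only.
            result.add(namesById.getOrDefault(row.ownerId(), List.of()));
        }
        return result;
    }

    public static void main(String[] args) {
        List<Owner> owners = List.of(new Owner(1, "a"), new Owner(2, "b"), new Owner(1, "c"));
        List<Row> rows = List.of(new Row(10, 1), new Row(11, 3), new Row(12, 2));
        System.out.println(slow(rows, owners)); // prints [[a, c], [], [b]]
        System.out.println(fast(rows, owners)); // prints [[a, c], [], [b]]
    }
}
```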

Conclusion

Don't panic when you run into production problems. First make sure the service stays available, then analyze the limited information layer by layer until you find the root cause. If you know Arthas, troubleshooting becomes even easier!





PS: If you found this write-up helpful, feel free to give it a like.

