A detailed record of the JVM tuning process

CPU usage is too high

If the CPU usage is too high, we need to look at it case by case. Is it because a business event suddenly brought in a large amount of traffic, with the CPU usage dropping back once the event ended? If so, you don't need to worry too much: the more requests there are, the more threads are needed to process them, and that is normal. That said, if the server itself is poorly provisioned and the CPU has only one core, a little extra traffic really can exhaust your CPU resources, and in that case you should consider upgrading the hardware first.

In the second case, the CPU usage stays high for a long time. Here your program probably contains code that loops heavily, or even an infinite loop. The troubleshooting steps are as follows:

(1) Use the top command to check the CPU usage:


In this way, the process with excessive CPU usage can be located. Under Linux, the process number shown by top is the same as the vmid shown by the jps tool.
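For reference, the commands for this step look roughly like the following (7268 is the process number used throughout the rest of this example):

top          # press P to sort by CPU usage and note the PID of the busy Java process (7268 here)
jps -l       # jps prints the same number as the process's vmid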

(2) Use the top -Hp command to check the thread status


You can see that the thread with thread ID 7287 has been occupying the CPU.
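The command for this step, using the process number found above, is simply:

top -Hp 7268     # list the threads of process 7268; note the TID of the thread stuck at high CPU (7287 here)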
(3) Convert the thread number to hexadecimal

Write down this hexadecimal value; we will need it in the next step.
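For example, converting the thread ID 7287 found in step (2):

printf '%x\n' 7287
1c77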

(4) Use the jstack tool to view the thread's stack

[root@localhost ~]# jstack 7268 | grep 1c77 -A 10
"http-nio-8080-exec-2" #16 daemon prio=5 os_prio=0 tid=0x00007fb66ce81000 nid=0x1c77 runnable [0x00007fb639ab9000]
   java.lang.Thread.State: RUNNABLE
 at com.spareyaya.jvm.service.EndlessLoopService.service(EndlessLoopService.java:19)
 at com.spareyaya.jvm.controller.JVMController.endlessLoop(JVMController.java:30)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
 at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:498)
 at org.springframework.web.method.support.InvocableHandlerMethod.doInvoke(InvocableHandlerMethod.java:190)
 at org.springframework.web.method.support.InvocableHandlerMethod.invokeForRequest(InvocableHandlerMethod.java:138)
 at org.springframework.web.servlet.mvc.method.annotation.ServletInvocableHandlerMethod.invokeAndHandle(ServletInvocableHandlerMethod.java:105)

Use the jstack tool to print the current thread stacks, then use grep with the hexadecimal thread ID obtained in the previous step to locate that thread's state. The 7268 after jstack is the process number found in step (1); what follows grep is the thread number found in steps (2) and (3).

From the output you can see that this thread is in the RUNNABLE state and is executing the method com.spareyaya.jvm.service.EndlessLoopService.service at line 19. You can then go to line 19 of the code, look at the block it belongs to, check whether it is inside a loop, and thereby locate the problem.
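For reference, the kind of code that produces this symptom is shown below. This is only a hypothetical sketch of an endless loop, not the actual EndlessLoopService from the stack trace above:

// Hypothetical sketch: a busy loop with no exit condition keeps the thread RUNNABLE
// and pins one CPU core at 100%.
public class EndlessLoopService {

    public void service() {
        long counter = 0;
        while (true) {   // never terminates
            counter++;   // pure CPU work, no sleep or blocking call
        }
    }
}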

Deadlock

Deadlock is not as obvious as the first scenario. A web application is a multi-threaded program serving many requests. After a deadlock occurs, the deadlocked threads are in a waiting state (WAITING or TIMED_WAITING); a waiting thread does not occupy the CPU and consumes very little memory. The symptom may simply be that certain requests are never processed and eventually time out. When only a few threads are deadlocked, this is not easy to detect.

You can use the jstack tool to investigate.

(1) Use jps to find the Java process

[root@localhost ~]# jps -l
8737 sun.tools.jps.Jps
8682 jvm-0.0.1-SNAPSHOT.jar

(2) Use jstack to check for the deadlock

Since web applications often have many worker threads, especially under high concurrency, the output of this command can be very long. The biggest advantage of jstack is that it prints the deadlock information (including which threads caused it) at the end of the output, so we only need to look at the last part.
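With the process number 8682 from step (1), the command is simply the following (the pipe through tail is just an optional convenience for jumping to the end of the output):

jstack 8682 | tail -n 40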

Java stack information for the threads listed above:
===================================================
"Thread-4":
 at com.spareyaya.jvm.service.DeadLockService.service2(DeadLockService.java:35)
 - waiting to lock <0x00000000f5035ae0> (a java.lang.Object)
 - locked <0x00000000f5035af0> (a java.lang.Object)
 at com.spareyaya.jvm.controller.JVMController.lambda$deadLock$1(JVMController.java:41)
 at com.spareyaya.jvm.controller.JVMController$$Lambda$457/1776922136.run(Unknown Source)
 at java.lang.Thread.run(Thread.java:748)
"Thread-3":
 at com.spareyaya.jvm.service.DeadLockService.service1(DeadLockService.java:27)
 - waiting to lock <0x00000000f5035af0> (a java.lang.Object)
 - locked <0x00000000f5035ae0> (a java.lang.Object)
 at com.spareyaya.jvm.controller.JVMController.lambda$deadLock$0(JVMController.java:37)
 at com.spareyaya.jvm.controller.JVMController$$Lambda$456/474286897.run(Unknown Source)
 at java.lang.Thread.run(Thread.java:748)

Found 1 deadlock.

A deadlock was discovered and the cause was clear.
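The stack traces above correspond to the classic pattern of two threads acquiring the same two locks in opposite order. The following is only a minimal sketch of that pattern, not the actual DeadLockService code:

import java.util.concurrent.TimeUnit;

public class DeadLockDemo {
    private static final Object LOCK_A = new Object();
    private static final Object LOCK_B = new Object();

    public static void main(String[] args) {
        new Thread(() -> {
            synchronized (LOCK_A) {          // first thread takes A...
                sleepQuietly();
                synchronized (LOCK_B) {      // ...then waits forever for B
                    System.out.println("never reached");
                }
            }
        }).start();

        new Thread(() -> {
            synchronized (LOCK_B) {          // second thread takes B...
                sleepQuietly();
                synchronized (LOCK_A) {      // ...then waits forever for A
                    System.out.println("never reached");
                }
            }
        }).start();
    }

    private static void sleepQuietly() {
        try {
            TimeUnit.MILLISECONDS.sleep(100);   // give both threads time to grab their first lock
        } catch (InterruptedException ignored) {
        }
    }
}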

Memory leak

We all know that the biggest difference between Java and C++ is that the former automatically reclaims memory that is no longer used, while the latter requires the programmer to release it manually. In C++, a memory leak occurs if we forget to release memory. However, don't assume that memory leaks cannot happen just because the JVM reclaims memory for us.

After a memory leak occurs, the memory available to the process gradually decreases, and the final result is an OOM error. When the OOM error appears, you might think the heap simply isn't large enough, so you increase the -Xmx parameter and restart the application. The result is that OOM appears again after a while. Eventually you can no longer increase the maximum heap size, and the application has to be restarted every so often.

Another possible symptom of a memory leak is longer response times for requests. This is caused by frequent GC, which suspends all other threads (Stop The World).

To simulate this scenario, the following program was used:

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class Main {
    public static void main(String[] args) {
        Main main = new Main();
        while (true) {
            try {
                Thread.sleep(1);
            } catch (InterruptedException e) {
                e.printStackTrace();
            }
            main.run();
        }
    }

    private void run() {
        ExecutorService executorService = Executors.newCachedThreadPool();
        for (int i = 0; i < 10; i++) {
            executorService.execute(() -> {
                // do something...
            });
        }
    }
}

The run parameters are -Xms20m -Xmx20m -XX:+PrintGC, which shrink the available heap and print a line of information whenever a GC occurs. The results are as follows:

...
[GC (Allocation Failure) 12776K->10840K(18432K), 0.0309510 secs]
[GC (Allocation Failure) 13400K->11520K(18432K), 0.0333385 secs]
[GC (Allocation Failure) 14080K->12168K(18432K), 0.0332409 secs]
[GC (Allocation Failure) 14728K->12832K(18432K), 0.0370435 secs]
[Full GC (Ergonomics) 12832K->12363K(18432K), 0.1942141 secs]
[Full GC (Ergonomics) 14923K->12951K(18432K), 0.1607221 secs]
[Full GC (Ergonomics) 15511K->13542K(18432K), 0.1956311 secs]
...
[Full GC (Ergonomics) 16382K->16381K(18432K), 0.1734902 secs]
[Full GC (Ergonomics) 16383K->16383K(18432K), 0.1922607 secs]
[Full GC (Ergonomics) 16383K->16383K(18432K), 0.1824278 secs]
[Full GC (Allocation Failure) 16383K->16383K(18432K), 0.1710382 secs]
[Full GC (Ergonomics) 16383K->16382K(18432K), 0.1829138 secs]
[Full GC (Ergonomics) Exception in thread "main" 16383K->16382K(18432K), 0.1406222 secs]
[Full GC (Allocation Failure) 16382K->16382K(18432K), 0.1392928 secs]
[Full GC (Ergonomics) 16383K->16382K(18432K), 0.1546243 secs]
[Full GC (Ergonomics) 16383K->16382K(18432K), 0.1755271 secs]
[Full GC (Ergonomics) 16383K->16382K(18432K), 0.1699080 secs]
[Full GC (Allocation Failure) 16382K->16382K(18432K), 0.1697982 secs]
[Full GC (Ergonomics) 16383K->16382K(18432K), 0.1851136 secs]
[Full GC (Allocation Failure) 16382K->16382K(18432K), 0.1655088 secs]
java.lang.OutOfMemoryError: Java heap space

It can be seen that although GC keeps running, more and more memory remains occupied, indicating that some objects in the program cannot be reclaimed. However, the objects above are all created inside a method and referenced only by local variables; after the method returns, the objects they reference should be reclaimed during GC, yet that is clearly not happening here.

In order to find out which objects are failing to be reclaimed, we add the parameters -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=heap.bin, which dump the heap memory when an OOM occurs. Run the program until the exception appears and we get the heap.bin file, which we then analyze with the Eclipse MAT plug-in. If it is not installed, you need to install it first.
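Putting the flags together, the program was launched roughly like this (assuming the class above is compiled as Main):

java -Xms20m -Xmx20m -XX:+PrintGC -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=heap.bin Main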

Then go to File -> Open Heap Dump..., select the file you just dumped, and choose Leak Suspects.


MAT will list all objects that may have memory leaks


You can see that there are 21260 Thread objects and 3386 ThreadPoolExecutor objects. If you look at the source code of java.util.concurrent.ThreadPoolExecutor, you will find that, in order to reuse threads, the pool keeps its worker threads waiting for new tasks, so they are never reclaimed; you have to call its shutdown() method to stop the pool once the tasks are done.

In this program, however, the thread pool is defined as a local variable. A good practice is to make it a singleton, as sketched below.
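A minimal sketch of that fix, assuming the tasks really can share one pool for the whole lifetime of the application:

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class Main {
    // One pool shared by the whole application instead of a new pool on every call
    private static final ExecutorService EXECUTOR = Executors.newCachedThreadPool();

    private void run() {
        for (int i = 0; i < 10; i++) {
            EXECUTOR.execute(() -> {
                // do something...
            });
        }
    }
}

Alternatively, if a local pool really is needed, call executorService.shutdown() once all tasks have been submitted so that its threads can terminate.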

The dump-on-OOM approach above is just one way of handling it.

For online applications, the heap is often set very large, and the memory snapshot dumped when an OOM occurs will be correspondingly huge, possibly so large that it cannot be analyzed on a local machine (there is not enough memory to open the dump file). Here is another way to deal with it:

(1) Use jps to locate the process number

C:\Users\spareyaya\IdeaProjects\maven-project\target\classes\org\example\net>jps -l
24836 org.example.net.Main
62520 org.jetbrains.jps.cmdline.Launcher
129980 sun.tools.jps.Jps
136028 org.jetbrains.jps.cmdline.Launcher

Because we already know which application caused the OOM, we can directly use jps to find its process number, 24836.

(2) Use jstat to analyze gc activity

jstat is a tool that reports the memory usage and GC activity of a Java process. It takes many parameters; you can view all of them and their meanings with jstat -help.

C:\Users\spareyaya\IdeaProjects\maven-project\target\classes\org\example\net>jstat -gcutil -t -h8 24836 1000
Timestamp S0 S1 E O M CCS YGC YGCT FGC FGCT GCT
           29.1 32.81 0.00 23.48 85.92 92.84 84.13 14 0.339 0 0.000 0.339
           30.1 32.81 0.00 78.12 85.92 92.84 84.13 14 0.339 0 0.000 0.339
           31.1 0.00 0.00 22.70 91.74 92.72 83.71 15 0.389 1 0.233 0.622

The command above outputs GC statistics (-gcutil) with a timestamp (-t), repeats the header line every 8 rows (-h8), targets process number 24836, and prints a new line every 1000 milliseconds.

In the output, Timestamp is the time since the JVM started; S0, S1 and E are the two survivor spaces and Eden of the young generation; O is the old generation; M is the Metaspace; CCS is the compressed class space usage; YGC and YGCT are the count and total time of young-generation GCs; FGC and FGCT are the count and total time of full GCs; and GCT is the total GC time. Although GCs keep occurring, the old-generation usage does not drop at all, which indicates that some objects cannot be reclaimed (of course, it is also possible that those objects really are still in use).

(3) Use the jmap tool to dump the memory snapshot

jmap can dump a memory snapshot of a specified Java process. The effect is the same as the first method; the difference is that you do not have to wait for an OOM to occur, and the dumped snapshot will be much smaller (the live option dumps only objects that are still reachable).

jmap -dump:live,format=b,file=heap.bin 24836

This produces the heap.bin memory snapshot file, which can then be analyzed with Eclipse MAT as before.

Summary

Strictly speaking, the three cases above are not really JVM tuning; they simply use JVM tools to find problems in the code. The main goals of JVM tuning are to minimize pause times and improve the system's throughput.

But if we blindly set parameters without analyzing the system first, we may well make things worse. As the JVM has matured, its default parameters have been balanced by its developers through extensive testing and are already suitable for most application scenarios.

If you believe your JVM really does need tuning, you still have to sample and analyze it, and in the end adjust it gradually, over several iterations, to get a better result.