Use Arthas to analyze Java online interface time consumption

Project scenario

In a production environment, it is often necessary to troubleshoot where in an interface is time-consuming. According to the previous practice, it may be by printing the logger, then patching and redeploying.

 @Test
 void testNeedTime() {<!-- -->
     long startTime = System.currentTimeMillis();

     AtomicInteger num1 = new AtomicInteger(1);
     IntStream.range(1,10).forEach(i-> num1.incrementAndGet());
     long endTime1 = System.currentTimeMillis();
     log.info("Time consuming: {}" , endTime1 - startTime);

     AtomicInteger num2 = new AtomicInteger(1);
     IntStream.range(1,100).forEach(i-> num2.incrementAndGet());
     long endTime2 = System.currentTimeMillis();
     log.info("Time consuming: {}" , endTime2 - endTime1 );

 }

If it is a project that can be updated frequently, or it is relatively easy to build, it is okay. If the project cannot be updated frequently or the deployment process is cumbersome, it will be a bit troublesome to patch it, so you need a platform that can support A tool that can be used to analyze timing problems with just a patch. Alibaba’s open source Arthas is a very good online problem analysis tool.

What are Arthas?

Arthas is an online monitoring and diagnosis platform that can view the status information of application load, memory, gc, and threads in real time. It can analyze timing problems, interface time-consuming, parameter transfer, exceptions, etc. without modifying the code, and improve the line. Improve problem troubleshooting efficiency

  • Official website documentation: https://arthas.aliyun.com/doc/

  • GitHub address: https://github.com/alibaba/arthas

  • Code cloud address: http://arthas.gitee.io/

Arthas installation

First download the corresponding arthas program locally. You can use the wget command, or you can use the curl introduced on the official website to access the corresponding link.

wget https://arthas.aliyun.com/arthas-boot.jar

Then install the documentation and start the corresponding jar

java -jar arthas-boot.jar --repo-mirror aliyun --use-http

It was found that an error was reported during startup, and the error was as follows:

Can not find java process. Try to run `jps` command lists the instrumented Java HotSpot VMs on the target system.
Please select an available pid.

as the picture shows:

The reason is that we need to add the corresponding thread id, but we do not run the corresponding Java program, so you can go to the official website to download an example of a Java program, math-game.jar.


Then first java -jar math-game.jar, run the corresponding Java program

View running Java programs:

ps -aux|grep java

You can find that the corresponding pid of math-game.jar is 2266, so execute it now

java -jar /usr/local/arthas/arthas-boot.jar 2266

Usage of trace command

The following is an example to analyze the slow loading problem of an interface.

The trace command can be used to analyze where it takes a long time. For example, if we want to analyze the listOrder method in a class OrderServiceImpl, we can use the command

trace com.sample.order.orderServiceImpl listOrder -n 1 --skipJDKMethod
false

-n specifies the number of times to capture results. If the interface is called more times, you can increase the number of times to capture results appropriately.

--skipJDKMethod false is to enable the printing of jdk methods. In fact, in general, it can be omitted because jdk methods are relatively stable.

How to wait for the interface to be called? You can also manually call the interface, and you can print out the following monitoring information. Specifically, you can locate the slow loading speed.

--[100.00% 3434.755668ms] org.springframework.cglib.proxy.MethodInterceptor:intercept()
            `---[100.00% 3434.620187ms] cn.test.server.business.VisitorBusiness:getStaffList()
                 + ---[0.00% 0.045431ms] cn.test.client.dto.StaffListReqDto:getComId() #188
                 + ---[0.00% 0.019135ms] cn.core.common.utils.CoreUtils:isEmpty() #188
                 + ---[0.00% 0.021816ms] cn.hutool.core.date.DateUtil:timer() #192
                 + ---[0.00% 0.019987ms] com.google.common.base.Splitter:on() #194
                 + ---[0.00% 0.005501ms] cn.test.client.dto.StaffListReqDto:getComId() #194
                 + ---[0.00% 0.026292ms] com.google.common.base.Splitter:splitToList() #194
                 + ---[0.00% 0.131507ms] cn.hutool.core.convert.Convert:toInt() #195
                 + ---[0.34% 11.668407ms] cn.core.user.client.querier.SchoolQuerierService:findBySchoolId() #198
                 + ---[0.00% 0.024199ms] cn.hutool.core.date.TimeInterval:intervalRestart() #203
                 + ---[0.02% 0.535107ms] org.slf4j.Logger:info() #203
                 + ---[2.66% 91.504718ms] cn.core.user.client.querier.RelationQuerierService:queryUserByIdsAndRoleType() #206
                 + ---[0.00% 0.019208ms] com.google.common.collect.Lists:newArrayList() #206
                 + ---[0.00% 0.01107ms] cn.core.common.utils.CoreUtils:isEmpty() #207
                 + ---[0.00% 0.013517ms] cn.hutool.core.date.TimeInterval:intervalRestart() #211
                 + ---[0.02% 0.532242ms] org.slf4j.Logger:info() #211
                 + ---[0.00% 0.016216ms] cn.hutool.core.date.TimeInterval:intervalRestart() #218
                 + ---[0.01% 0.435469ms] org.slf4j.Logger:info() #218
                 + ---[0.00% 0.009024ms] cn.test.client.dto.StaffListReqDto:getComId() #221
                 + ---[0.53% 18.336268ms]cn.test.persistence.dao.LogMapper:listStaffSyncRecordByComId() #221
                 + ---[0.00% 0.009043ms] com.google.common.collect.Lists:newArrayList() #221
                 + ---[0.00% 0.019237ms] cn.hutool.core.date.TimeInterval:intervalRestart() #223
                 + ---[0.01% 0.279834ms] org.slf4j.Logger:info() #223
                 + ---[0.00% 0.014057ms] cn.hutool.core.date.TimeInterval:intervalRestart() #227
                 + ---[0.01% 0.270155ms] org.slf4j.Logger:info() #227
                 + ---[0.00% 0.006338ms] com.google.common.collect.Lists:newArrayList() #231
                 + ---[0.00% 0.019707ms] cn.hutool.core.date.TimeInterval:intervalRestart() #263
                 + ---[0.02% 0.548875ms] org.slf4j.Logger:info() #263
                `---[0.00% 0.027475ms] cn.test.client.dto.Result:success() #265

If you want to print how many milliseconds it takes in the interface, you can use the command. The following is to filter out the ones that are longer than 100ms.

trace com.sample.order.orderServiceImpl listOrder '#cost > 100' -n 1