From discovery to optimization: a general approach to solving performance problems

I. Article overview

Every technical system goes through a development process. In the early stage of a business, the main goal is to implement business functions. Because the data volume and traffic are small, performance is not the primary consideration.

However, as the business grows and data and traffic increase or even surge, problems appear, such as the homepage taking five seconds to display. This poor experience drives users away, and at this point performance becomes a problem that must be faced. We divide the evolution of a technical system into three stages: early, middle, and late:

Early stage: mainly implement business requirements; performance is not a key consideration
Middle stage: performance problems begin to appear and affect business development
Late stage: technical iteration must consider performance and business at the same time

How to discover performance problems and how to solve them is the focus of this article.

II. What is performance

We can describe performance along four dimensions:

Two dimensions define performance:

slow
high pressure

Two dimensions describe performance:

Qualitative: intuitive feeling
Quantitative: Indicator Analysis

III. Discover performance problems

1. Qualitative + speed

A page takes a long time to open, a list loads slowly, or an interface call throws a timeout exception. Obvious problems like these fall into this category.

2. Quantitative + speed

1) Speed Indicator

A company has 7,200 employees, who clock in between 8:00 am and 8:30 am every day. Each clock-in request takes the system 5 seconds to process. What are the RT, QPS, and concurrency?

RT is the response time, which the question already gives:

RT = 5 seconds

QPS is the number of requests per second. Assuming the clock-ins are evenly distributed over the 30 minutes:

QPS = 7200 / (30 * 60) = 4

Concurrency indicates the number of requests processed by the system at the same time:

Concurrency = QPS x RT = 4 x 5 = 20

According to the above example, the formula is derived:

Concurrency = QPS x RT
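The arithmetic above can be checked with a few lines of Java. This is only a sketch of the clock-in example; the class and method names (ClockInMetrics, qps, concurrency) are made up for illustration.

```java
// Sketch: throughput and concurrency for the clock-in example.
final class ClockInMetrics {

    /** Average requests per second, assuming requests are spread evenly over the window. */
    static double qps(long totalRequests, long windowSeconds) {
        return (double) totalRequests / windowSeconds;
    }

    /** Concurrency = QPS x RT: the average number of requests being processed at once. */
    static double concurrency(double qps, double rtSeconds) {
        return qps * rtSeconds;
    }

    public static void main(String[] args) {
        double qps = qps(7200, 30 * 60);            // 7,200 clock-ins in 30 minutes
        double concurrency = concurrency(qps, 5.0); // each request takes 5 seconds
        System.out.println("QPS=" + qps + ", concurrency=" + concurrency);
        // prints: QPS=4.0, concurrency=20.0
    }
}
```

Note that this is an average. If most employees clock in during the last few minutes, the peak QPS and peak concurrency will be much higher than 4 and 20.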

2) QPS VS TPS

QPS (Queries Per Second): Queries per second

TPS (Transactions Per Second): the number of transactions per second

Note that a transaction here is not a database transaction. It is a unit of work that includes the following three stages:

receive request
handle business
return result

QPS = N * TPS (N>=1)

N=1 means that one call to the interface performs one transaction:

public class OrderService {

    public Order queryOrderById(String orderId) {
        return orderMapper.selectById(orderId);
    }
}

N>1 means that one call to the interface performs multiple transactions:

public class OrderService {

    public void updateOrder(Order order) {
        // transaction 1
        orderMapper.update(order);
        // transaction 2
        sendOrderUpdateMessage(order);
    }
}

3) Find the problem

①Print log

// log is assumed to be an SLF4J-style logger (for example, provided by Lombok's @Slf4j)
public class FastTestService {

    public void test01() {
        long start = System.currentTimeMillis();
        biz1();
        biz2();
        long costTime = System.currentTimeMillis() - start;
        System.out.println("costTime=" + costTime);
    }

    private void biz1() {
        try {
            System.out.println("biz1");
            Thread.sleep(500L);
        } catch (Exception ex) {
            log.error("error", ex);
        }
    }

    private void biz2() {
        try {
            System.out.println("biz2");
            Thread.sleep(1000L);
        } catch (Exception ex) {
            log.error("error", ex);
        }
    }
}
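When many methods need to be timed this way, the start/stop bookkeeping can be moved into a small helper. The following is a minimal sketch, not part of the original example; the class name Timed and its call method are hypothetical. It uses System.nanoTime(), which is intended for measuring elapsed time, instead of System.currentTimeMillis().

```java
import java.util.function.Supplier;

// Sketch: a tiny helper that times a task and prints the elapsed time.
final class Timed {

    /** Runs the task, prints its elapsed time under the given label, and returns its result. */
    static <T> T call(String label, Supplier<T> task) {
        long start = System.nanoTime();
        try {
            return task.get();
        } finally {
            long costMillis = (System.nanoTime() - start) / 1_000_000;
            System.out.println(label + " costTime=" + costMillis + "ms");
        }
    }

    public static void main(String[] args) {
        int sum = call("sum", () -> {
            int s = 0;
            for (int i = 1; i <= 100; i++) {
                s += i;
            }
            return s;
        });
        System.out.println("sum=" + sum);
    }
}
```

The finally block makes sure the time is printed even if the task throws an exception.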

②StopWatch

import org.springframework.util.StopWatch;
import org.springframework.util.StopWatch.TaskInfo;
import com.alibaba.fastjson.JSON; // assumption: JSON is Alibaba fastjson

public class FastTestService {

    // biz1() and biz2() are the same as in the previous example

    public void test02() {
        StopWatch sw = new StopWatch("testWatch");
        sw.start("biz1");
        biz1();
        sw.stop();

        sw.start("biz2");
        biz2();
        sw.stop();

        // Print the total elapsed time
        System.out.println("costTime=" + sw.getTotalTimeMillis());
        System.out.println();

        // Print the information of each task
        TaskInfo[] taskInfos = sw.getTaskInfo();
        for (TaskInfo task : taskInfos) {
            System.out.println("taskInfo=" + JSON.toJSONString(task));
        }
        System.out.println();

        // Print the formatted task table
        System.out.println(sw.prettyPrint());
    }
}

Output result:

costTime=1526

taskInfo={"taskName":"biz1","timeMillis":510,"timeNanos":510811200,"timeSeconds":0.5108112}
taskInfo={"taskName":"biz2","timeMillis":1015,"timeNanos":1015439700,"timeSeconds":1.0154397}

StopWatch 'testWatch': running time = 1526250900 ns
---------------------------------------------
ns         %     Task name
---------------------------------------------
510811200  033%  biz1
1015439700  067%  biz2

③trace

Arthas is Alibaba's open-source Java diagnostic tool:

Arthas is an online monitoring and diagnostic product. It shows application load, memory, GC, and thread status in real time from a global perspective, and it can diagnose business problems without modifying application code, including inspecting the input and output parameters of method calls, exceptions, method execution time, and class loading information, which greatly improves the efficiency of troubleshooting online problems.

The Arthas trace command monitors the time spent on each node of the call chain:

https://arthas.aliyun.com/doc/trace.html

Let's illustrate with an example. First write and run the code:

package java.front.optimize;

// log is assumed to be an SLF4J-style logger (for example, provided by Lombok's @Slf4j)
public class FastTestService {

    public static void main(String[] args) {
        FastTestService service = new FastTestService();
        while (true) {
            service.test03();
        }
    }

    public void test03() {
        biz1();
        biz2();
    }

    private void biz1() {
        try {
            System.out.println("biz1");
            Thread.sleep(500L);
        } catch (Exception ex) {
            log.error("error", ex);
        }
    }

    private void biz2() {
        try {
            System.out.println("biz2");
            Thread.sleep(1000L);
        } catch (Exception ex) {
            log.error("error", ex);
        }
    }
}

The first step is to enter the arthas console:

$ java -jar arthas-boot.jar
[INFO] arthas-boot version: 3.6.2
[INFO] Found existing java process, please choose one and input the serial number of the process, eg : 1. Then hit ENTER
* [1]: 14121
  [2]: 20196 java.front.optimize.FastTestService

The second step is to enter the serial number of the process to monitor (2 in this example) and press Enter.

The third step is to use the trace command to monitor the target method:

trace java.front.optimize.FastTestService test03

The fourth step is to check the link time consumption:

`---[1518.7362ms] java.front.optimize.FastTestService:test03()
    +---[33.66% 511.2817ms ] java.front.optimize.FastTestService:biz1() #54
    `---[66.32% 1007.2962ms ] java.front.optimize.FastTestService:biz2() #55

3. Qualitative + pressure

High system pressure also shows up as slowness, but this is not the kind of slowness where a page takes a few seconds to open. Instead, the page stays in the loading state and finally ends up as a white screen.

4. Quantitative + pressure

The common pressure indicators of the server are as follows:

Memory
CPU
disk
network

Server-side development is more likely to cause memory and CPU problems, so we focus on these two.

1) Finding CPU problems

First write a piece of code that drives CPU usage up, and run it:

public class FastTestService {

    public static void main(String[] args) {
        FastTestService service = new FastTestService();
        while (true) {
            service.test();
        }
    }

    public void test() {
        biz();
    }

    private void biz() {
        System.out.println("biz");
    }
}

①dashboard + thread

The dashboard command shows a real-time panel of the current system. Here the thread with ID=1 has very high CPU usage (this ID is the JVM-internal thread ID and does not correspond to the nativeID shown by jstack):

$ dashboard
ID   NAME   GROUP   PRIORI   STATE   %CPU    DELTA_TIME   TIME     INTERRU   DAEMON
1    main   main    5        RUNNA   96.06   4.812        2:41.2   false     false

The thread command shows the N busiest threads:

$ thread -n 1

"main" Id=1 deltaTime=203ms time=1714000ms RUNNABLE
    at app//java.front.optimize.FastTestService.biz(FastTestService.java:83)
    at app//java.front.optimize.FastTestService.test(FastTestService.java:61)
    at app//java.front.optimize.FastTestService.main(FastTestService.java:17)
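
When Arthas cannot be attached, a rough equivalent can be obtained from inside the process with the standard java.lang.management API. The sketch below is an illustration only (the class name BusiestThread is made up). Unlike Arthas, which samples CPU usage over a short interval, it ranks threads by CPU time accumulated since each thread started.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

// Sketch: find the thread that has consumed the most CPU time so far.
final class BusiestThread {

    /** Returns the ID of the live thread with the most accumulated CPU time, or -1 if unavailable. */
    static long findBusiestThreadId() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        if (!threads.isThreadCpuTimeSupported() || !threads.isThreadCpuTimeEnabled()) {
            return -1;
        }
        long busiestId = -1;
        long maxCpuNanos = -1;
        for (long id : threads.getAllThreadIds()) {
            long cpuNanos = threads.getThreadCpuTime(id); // -1 if the thread has already died
            if (cpuNanos > maxCpuNanos) {
                maxCpuNanos = cpuNanos;
                busiestId = id;
            }
        }
        return busiestId;
    }

    public static void main(String[] args) {
        long id = findBusiestThreadId();
        ThreadInfo info = id < 0 ? null : ManagementFactory.getThreadMXBean().getThreadInfo(id, 5);
        if (info == null) {
            System.out.println("no thread CPU data available");
            return;
        }
        System.out.println("\"" + info.getThreadName() + "\" Id=" + id + " " + info.getThreadState());
        for (StackTraceElement frame : info.getStackTrace()) {
            System.out.println("    at " + frame);
        }
    }
}
```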

2) Finding memory problems

①free

$ free -h
              total        used        free      shared  buff/cache   available
Mem:            10G        5.5G        3.1G         28M        1.4G        4.4G
Swap:          2.0G        435M        1.6G

total: total server memory
used: memory in use
free: memory not used by anything
shared: shared physical memory
cache: memory used by the page cache, which caches file data
buff: memory used by kernel buffers (buffer cache)
available: an estimate of the memory that programs can still use, including cache that can be reclaimed

②memory

The Arthas memory command shows JVM memory information:

https://arthas.aliyun.com/doc/memory.html

View JVM memory information (official example)

$ memory
Memory                             used     total    max       usage
heap                               32M      256M     4096M     0.79%
g1_eden_space                      11M      68M      -1        16.18%
g1_old_gen                         17M      184M     4096M     0.43%
g1_survivor_space                  4M       4M       -1        100.00%
nonheap                            35M      39M      -1        89.55%
codeheap_'non-nmethods'            1M       2M       5M        20.53%
metaspace                          26M      27M      -1        96.88%
codeheap_'profiled_nmethods'       4M       4M       117M      3.57%
compressed_class_space             2M       3M       1024M     0.29%
codeheap_'non-profiled_nmethods'   685K     2496K    120032K   0.57%
mapped                             0K       0K       -         0.00%
direct                             48M      48M      -         100.00%
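
Similar numbers can be read from inside the JVM with the standard MemoryMXBean and MemoryPoolMXBean classes. The following is a minimal sketch (the class name JvmMemoryReport and its output format are made up for illustration); the committed value plays the role of the total column above, and a max of -1 means the limit is undefined.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;

// Sketch: print heap, non-heap, and per-pool memory usage of the current JVM.
final class JvmMemoryReport {

    /** Formats one row; usage is relative to max and shown as n/a when max is undefined. */
    static String format(String name, MemoryUsage u) {
        long max = u.getMax();
        String maxText = max > 0 ? (max / 1024) + "K" : "-1";
        String usage = max > 0 ? String.format("%.2f%%", 100.0 * u.getUsed() / max) : "n/a";
        return String.format("%-36s used=%dK committed=%dK max=%s usage=%s",
                name, u.getUsed() / 1024, u.getCommitted() / 1024, maxText, usage);
    }

    public static void main(String[] args) {
        System.out.println(format("heap", ManagementFactory.getMemoryMXBean().getHeapMemoryUsage()));
        System.out.println(format("nonheap", ManagementFactory.getMemoryMXBean().getNonHeapMemoryUsage()));
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            MemoryUsage usage = pool.getUsage(); // null if the pool is no longer valid
            if (usage != null) {
                System.out.println(format(pool.getName(), usage));
            }
        }
    }
}
```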

③jmap

Find the process ID of the Java program

jps -l

View real-time memory usage

jhsdb jmap --heap --pid 20196

Export a snapshot file

jmap -dump:format=b,file=/home/tmp/my-dump.hprof 20196

Automatically export a heap snapshot when an OutOfMemoryError occurs

-XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/home/tmp/my-dump.hprof

④heapdump

The Arthas heapdump command supports exporting heap snapshots:

https://arthas.aliyun.com/doc/heapdump.html

Dump to a specified file

heapdump /home/tmp/my-dump.hprof

Dump only live objects to a specified file

heapdump --live /home/tmp/my-dump.hprof

Dump to a temporary file

heapdump

⑤Garbage collection

jstat shows garbage collection statistics, so you can observe whether the program runs GC too frequently or whether GC takes too long:

jstat -gcutil <pid> <interval(ms)>

Check garbage collection every second

$ jstat -gcutil 20196 1000
  S0     S1     E      O      M     CCS    YGC     YGCT    FGC    FGCT    CGC    CGCT     GCT
  0.00   0.00  57.69   0.00     -      -      0    0.000     0    0.000     0    0.000    0.000
  0.00   0.00  57.69   0.00     -      -      0    0.000     0    0.000     0    0.000    0.000
  0.00   0.00  57.69   0.00     -      -      0    0.000     0    0.000     0    0.000    0.000

The columns are described as follows:

S0: Survivor 0 space utilization, as a percentage of its current capacity

S1: Survivor 1 space utilization, as a percentage of its current capacity

E: Eden space utilization, as a percentage of its current capacity

O: Old generation utilization, as a percentage of its current capacity

M: Metaspace utilization, as a percentage of its current capacity

CCS: Compressed class space utilization, as a percentage of its current capacity

YGC: The number of Young GC events since the application was started

YGCT: The time (seconds) spent on Young GC since the application was started

FGC: The number of Full GC events since the application was started

FGCT: The time (seconds) spent on Full GC since the application was started

CGC: The number of concurrent GC events since the application was started

CGCT: The time (seconds) spent on concurrent GC since the application was started

GCT: The total time (seconds) spent on garbage collection since the application was started
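
The same counters are also available inside the JVM through the standard GarbageCollectorMXBean, which is handy when reporting GC activity to a monitoring system. A minimal sketch follows (the class name GcStats is made up); note that the MXBean reports time in milliseconds, while jstat prints seconds.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Sketch: read GC counts and accumulated GC time of the current JVM.
final class GcStats {

    /** Total number of collections across all collectors (undefined values are skipped). */
    static long totalCollections() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            total += Math.max(gc.getCollectionCount(), 0); // -1 means undefined
        }
        return total;
    }

    /** Total accumulated collection time in milliseconds across all collectors. */
    static long totalCollectionMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            total += Math.max(gc.getCollectionTime(), 0); // -1 means undefined
        }
        return total;
    }

    public static void main(String[] args) {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.println(gc.getName() + ": count=" + gc.getCollectionCount()
                    + ", timeMillis=" + gc.getCollectionTime());
        }
        System.out.println("total: count=" + totalCollections()
                + ", timeMillis=" + totalCollectionMillis());
    }
}
```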

3) Finding problems comprehensively

①Stress test

Stress testing can proactively expose system problems and evaluate system capacity. Some simple, commonly used settings are as follows:

Common tool: JMeter
Stepped load: increase the number of threads in steps (10, 20, 30, ...) until the bottleneck is reached
Duration: run each step for 1 minute, with Ramp-Up=0
TPS: Throughput
Response time: focus on the 95% Line (95th percentile)
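
To see why the 95th percentile is more informative than the average, the sketch below computes both from a set of response-time samples. It uses the nearest-rank definition of a percentile, which is one common choice; load-testing tools may compute the value slightly differently. The class name LatencyStats and the sample values are made up for illustration.

```java
import java.util.Arrays;

// Sketch: nearest-rank percentile of response-time samples.
final class LatencyStats {

    /** Smallest sample such that at least p percent of all samples are less than or equal to it. */
    static long percentile(long[] samplesMillis, double p) {
        if (samplesMillis.length == 0) {
            throw new IllegalArgumentException("no samples");
        }
        long[] sorted = samplesMillis.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p * sorted.length / 100.0); // 1-based rank
        return sorted[Math.max(rank, 1) - 1];
    }

    public static void main(String[] args) {
        // Nine fast requests and one very slow one
        long[] samples = {120, 80, 95, 300, 110, 105, 90, 100, 2500, 115};
        double average = Arrays.stream(samples).average().orElse(0);
        System.out.println("average=" + average + "ms");
        System.out.println("p50=" + percentile(samples, 50) + "ms");
        System.out.println("p95=" + percentile(samples, 95) + "ms");
    }
}
```

In this sample the single slow request pulls the average far above the median, while the 95th percentile shows what the slowest users actually experience.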

②Monitoring system

A monitoring system can display the relevant indicators in a more friendly way. A company with sufficient technical strength can build its own; otherwise it can adopt one of the industry's common solutions.

IV. Optimize performance problems

1. Four methods

reduce requests
space for time
task parallelization
task asynchronization

2. Five levels

proxy layer
front-end layer
service layer
caching layer
data layer

3. Optimization instructions

When it comes to performance optimization, solutions such as indexing and caching come to mind first. They are valid, but stopping there can leave gaps, because they only cover the cache layer and the data layer.

If invalid traffic can be rejected at the outermost layer, the system is better protected. Each of the four methods can be applied at each layer. Here are some examples:

1) Reduce requests + front-end layer

In a flash-sale (seckill) scenario, require a verification code before the request is submitted

2) Reduce requests + service layer

Check whether multiple RPC calls can be merged into one batch RPC call
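
As an illustration of the difference, the sketch below counts network round trips for the two styles. Everything in it is hypothetical: UserClient stands for some remote service, and CountingClient is an in-memory stand-in that only counts calls.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Sketch: N single RPC calls versus one batch RPC call.
final class BatchRpcSketch {

    /** Hypothetical remote client; each method call stands for one network round trip. */
    interface UserClient {
        String getName(long userId);

        Map<Long, String> getNames(List<Long> userIds);
    }

    /** In-memory stand-in that counts round trips instead of going over the network. */
    static final class CountingClient implements UserClient {
        int roundTrips = 0;

        @Override
        public String getName(long userId) {
            roundTrips++;
            return "user-" + userId;
        }

        @Override
        public Map<Long, String> getNames(List<Long> userIds) {
            roundTrips++;
            Map<Long, String> names = new LinkedHashMap<>();
            for (long id : userIds) {
                names.put(id, "user-" + id);
            }
            return names;
        }
    }

    /** One round trip per ID. */
    static Map<Long, String> loadOneByOne(UserClient client, List<Long> ids) {
        Map<Long, String> names = new LinkedHashMap<>();
        for (long id : ids) {
            names.put(id, client.getName(id));
        }
        return names;
    }

    /** One round trip for all IDs. */
    static Map<Long, String> loadInBatch(UserClient client, List<Long> ids) {
        return client.getNames(ids);
    }

    public static void main(String[] args) {
        List<Long> ids = List.of(1L, 2L, 3L);
        CountingClient single = new CountingClient();
        loadOneByOne(single, ids);
        CountingClient batch = new CountingClient();
        loadInBatch(batch, ids);
        System.out.println("one by one: " + single.roundTrips + " round trips, batch: " + batch.roundTrips);
    }
}
```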

3) Space for time + service layer

Introduce caching

4) Space for time + cache layer

Introduce multi-level caching
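
A minimal sketch of the idea, assuming two levels: a local in-process map in front of a shared cache, with the data source behind both. The class name TwoLevelCache is made up, and the shared cache is represented by a plain Map as a stand-in for a remote cache; expiry and invalidation, which a real cache needs, are left out.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Sketch: look up the local cache first, then the shared cache, then the data source.
final class TwoLevelCache<K, V> {

    private final Map<K, V> local = new ConcurrentHashMap<>();
    private final Map<K, V> shared;      // stand-in for a remote cache
    private final Function<K, V> loader; // stand-in for the data layer

    TwoLevelCache(Map<K, V> shared, Function<K, V> loader) {
        this.shared = shared;
        this.loader = loader;
    }

    V get(K key) {
        V value = local.get(key);
        if (value != null) {
            return value;                // level 1 hit
        }
        value = shared.get(key);
        if (value == null) {
            value = loader.apply(key);   // miss on both levels: load from the source
            if (value == null) {
                return null;             // nothing to cache
            }
            shared.put(key, value);
        }
        local.put(key, value);           // fill level 1 for the next lookup
        return value;
    }

    public static void main(String[] args) {
        Map<String, String> shared = new ConcurrentHashMap<>();
        TwoLevelCache<String, String> cache = new TwoLevelCache<>(shared, key -> {
            System.out.println("loading " + key + " from the data layer");
            return "value-of-" + key;
        });
        System.out.println(cache.get("a")); // loads from the data layer
        System.out.println(cache.get("a")); // served from the local cache
    }
}
```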

5) Space for time + data layer

Add an index

6) Task parallelization + service layer

If multiple calls are independent of each other, run them in parallel using Future
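
A minimal sketch using CompletableFuture (an implementation of Future) is shown below. The two calls are simulated with Thread.sleep, and the names ParallelCalls, slowCall, and loadPage are made up for illustration. Run one after the other, the calls would take about 800 ms; in parallel, the total is close to the slower call alone.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Sketch: run two independent calls in parallel and combine the results.
final class ParallelCalls {

    /** Stand-in for a slow remote call. */
    static String slowCall(String name, long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException ex) {
            Thread.currentThread().interrupt();
        }
        return name;
    }

    /** Starts both calls on the pool, then waits for both results. */
    static String loadPage(ExecutorService pool) {
        CompletableFuture<String> user = CompletableFuture.supplyAsync(() -> slowCall("user", 300), pool);
        CompletableFuture<String> orders = CompletableFuture.supplyAsync(() -> slowCall("orders", 500), pool);
        return user.join() + "+" + orders.join();
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            long start = System.nanoTime();
            String page = loadPage(pool);
            long costMillis = (System.nanoTime() - start) / 1_000_000;
            System.out.println(page + " costTime=" + costMillis + "ms");
        } finally {
            pool.shutdown();
        }
    }
}
```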

7) Task asynchronization + service layer

If the caller does not need to wait for the result, execute the task asynchronously
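
For example, a notification sent after an order update does not have to block the caller. The sketch below hands the notification to a thread pool and returns immediately; the names AsyncNotify, updateOrder, and outbox are hypothetical, and the outbox list stands in for a real message sender. Failures are logged inside the asynchronous task because no caller is waiting for them.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Sketch: do the main work synchronously and the follow-up work asynchronously.
final class AsyncNotify {

    /** Submits the notification to the pool and returns without waiting for it. */
    static CompletableFuture<Void> updateOrder(String orderId, List<String> outbox, ExecutorService pool) {
        // ... the synchronous order update would happen here ...
        return CompletableFuture
                .runAsync(() -> outbox.add("order-updated:" + orderId), pool)
                .exceptionally(ex -> {
                    System.err.println("notification failed: " + ex);
                    return null;
                });
    }

    public static void main(String[] args) throws InterruptedException {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        List<String> outbox = new CopyOnWriteArrayList<>();
        updateOrder("1001", outbox, pool); // returns immediately
        pool.shutdown();
        pool.awaitTermination(5, TimeUnit.SECONDS); // only so the demo can print the result
        System.out.println(outbox);
    }
}
```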

V. Article Summary

First, this article discussed how to view performance in the early, middle, and late stages of a system. Second, it discussed what performance is. Third, it discussed how to discover performance problems. Fourth, it discussed how to optimize them. I hope this article is helpful to everyone.

Author丨IT Fatty Xu

Source丨Public account: JAVA Frontline (ID: www_xpz)