I. Article overview
A technical system evolves over time. In the early stage of a business, the main goal is to deliver business functions and goals. Because data volume and traffic are small, performance is not the primary concern.
As the business grows and data and traffic increase or even surge, problems appear, such as a homepage that takes five seconds to display. A poor experience like this drives users away, and at that point performance becomes a problem that must be faced. We divide the life of a technical system into three stages: early, middle, and late:
- Early stage: the focus is on delivering business requirements; performance is not a key consideration
- Middle stage: performance problems become prominent and start to hold back the business
- Late stage: technical iteration must weigh performance and business needs together
How to discover performance problems, and how to solve them in the end, is the main topic of this article.
II. What is performance
We can describe performance along four dimensions.
Two dimensions define performance:
- Slow speed
- High pressure
Two dimensions describe performance:
- Qualitative: intuitive perception
- Quantitative: metric analysis
III. Discover performance problems
1. Qualitative + speed
A page takes a long time to open, a list loads slowly, a call to an interface ends in a timeout exception: obvious problems like these fall into this category.
2. Quantitative + speed
1) Speed indicators
A company has 7,200 employees, and clock-in time is from 8:00 am to 8:30 am every day. Each clock-in request takes the system 5 seconds to process. What are the RT, QPS, and concurrency?
RT stands for Response Time; the question already gives the answer:
- RT = 5 seconds
QPS is the number of requests per second. Assuming clock-ins are evenly distributed over the 30 minutes:
- QPS = 7200 / (30 * 60) = 4
Concurrency is the number of requests the system is processing at the same time:
- Concurrency = QPS x RT = 4 x 5 = 20
From this example we derive the general formula (known as Little's Law):
- Concurrency = QPS x RT
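The arithmetic above can be written as a small Java sketch. Like the example, it assumes requests are spread evenly over the 30-minute window; the class and method names are illustrative and do not come from any library.

```java
public class SpeedMetrics {

    // QPS, assuming requests are spread evenly over the time window
    static double qps(long requests, long windowSeconds) {
        return (double) requests / windowSeconds;
    }

    // Concurrency = QPS x RT, with RT in seconds
    static double concurrency(double qps, double rtSeconds) {
        return qps * rtSeconds;
    }

    public static void main(String[] args) {
        double perSecond = qps(7200, 30 * 60);          // 7,200 clock-ins between 8:00 and 8:30
        double inFlight = concurrency(perSecond, 5.0);  // RT = 5 seconds
        System.out.println("QPS=" + perSecond + ", concurrency=" + inFlight);
        // prints: QPS=4.0, concurrency=20.0
    }
}
```

The same two functions answer the inverse question too: for a target RT and an expected QPS, concurrency tells you roughly how many requests the system must be able to hold in flight.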
2) QPS vs. TPS
QPS (Queries Per Second): the number of queries per second
TPS (Transactions Per Second): the number of transactions per second
Note that "transaction" here does not mean a database transaction; it covers the following three stages:
- Receive the request
- Process the business logic
- Return the result
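QPS = N * TPS (N >= 1), where N is the number of requests (queries) that one transaction sends.
N = 1 means the transaction sends a single query:
```java
public class OrderService {
    private OrderMapper orderMapper; // data-access mapper, injected by the framework

    // N = 1: this transaction sends one query
    public Order queryOrderById(String orderId) {
        return orderMapper.selectById(orderId);
    }
}
```
N > 1 means the transaction sends several requests:
```java
public class OrderService {
    private OrderMapper orderMapper; // data-access mapper, injected by the framework

    // N = 2: this transaction sends two requests
    public void updateOrder(Order order) {
        // request 1: update the order in the database
        orderMapper.update(order);
        // request 2: publish an order-update message (method defined elsewhere)
        sendOrderUpdateMessage(order);
    }
}
```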
3) Finding the problem
①Print logs
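```java
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class FastTestService {
    private static final Logger log = LoggerFactory.getLogger(FastTestService.class);

    public void test01() {
        long start = System.currentTimeMillis();
        biz1();
        biz2();
        // Elapsed wall-clock time of biz1 + biz2, in milliseconds
        long costTime = System.currentTimeMillis() - start;
        System.out.println("costTime=" + costTime);
    }

    private void biz1() {
        try {
            System.out.println("biz1");
            Thread.sleep(500L); // simulate 500 ms of work
        } catch (InterruptedException ex) {
            Thread.currentThread().interrupt();
            log.error("error", ex);
        }
    }

    private void biz2() {
        try {
            System.out.println("biz2");
            Thread.sleep(1000L); // simulate 1000 ms of work
        } catch (InterruptedException ex) {
            Thread.currentThread().interrupt();
            log.error("error", ex);
        }
    }
}
```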
②StopWatch
```java
import com.alibaba.fastjson.JSON; // JSON: assumed to be Alibaba fastjson
import org.springframework.util.StopWatch;
import org.springframework.util.StopWatch.TaskInfo;

public class FastTestService {
    // biz1() and biz2() are the same methods as in the previous example
    public void test02() {
        StopWatch sw = new StopWatch("testWatch");
        sw.start("biz1");
        biz1();
        sw.stop();
        sw.start("biz2");
        biz2();
        sw.stop();

        // Simple output: total elapsed time
        System.out.println("costTime=" + sw.getTotalTimeMillis());
        System.out.println();

        // Output the information recorded for each task
        TaskInfo[] taskInfos = sw.getTaskInfo();
        for (TaskInfo task : taskInfos) {
            System.out.println("taskInfo=" + JSON.toJSONString(task));
        }
        System.out.println();

        // Formatted table of all tasks
        System.out.println(sw.prettyPrint());
    }
}
```
Output:
```
costTime=1526

taskInfo={"taskName":"biz1","timeMillis":510,"timeNanos":510811200,"timeSeconds":0.5108112}
taskInfo={"taskName":"biz2","timeMillis":1015,"timeNanos":1015439700,"timeSeconds":1.0154397}

StopWatch 'testWatch': running time = 1526250900 ns
---------------------------------------------
ns         %     Task name
---------------------------------------------
510811200  033%  biz1
1015439700  067%  biz2
```
③trace
Arthas is Alibaba's open-source Java diagnostic tool:
Arthas is an online monitoring and diagnosis product. From a global perspective, it shows the real-time state of application load, memory, GC, and threads, and it can diagnose business problems without modifying application code: checking the parameters and exceptions of method calls, monitoring method execution time, viewing class loading information, and so on. This greatly improves the efficiency of troubleshooting online problems.
The Arthas trace command measures the time spent at each node along a method's call path:
https://arthas.aliyun.com/doc/trace.html
Let's illustrate with an example. First, write and run the following code:
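```java
// Note: the JVM refuses to load application classes in java.* packages;
// rename the package (e.g. com.example.optimize) if you run this yourself.
package java.front.optimize;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class FastTestService {
    private static final Logger log = LoggerFactory.getLogger(FastTestService.class);

    public static void main(String[] args) {
        FastTestService service = new FastTestService();
        // Call test03() forever so that Arthas has something to trace
        while (true) {
            service.test03();
        }
    }

    public void test03() {
        biz1();
        biz2();
    }

    private void biz1() {
        try {
            System.out.println("biz1");
            Thread.sleep(500L);
        } catch (InterruptedException ex) {
            Thread.currentThread().interrupt();
            log.error("error", ex);
        }
    }

    private void biz2() {
        try {
            System.out.println("biz2");
            Thread.sleep(1000L);
        } catch (InterruptedException ex) {
            Thread.currentThread().interrupt();
            log.error("error", ex);
        }
    }
}
```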
Step 1: start Arthas, which lists the running Java processes:
```
$ java -jar arthas-boot.jar
[INFO] arthas-boot version: 3.6.2
[INFO] Found existing java process, please choose one and input the serial number of the process, eg : 1. Then hit ENTER
* [1]: 14121
  [2]: 20196 java.front.optimize.FastTestService
```
Step 2: enter the serial number of the process to monitor (2 in this example) and press Enter.
Step 3: use the trace command to monitor the target method:
trace java.front.optimize.FastTestService test03
Step 4: check the time spent at each node of the call path:
```
`---[1518.7362ms] java.front.optimize.FastTestService:test03()
    +---[33.66% 511.2817ms ] java.front.optimize.FastTestService:biz1() #54
    `---[66.32% 1007.2962ms ] java.front.optimize.FastTestService:biz2() #55
```
3. Qualitative + pressure
A system under high pressure also appears slow, but this slowness is not just a page that takes a few seconds to open: the page keeps loading and finally ends in a blank white screen.
4. Quantitative + pressure
Common server pressure indicators are:
- Memory
- CPU
- Disk
- Network
Server-side code most often causes memory and CPU problems, so we focus on those two.
1) Finding CPU problems
First, write and run a piece of code that drives CPU usage up:
```java
public class FastTestService {
    public static void main(String[] args) {
        FastTestService service = new FastTestService();
        // Busy loop with no pause: the main thread keeps one CPU core near 100%
        while (true) {
            service.test();
        }
    }

    public void test() {
        biz();
    }

    private void biz() {
        System.out.println("biz");
    }
}
```
①dashboard + thread
The dashboard command shows a real-time panel of the current system. Here it reveals that the thread with ID=1 has very high CPU usage (note that this ID does not correspond to the nativeID shown by jstack):
```
$ dashboard
ID   NAME   GROUP   PRIORI   STATE   %CPU    DELTA_TIME   TIME     INTERRU   DAEMON
1    main   main    5        RUNNA   96.06   4.812        2:41.2   false     false
```
The thread command shows the top N busiest threads:
```
$ thread -n 1
"main" Id=1 deltaTime=203ms time=1714000ms RUNNABLE
    at app//java.front.optimize.FastTestService.biz(FastTestService.java:83)
    at app//java.front.optimize.FastTestService.test(FastTestService.java:61)
    at app//java.front.optimize.FastTestService.main(FastTestService.java:17)
```
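The article uses Arthas for this, but the idea behind `thread -n` can be sketched with the JDK's own `ThreadMXBean`: sample each thread's CPU time twice, a short interval apart, and report the thread whose CPU time grew the most. This is a minimal sketch that assumes the JVM supports and has enabled thread CPU-time measurement; `BusyThreadFinder` and `busiestThreadId` are hypothetical names, and the code only sees the JVM it runs in.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.HashMap;
import java.util.Map;

public class BusyThreadFinder {

    /** Returns the ID of the thread that used the most CPU during the interval, or -1. */
    static long busiestThreadId(long intervalMs) {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        if (!mx.isThreadCpuTimeSupported() || !mx.isThreadCpuTimeEnabled()) {
            return -1;
        }
        // First sample: CPU time (ns) per thread ID
        Map<Long, Long> before = new HashMap<>();
        for (long id : mx.getAllThreadIds()) {
            before.put(id, mx.getThreadCpuTime(id));
        }
        try {
            Thread.sleep(intervalMs);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return -1;
        }
        // Second sample: keep the thread with the largest CPU-time increase
        long bestId = -1;
        long bestDelta = -1;
        for (long id : mx.getAllThreadIds()) {
            Long start = before.get(id);
            long now = mx.getThreadCpuTime(id);
            // Skip threads that started or died in between (-1 means no value)
            if (start == null || start < 0 || now < 0) {
                continue;
            }
            long delta = now - start;
            if (delta > bestDelta) {
                bestDelta = delta;
                bestId = id;
            }
        }
        return bestId;
    }

    public static void main(String[] args) {
        long id = busiestThreadId(200);
        if (id <= 0) {
            System.out.println("no CPU data available");
            return;
        }
        ThreadInfo info = ManagementFactory.getThreadMXBean().getThreadInfo(id, 5);
        if (info != null) {
            System.out.println("\"" + info.getThreadName() + "\" Id=" + id + " " + info.getThreadState());
            for (StackTraceElement frame : info.getStackTrace()) {
                System.out.println("    at " + frame);
            }
        }
    }
}
```

Because it can only inspect its own JVM, a helper like this would have to be called from inside the application (for example from a diagnostic endpoint); Arthas avoids that by attaching to the process from outside.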
2) Finding memory problems
①free
```
$ free -h
              total        used        free      shared  buff/cache   available
Mem:            10G        5.5G        3.1G         28M        1.4G        4.4G
Swap:          2.0G        435M        1.6G
```
- total: total server memory
- used: memory in use
- free: memory not used by any application
- shared: shared physical memory
- cache: read cache for I/O devices (page cache)
- buff: write cache for I/O devices (buffer cache)
- available: memory that programs can still use
②memory
The Arthas memory command shows JVM memory information:
https://arthas.aliyun.com/doc/memory.html
- View JVM memory information (example from the official documentation):
```
$ memory
Memory                             used     total    max       usage
heap                               32M      256M     4096M     0.79%
g1_eden_space                      11M      68M      -1        16.18%
g1_old_gen                         17M      184M     4096M     0.43%
g1_survivor_space                  4M       4M       -1        100.00%
nonheap                            35M      39M      -1        89.55%
codeheap_'non-nmethods'            1M       2M       5M        20.53%
metaspace                          26M      27M      -1        96.88%
codeheap_'profiled_nmethods'       4M       4M       117M      3.57%
compressed_class_space             2M       3M       1024M     0.29%
codeheap_'non-profiled_nmethods'   685K     2496K    120032K   0.57%
mapped                             0K       0K       -         0.00%
direct                             48M      48M      -         100.00%
```
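A similar report can be produced from inside the JVM with the standard `MemoryMXBean` and `MemoryPoolMXBean` APIs. This is a sketch, not how Arthas is implemented: pool names appear as the JVM reports them (for example "G1 Eden Space" rather than `g1_eden_space`), and Arthas's "total" column appears to correspond roughly to committed memory. `JvmMemoryReport` and `row` are hypothetical names.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;

public class JvmMemoryReport {

    private static final long MB = 1024 * 1024;

    // One line per memory area: used and committed in MB; max is "-1" when undefined
    static String row(String name, MemoryUsage u) {
        String max = u.getMax() < 0 ? "-1" : (u.getMax() / MB) + "M";
        return String.format("%-35s used=%dM committed=%dM max=%s",
                name, u.getUsed() / MB, u.getCommitted() / MB, max);
    }

    public static void main(String[] args) {
        MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
        System.out.println(row("heap", memory.getHeapMemoryUsage()));
        System.out.println(row("nonheap", memory.getNonHeapMemoryUsage()));
        // Individual pools (eden, old gen, metaspace, ...) depend on the GC in use
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            MemoryUsage usage = pool.getUsage(); // null if the pool is no longer valid
            if (usage != null) {
                System.out.println(row(pool.getName(), usage));
            }
        }
    }
}
```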
③jmap
- Find the Java process ID:
jps -l
- View real-time heap usage (JDK 9 and later):
jhsdb jmap --heap --pid 20196
- Export a heap snapshot:
jmap -dump:format=b,file=/home/tmp/my-dump.hprof 20196
- Automatically export a heap snapshot when an OutOfMemoryError occurs (JVM startup options):
-XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/home/tmp/my-dump.hprof
④heapdump
The Arthas heapdump command supports exporting heap snapshots:
https://arthas.aliyun.com/doc/heapdump.html
- Dump to a specified file:
heapdump /home/tmp/my-dump.hprof
- Dump only live objects to a specified file:
heapdump --live /home/tmp/my-dump.hprof
- Dump to a temporary file:
heapdump
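On HotSpot JVMs a heap snapshot can also be written programmatically through `com.sun.management.HotSpotDiagnosticMXBean.dumpHeap`, whose `live` flag plays a role similar to `heapdump --live`. This is a minimal sketch that assumes a HotSpot-based JDK; `HeapDumper` is a hypothetical name, and the output path reuses the article's example path.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.io.IOException;
import java.lang.management.ManagementFactory;

public class HeapDumper {

    /**
     * Writes a heap snapshot of the current JVM to outputFile.
     * live = true keeps only objects that are still reachable.
     */
    static void dump(String outputFile, boolean live) throws IOException {
        HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        // Fails if the file already exists; recent JDKs also require a .hprof suffix
        bean.dumpHeap(outputFile, live);
    }

    public static void main(String[] args) throws IOException {
        dump("/home/tmp/my-dump.hprof", true);
    }
}
```

The resulting `.hprof` file can be opened with the same analysis tools as a file produced by `jmap -dump` or Arthas `heapdump`.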
⑤Garbage collection
jstat shows garbage collection statistics, so you can observe whether the program runs GC too frequently or whether GC takes too long:
jstat -gcutil <pid> <interval(ms)>
- Check garbage collection once per second:
```
$ jstat -gcutil 20196 1000
  S0     S1     E      O      M     CCS    YGC     YGCT    FGC    FGCT     CGC    CGCT     GCT
  0.00   0.00  57.69   0.00      -      -      0    0.000     0    0.000     0    0.000    0.000
  0.00   0.00  57.69   0.00      -      -      0    0.000     0    0.000     0    0.000    0.000
  0.00   0.00  57.69   0.00      -      -      0    0.000     0    0.000     0    0.000    0.000
```
The columns are described as follows:
S0: Survivor space 0 utilization, as a percentage of its capacity
S1: Survivor space 1 utilization, as a percentage of its capacity
E: Eden space utilization, as a percentage of its capacity
O: Old generation utilization, as a percentage of its capacity
M: Metaspace utilization, as a percentage of its capacity (M replaced the P column for the permanent generation, which was removed in JDK 8)
CCS: Compressed class space utilization, as a percentage
YGC: number of young GC events since the application started
YGCT: time spent in young GC since the application started (seconds)
FGC: number of full GC events since the application started
FGCT: time spent in full GC since the application started (seconds)
CGC: number of concurrent GC events
CGCT: time spent in concurrent GC (seconds)
GCT: total garbage collection time since the application started (seconds)
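From inside the application, comparable counters are available through the standard `GarbageCollectorMXBean` interface. The sketch below prints each collector's count and accumulated time once per second. Collector names depend on the GC in use (with G1 they are typically "G1 Young Generation" and "G1 Old Generation"), so any mapping to jstat's YGC/FGC columns is approximate. `GcWatcher` is a hypothetical name.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

public class GcWatcher {

    // Total number of collections across all collectors since JVM start
    static long totalCollections() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long count = gc.getCollectionCount(); // -1 if undefined for this collector
            if (count > 0) {
                total += count;
            }
        }
        return total;
    }

    public static void main(String[] args) throws InterruptedException {
        // Print count and accumulated time per collector once per second, five times
        for (int i = 0; i < 5; i++) {
            for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
                System.out.printf("%s count=%d time=%dms%n",
                        gc.getName(), gc.getCollectionCount(), gc.getCollectionTime());
            }
            System.out.println();
            Thread.sleep(1000L);
        }
    }
}
```

A count that climbs every second, or a time that grows by a large fraction of each interval, points to the same "GC too frequent / too long" symptoms jstat is used to spot.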
3) Finding problems comprehensively
①Stress test
Stress testing proactively exposes system problems and evaluates system capacity. Simple, commonly used settings are:
- Common tool: JMeter
- Step pressure: increase the thread count in steps (10, 20, 30, ...) until a bottleneck appears
- Duration: 1 minute per step, Ramp-Up = 0
- TPS: throughput
- Response time: focus on the 95th percentile (95Line)
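To make the two reported numbers concrete, here is a minimal Java sketch of a stepped load test that measures TPS and the 95th-percentile response time. `callSystemUnderTest()` is a hypothetical placeholder that only sleeps, and each step is shortened to 2 seconds; it illustrates the metrics and is not a substitute for JMeter.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class MiniLoadTest {

    /** Nearest-rank percentile (p in 0..100] of an ascending list; -1 if empty. */
    static long percentile(List<Long> sorted, double p) {
        if (sorted.isEmpty()) {
            return -1;
        }
        int rank = (int) Math.ceil(p / 100.0 * sorted.size());
        return sorted.get(Math.max(rank, 1) - 1);
    }

    // Hypothetical stand-in for one request to the system under test
    static void callSystemUnderTest() throws InterruptedException {
        Thread.sleep(5);
    }

    public static void main(String[] args) throws InterruptedException {
        int[] threadSteps = {10, 20, 30};   // step pressure
        long durationMs = 2_000;            // per step (shortened from 1 minute)
        for (int threads : threadSteps) {
            List<Long> latenciesMs = Collections.synchronizedList(new ArrayList<>());
            ExecutorService pool = Executors.newFixedThreadPool(threads);
            long deadline = System.currentTimeMillis() + durationMs;
            // Ramp-Up = 0: all worker threads start at once
            for (int t = 0; t < threads; t++) {
                pool.submit(() -> {
                    while (System.currentTimeMillis() < deadline) {
                        long start = System.nanoTime();
                        callSystemUnderTest();
                        latenciesMs.add((System.nanoTime() - start) / 1_000_000);
                    }
                    return null;
                });
            }
            pool.shutdown();
            pool.awaitTermination(durationMs * 2, TimeUnit.MILLISECONDS);
            List<Long> sorted = new ArrayList<>(latenciesMs);
            Collections.sort(sorted);
            double tps = sorted.size() * 1000.0 / durationMs;
            System.out.printf("threads=%d TPS=%.1f p95=%dms%n",
                    threads, tps, percentile(sorted, 95));
        }
    }
}
```

When TPS stops rising between steps while p95 keeps climbing, the step before that is a reasonable estimate of the bottleneck.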
②Monitoring system
A monitoring system presents these indicators in a friendlier way. A company with enough engineering capacity can build its own; otherwise it can adopt a common industry solution.
IV. Optimize performance issues
1. Four methods
- Reduce requests
- Space for time
- Task parallelization
- Task asynchronization
2. Five levels
- Proxy layer
- Front-end layer
- Service layer
- Cache layer
- Data layer
3. Optimization notes
When it comes to performance optimization, solutions such as indexes and caches come to mind easily. They may well be correct, but thinking only this way can lead to omissions, because they address just the cache layer and the data layer.
Rejecting invalid traffic at the outermost layer protects the system even better. The four methods can be applied at every level; here are some examples:
1) Reduce requests + front-end layer
Add a pre-verification code (CAPTCHA) in the seckill (flash sale) scenario
2) Reduce requests + service layer
Check whether multiple RPCs can be merged into one batch RPC
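To illustrate merging RPCs, here is a minimal sketch with a hypothetical `UserClient` interface that offers both a single-item call and a batch call. An in-memory fake counts round trips so the difference is visible; real RPC frameworks and batch endpoints will differ, and all names here are illustrative.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;

public class BatchRpcDemo {

    // Hypothetical remote service: every method call stands for one RPC round trip
    interface UserClient {
        String getUserName(long id);

        Map<Long, String> getUserNames(List<Long> ids);
    }

    // In-memory fake that counts round trips
    static class FakeUserClient implements UserClient {
        final AtomicInteger rpcCount = new AtomicInteger();

        public String getUserName(long id) {
            rpcCount.incrementAndGet();
            return "user-" + id;
        }

        public Map<Long, String> getUserNames(List<Long> ids) {
            rpcCount.incrementAndGet();
            Map<Long, String> names = new HashMap<>();
            for (long id : ids) {
                names.put(id, "user-" + id);
            }
            return names;
        }
    }

    // Before: one RPC per ID
    static List<String> namesOneByOne(UserClient client, List<Long> ids) {
        List<String> names = new ArrayList<>();
        for (long id : ids) {
            names.add(client.getUserName(id));
        }
        return names;
    }

    // After: one batch RPC, with results put back in input order
    static List<String> namesBatched(UserClient client, List<Long> ids) {
        Map<Long, String> byId = client.getUserNames(ids);
        List<String> names = new ArrayList<>();
        for (long id : ids) {
            names.add(byId.get(id));
        }
        return names;
    }
}
```

With three IDs, `namesOneByOne` makes three round trips and `namesBatched` makes one while returning the same list. In practice batch calls usually need a size cap so a single request does not grow unbounded.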
3) Space for time + service layer
Introduce caching
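One simple form of caching on the service layer is an in-process map in front of a slow call, trading memory (space) for fewer slow calls (time). The sketch below uses `ConcurrentHashMap.computeIfAbsent`; `LocalCache` is a hypothetical name, and the cache is deliberately unbounded and never expires, whereas production caches usually add size limits and expiration (libraries such as Caffeine provide these).

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

public class LocalCache<K, V> {

    private final ConcurrentHashMap<K, V> map = new ConcurrentHashMap<>();
    private final Function<K, V> loader; // the slow call being cached, e.g. a DB query

    public LocalCache(Function<K, V> loader) {
        this.loader = loader;
    }

    // Returns the cached value; the loader runs only on a miss (null results are not cached)
    public V get(K key) {
        return map.computeIfAbsent(key, loader);
    }

    // Call after the underlying data changes so readers do not see stale values
    public void invalidate(K key) {
        map.remove(key);
    }
}
```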
4) Space for time + cache layer
Introduce multi-level caching
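A multi-level cache checks a fast per-process cache first, then a shared cache, and only then the source of truth. In this minimal sketch a plain `Map` stands in for the shared cache (in practice often something like Redis) and a `Function` stands in for the database; `TwoLevelCache` is a hypothetical name, and cross-node invalidation is not handled.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

public class TwoLevelCache {

    private final Map<String, String> local = new ConcurrentHashMap<>(); // L1: per process
    private final Map<String, String> shared;                            // L2: stand-in for a shared cache
    private final Function<String, String> loader;                       // slow source, e.g. the database

    public TwoLevelCache(Map<String, String> shared, Function<String, String> loader) {
        this.shared = shared;
        this.loader = loader;
    }

    public String get(String key) {
        String value = local.get(key);
        if (value != null) {
            return value;                 // L1 hit
        }
        value = shared.get(key);
        if (value == null) {
            value = loader.apply(key);    // miss in both levels: load from the source
            if (value == null) {
                return null;              // nothing to cache
            }
            shared.put(key, value);       // fill L2
        }
        local.put(key, value);            // fill L1
        return value;
    }
}
```

Two instances sharing one L2 map behave like two application servers: the first request loads from the source, a second server is served from L2, and repeat requests on either server are served from L1. The price is that an update must invalidate every server's L1, which is why L1 entries are usually kept short-lived.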
5) Space for time + data layer
Add an index
6) Task parallelization + service layer
If multiple calls are independent of each other, run them in parallel with Future
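A minimal sketch of this idea with `ExecutorService` and `Future`: two independent calls are submitted together, so the total wait is roughly the slower of the two rather than their sum. The calls `queryUser()` and `queryOrders()` are hypothetical and simulated with sleeps matching the 500 ms and 1000 ms used earlier in the article.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelCalls {

    // Hypothetical independent downstream calls, simulated with sleeps
    static String queryUser() {
        sleep(500L);
        return "user";
    }

    static String queryOrders() {
        sleep(1000L);
        return "orders";
    }

    static void sleep(long ms) {
        try {
            Thread.sleep(ms);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Submits both calls at once, then waits for both results
    static String fetchBoth(ExecutorService pool) throws Exception {
        Future<String> user = pool.submit(ParallelCalls::queryUser);
        Future<String> orders = pool.submit(ParallelCalls::queryOrders);
        return user.get() + "," + orders.get();
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            long start = System.currentTimeMillis();
            String result = fetchBoth(pool);
            // About max(500, 1000) ms instead of 500 + 1000 ms
            System.out.println(result + " costTime=" + (System.currentTimeMillis() - start));
        } finally {
            pool.shutdown();
        }
    }
}
```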
7) Task asynchronization + service layer
If the caller does not need to wait for the result, execute the task asynchronously
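A minimal sketch of handing a side task off with `CompletableFuture.runAsync` on a dedicated pool, so the caller returns without waiting. `writeAuditLog()` is a hypothetical slow side task; the returned future is only there so callers (or tests) can observe completion if they want to, and the `exceptionally` handler keeps failures from disappearing silently.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

public class AsyncTaskDemo {

    // Dedicated pool for side tasks; daemon threads so they never block JVM exit
    private static final ExecutorService ASYNC_POOL = Executors.newFixedThreadPool(4, runnable -> {
        Thread thread = new Thread(runnable, "async-task");
        thread.setDaemon(true);
        return thread;
    });

    static final AtomicInteger auditLogsWritten = new AtomicInteger();

    // Hypothetical slow side task the caller does not need to wait for
    static void writeAuditLog(String orderId) {
        try {
            Thread.sleep(300L);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return;
        }
        auditLogsWritten.incrementAndGet();
    }

    // The main work runs synchronously; the audit log is handed off and not awaited
    static CompletableFuture<Void> updateOrder(String orderId) {
        // ... main business logic for orderId would run here ...
        return CompletableFuture
                .runAsync(() -> writeAuditLog(orderId), ASYNC_POOL)
                .exceptionally(ex -> {
                    System.err.println("async task failed: " + ex);
                    return null;
                });
    }
}
```

Using a dedicated pool instead of the common pool keeps slow side tasks from starving other asynchronous work. The trade-off of daemon threads is that tasks still queued at shutdown are lost, so this only suits work that may be dropped or retried.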
V. Article summary
This article first described how performance concerns change across the early, middle, and late stages of a system. It then defined what performance is, showed how to discover performance problems, and finally discussed how to optimize them. I hope it is helpful to everyone.
Author丨IT Fatty Xu
Source丨Public account: JAVA Frontline (ID: www_xpz)