Performance optimization: JIT just-in-time compilation and AOT ahead-of-time compilation

High-quality blog posts: IT-BLOG-CN

1. Introduction

The difference between JIT and AOT: they are two different compilation approaches, and the main difference is whether compilation happens at runtime.

JIT: Just-In-Time (dynamic) compilation, i.e. compiling while the program runs. At runtime, hot code is identified by counters and heuristics and then compiled on the fly. This approach gives high throughput and a runtime performance bonus (the compiler can optimize using live profiling data), runs faster in steady state, and can generate code dynamically. The trade-offs are slower startup, and a method needs a certain amount of warm-up time and call frequency before JIT's tiered mechanism is triggered. A further disadvantage is that compilation consumes runtime resources, which can cause process stalls.

AOT: Ahead-Of-Time compilation, i.e. compiling before the program runs. AOT compilation converts source code directly into machine code, giving low memory usage and fast startup. No JIT compiler is needed at runtime, and the runtime is statically linked into the final program. However, there is no runtime performance bonus, and no further optimization can be done based on how the program actually behaves at runtime. The disadvantage of AOT is that compiling before the program runs increases installation (build) time.

JIT (just-in-time) compilation is the process of converting bytecode into machine code that can run directly on the hardware while the program is running, and deploying it into the hosting environment. AOT compilation is the process of converting bytecode into machine code before the program runs.

.java -> .class -> (jaotc compilation tool) -> .so (a shared library: compiled code and data that can be used by other programs)
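The pipeline above can be sketched on the command line. This is a hypothetical example (HelloWorld is an assumed class name; jaotc shipped with JDK 9-16 and was later removed):

```shell
# Compile source to bytecode
javac HelloWorld.java

# AOT-compile the bytecode into a shared library (.so)
jaotc --output libHelloWorld.so HelloWorld.class

# Tell the JVM to use the precompiled library at runtime
java -XX:AOTLibrary=./libHelloWorld.so HelloWorld
```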

2. JIT

In the HotSpot virtual machine, two JIT compilers are built in: the C1 compiler and the C2 compiler. Their compilation processes differ.

【1】C1 compiler: a simple, fast compiler focused on local optimizations. It suits programs with short execution times or that need fast startup, and is also called the Client Compiler.
【2】C2 compiler: a compiler that performs deep performance tuning for long-running server-side applications. It suits programs that run for a long time or need peak performance, and is also called the Server Compiler — for example, long-running server-side Java applications with stability requirements. Starting with JDK 6, a machine with at least 2 CPUs and 2GB of physical memory is classified as server-class, which enables C2.
【3】Tiered compilation: in Java 8, tiered compilation is enabled by default (before 1.8 it was off by default). Before Java 7 you had to choose the appropriate JIT according to the program's characteristics; by default the virtual machine used the interpreter together with one of the compilers.

Tiered compilation divides the JVM's execution state into 5 tiers:
【1】Tier 0: interpreted execution, with profiling (Profiling) enabled by default; if profiling is not enabled, tier 1 compilation can be triggered;
【2】Tier 1: C1 compilation, which compiles bytecode into native code with simple, reliable optimizations, without profiling;
【3】Tier 2: also C1 compilation, with profiling turned on, but only method-invocation counts and back-edge (loop) counts are profiled;
【4】Tier 3: also C1 compilation, performing full C1 compilation with all profiling;
【5】Tier 4: C2 compilation, which also compiles bytecode into native code but enables some optimizations that take longer to compile, and even performs some unreliable, aggressive optimizations based on the profiling information.
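You can watch methods move through these tiers with -XX:+PrintCompilation. A sketch (the exact output format varies by JDK version; the sample lines are illustrative, not captured output):

```shell
java -XX:+PrintCompilation -version
# Sample lines; the third column is the compilation tier:
#     91   35       3       java.lang.String::hashCode (49 bytes)
#    107   65       4       java.lang.String::hashCode (49 bytes)
```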

mixed mode is the default mixed compilation mode. Besides this mode, the -Xint parameter forces the virtual machine to run in interpreter-only mode, in which JIT does not intervene at all; the -Xcomp parameter forces the virtual machine to run in JIT-compiled-only mode.

If you only want to use C2, turn off tiered compilation with -XX:-TieredCompilation. If you only want to use C1, keep tiered compilation on and add the parameter -XX:TieredStopAtLevel=1.
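The mode switches above can be combined on the command line; a hypothetical sketch (MyApp is an assumed main class):

```shell
# Interpreter only: JIT never intervenes
java -Xint MyApp

# JIT-compiled only: methods are compiled before first execution
java -Xcomp MyApp

# C2 only: disable tiered compilation
java -XX:-TieredCompilation MyApp

# C1 only: keep tiered compilation but stop at tier 1
java -XX:TieredStopAtLevel=1 MyApp
```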

C1, C2 and C1 + C2 correspond to client, server and tiered compilation respectively. C1 compiles quickly with conservative optimizations; C2 compiles slowly with more aggressive optimizations. With C1 + C2, code is first compiled with C1 and is recompiled with C2 once it becomes hot enough.

The parameter -XX:ReservedCodeCacheSize=N (where N defaults to a compiler-specific value) sets the size of the hot-code cache (CodeCache). If the cache runs out, JIT can no longer compile and execution degrades, e.g. from compiled execution back to interpreted execution, reducing performance. You can check CodeCache usage with java -XX:+PrintCodeCache:

C:/Users/Administrator> java -XX:+PrintCodeCache
CodeCache: size=245760Kb used=1165Kb max_used=1165Kb free=244594Kb
 bounds [0x000000010be1b000, 0x000000010c08b000, 0x000000011ae1b000]
 total_blobs=293 nmethods=48 adapters=159
 compilation: enabled

Related parameters:

Parameter                           Default value          Description
-XX:InitialCodeCacheSize            2555904 (about 2.5M)   Initial size of the CodeCache area, in bytes
-XX:ReservedCodeCacheSize           251658240 (240M)       Maximum size of the CodeCache area, in bytes
-XX:CodeCacheExpansionSize          65536 (64K)            Size of each CodeCache expansion, in bytes
-XX:ExitOnFullCodeCache             false                  Whether to exit the JVM when the CodeCache area is full
-XX:UseCodeCacheFlushing            false                  Whether to flush the CodeCache before turning off JIT compilation
-XX:MinCodeCacheFlushingInterval    30                     Minimum interval between CodeCache flushes, in seconds
-XX:CodeCacheMinimumFreeSpace       512000                 When free CodeCache space falls below this value, JIT compilation stops; the remaining space no longer stores compiled method code, but can still store native-method adapter code
-XX:CompileThreshold                10000                  Number of times a method is invoked before being JIT-compiled
-XX:OnStackReplacePercentage        140                    Used to compute the threshold that triggers OSR (On-Stack Replacement) compilation

How to determine hot code

[1] Sample-based hotspot detection: the virtual machine periodically inspects the top of each thread's stack; if a certain method (or methods) frequently appears at the top of the stack, that method is a "hot method". The advantage is simple implementation; the disadvantage is that it is hard to determine a method's hotness precisely, and the result is easily skewed by thread blocking or other external factors.
[2] Counter-based hotspot detection (the approach HotSpot uses): the virtual machine establishes a counter for each method (or even code block) to count executions; once a threshold is exceeded, the method is marked as hot. HotSpot uses two kinds of counters: method invocation counters and back-edge counters. When the sum of the two counters exceeds the invocation-counter threshold, JIT compilation is triggered.
[3] Method invocation counter: counts how many times a method is called. The default threshold is 1500 in C1 mode and 10000 in C2 mode, settable with -XX:CompileThreshold. With tiered compilation enabled, the -XX:CompileThreshold value is ignored, and the threshold is adjusted dynamically based on the number of methods waiting to be compiled and the number of compiler threads.
[4] Back-edge counter: counts how many times loop-body code executes within a method. A bytecode instruction that jumps backwards in the control flow is called a "back edge" (Back Edge); this counter is used to compute the threshold that triggers OSR compilation. Without tiered compilation, the C1 default is 13995 and the C2 default is 10700, tunable via -XX:OnStackReplacePercentage=N. With tiered compilation enabled, the -XX:OnStackReplacePercentage value is likewise ignored, and the threshold is adjusted dynamically based on the number of methods waiting to be compiled and the number of compiler threads.

Back-edge counter threshold calculation rules:
1. In C1 mode: CompileThreshold * OnStackReplacePercentage / 100, i.e. method-invocation threshold * OSR percentage / 100;
2. In C2 mode: CompileThreshold * (OnStackReplacePercentage - InterpreterProfilePercentage) / 100, i.e. method-invocation threshold * (OSR percentage - interpreter profiling percentage) / 100;
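The arithmetic above can be checked with a small sketch, assuming the usual HotSpot defaults (CompileThreshold = 1500 for C1 and 10000 for C2, OnStackReplacePercentage = 933 for C1 and 140 for C2, InterpreterProfilePercentage = 33):

```java
public class OsrThreshold {
    // C1 rule: method-invocation threshold * OSR percentage / 100
    static int c1Threshold(int compileThreshold, int osrPercentage) {
        return compileThreshold * osrPercentage / 100;
    }

    // C2 rule: method-invocation threshold * (OSR percentage - interpreter profiling percentage) / 100
    static int c2Threshold(int compileThreshold, int osrPercentage, int interpreterProfilePercentage) {
        return compileThreshold * (osrPercentage - interpreterProfilePercentage) / 100;
    }

    public static void main(String[] args) {
        System.out.println(c1Threshold(1500, 933));      // 13995, matching the C1 default above
        System.out.println(c2Threshold(10000, 140, 33)); // 10700, matching the C2 default above
    }
}
```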

JIT optimization

JIT compilation applies some classic compiler optimization techniques. The two main categories are method inlining and escape-analysis-based optimizations (such as lock elision and scalar replacement).

[1] Method inlining: copies the body of the target method into the calling method to avoid an actual method call. Inlining not only eliminates the performance overhead of the call itself, but can also enable further optimizations.

    private int add1(int s1, int s2, int s3, int s4) {
        return add2(s1, s2) + add2(s3, s4);
    }

    private int add2(int s1, int s2) {
        return s1 + s2;
    }

    // Code after method inlining
    private int add(int s1, int s2, int s3, int s4) {
        return s1 + s2 + s3 + s4;
    }

Strategies to improve method inlining: tune the relevant JVM parameters; write small methods; use the static and final keywords — with no method inheritance, no extra type check is needed, making inlining more likely.
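A hypothetical demo of these strategies (class and method names are assumptions): small static methods are prime inlining candidates, and the loop heats the method so the JIT considers it. Running with -XX:+UnlockDiagnosticVMOptions -XX:+PrintInlining lets you observe the inlining decisions.

```java
public class InlineDemo {
    // Small static method with no inheritance: an easy inlining target
    static int add2(int a, int b) {
        return a + b;
    }

    static int add4(int s1, int s2, int s3, int s4) {
        return add2(s1, s2) + add2(s3, s4);
    }

    public static void main(String[] args) {
        long sum = 0;
        // Call the method enough times to cross the JIT thresholds
        for (int i = 0; i < 1_000_000; i++) {
            sum += add4(i, 1, 2, 3);
        }
        System.out.println(sum);
    }
}
```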

[2] Lock elision: in a single-threaded environment (i.e. when the lock object cannot be contended by other threads), JIT compilation eliminates the object's method lock. This is on by default since JDK 1.8. For example:

    // -XX:-EliminateLocks disables lock elision; re-enable it, run this code a million times, and compare the difference.
    public static String getString(String s1, String s2) {
        StringBuffer sb = new StringBuffer();
        sb.append(s1);
        sb.append(s2);
        return sb.toString();
    }
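The "run it a million times" experiment in the comment can be sketched as a minimal (and deliberately naive) timing harness — an assumption of mine, not code from the original post. Run it once with -XX:-EliminateLocks and once with the default, and compare the printed times; StringBuffer's lock never escapes getString, so with elision enabled the JIT can drop the synchronization.

```java
public class LockElisionBench {
    public static String getString(String s1, String s2) {
        StringBuffer sb = new StringBuffer(); // the lock on sb never escapes this method
        sb.append(s1);
        sb.append(s2);
        return sb.toString();
    }

    public static void main(String[] args) {
        long start = System.nanoTime();
        String last = "";
        for (int i = 0; i < 1_000_000; i++) {
            last = getString("abc", "def");
        }
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println(last + " took " + elapsedMs + " ms");
    }
}
```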

[3] Scalar replacement: if escape analysis proves that an object cannot be accessed externally, and the object can be decomposed, the object may never actually be created when the program runs; its member variables are created directly instead. The prerequisite is that escape analysis is enabled (-XX:+DoEscapeAnalysis, on by default in JDK 1.8) and scalar replacement is enabled (-XX:+EliminateAllocations, also on by default in JDK 1.8).

public void foo() {
    Person info = new Person();
    info.name = "queen";
    info.age = 18;
}

// After escape analysis, the code is optimized (scalar replacement) into:
public void foo() {
    String name = "queen";
    int age = 18;
}

3. AOT

Advantages: the Java virtual machine loads code that has already been compiled into a binary library and can execute it directly, without waiting for the just-in-time compiler to warm up, removing the "slow first run" experience of Java applications. Compiling before the program runs avoids the performance and memory cost of compiling at runtime; peak performance is reached right after startup, startup is fast, and the shipped artifact contains only machine code, so the package is small.

Disadvantages: because compilation is static and ahead of time, the machine instruction sequence cannot be chosen based on the hardware or on how the program actually runs, so theoretical peak performance is lower than JIT's; there are no dynamic capabilities, and the same artifact cannot run across platforms. The former, just-in-time (JIT) compilation, is the default mode used by the Java HotSpot virtual machine to convert bytecode into machine code at runtime. The latter, ahead-of-time (AOT) compilation, is supported by the newer GraalVM compiler and allows bytecode to be compiled statically to machine code at build time.

We are now in the cloud-native era of cost reduction and efficiency improvement. Compared with languages such as Go and Rust, Java has a serious disadvantage: startup and warm-up compilation are very slow, which conflicts with cloud-native elastic scaling driven by real-time compute demand. Spring 6 uses AOT technology to achieve low runtime memory usage and fast startup, which will gradually meet Java's needs in the cloud-native era. Commercial companies that use Java applications at scale should consider investigating JDK 17 early and using cloud-native technology to reduce costs and improve efficiency.

GraalVM

The AOT technology in Spring 6 is backed by GraalVM, and Spring provides first-class support for GraalVM Native Images. GraalVM is a high-performance JDK designed to accelerate the execution of applications written in Java and other JVM languages, while also providing runtimes for JavaScript, Python, and many other popular languages. GraalVM offers two ways to run Java applications: on the HotSpot JVM with the Graal just-in-time (JIT) compiler, or as a native executable compiled ahead of time (AOT). GraalVM's polyglot capability makes it possible to mix multiple programming languages in a single application without the cost of foreign calls. GraalVM adds to the HotSpot Java virtual machine an advanced just-in-time (JIT) optimizing compiler written in Java.

GraalVM has the following features:
【1】An advanced optimizing compiler that generates faster, leaner code requiring fewer compute resources;
【2】AOT Native Image compilation, which compiles Java applications into native binaries in advance that start instantly and reach peak performance without warm-up;
【3】Polyglot programming, which lets a single application use the best features and libraries of several popular languages with no extra overhead;
【4】Advanced tooling to debug, monitor, profile and optimize resource consumption in Java and other languages;

Native Image

Currently, besides performing AOT inside the JVM, the industry has another way to implement Java AOT: abandon the JVM entirely and, like C/C++, compile the code directly to machine code and then run it. This idea directly overturns the design of the Java language: GraalVM Native Image. It implements an ultra-small runtime component, Substrate VM, which provides essentially the various features of a JVM but is lightweight enough to be easily embedded. This frees the Java language and Java projects from the constraints of the JVM and enables true AOT compilation just like C/C++. After long optimization and accumulation, this approach has achieved very good results and has basically become Oracle's officially recommended Java AOT solution.
Native Image is an innovative technology that compiles Java code into a standalone native executable or a native shared library. The Java bytecode processed while building the native executable includes all application classes, dependencies, third-party libraries, and any required JDK classes. The resulting self-contained native executable is specific to one operating system and machine architecture and requires no JVM.

Native Image building process

Download GraalVM and configure the environment variables: point JAVA_HOME at the GraalVM installation, and add GraalVM's bin directory to Path.

Variable name: JAVA_HOME
Variable value: D:\graalvm-ce-java17-22.3.0

Check if the installation is successful

C:/Users/Administrator>java -version
openjdk version "17.0.5" 2022-10-18
OpenJDK Runtime Environment GraalVM CE 22.3.0 (build 17.0.5+8-jvmci-22.3-b08)
OpenJDK 64-Bit Server VM GraalVM CE 22.3.0 (build 17.0.5+8-jvmci-22.3-b08, mixed mode, sharing)

Install the native-image plug-in through gu install native-image and check the version through gu list

C:/Users/Administrator>gu install native-image
...
C:/Users/Administrator>gu list
...
native-image 22.3.0 Native Image Early adopter

Comparison: build with javac xx.java and then native-image xx. The resulting binary includes SVM and various JDK libraries, so it is larger than a C/C++ binary, but compared with a full JVM it is already very small.
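An end-to-end sketch of that build (HelloWorld is an assumed class name; by default native-image lowercases the image name):

```shell
# Compile to bytecode, then build a native executable from the class
javac HelloWorld.java
native-image HelloWorld

# Run the standalone binary - no JVM needed
./helloworld
```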

Compared with running on the JVM, a Native Image starts much faster and uses less CPU. The official benchmark data also shows that Native Image brings very significant improvements in startup speed and memory footprint.
