Java High-Concurrency Programming: The Principles of Visibility and Ordering

Atomicity, visibility, and ordering are the three major problems of concurrent programming. Java addresses the atomicity problem with CAS operations; this chapter introduces how Java solves the remaining two problems: visibility and ordering.

CPU physical cache structure

Because the CPU computes far faster than main memory (physical memory) can be accessed, modern CPUs do not communicate with main memory directly. Instead, a multi-level cache hierarchy sits between the CPU and main memory: the closer a cache is to the CPU, the faster it is and the smaller its capacity.

The data stored at each cache level is a subset of the data at the next level down. The L1 and L2 caches are private to a single CPU core, the L3 cache is shared by the cores on the same socket, and main memory is shared by all cores on all sockets. When the CPU reads data, it tries the L1 cache first; on a miss it falls through to L2 and then L3, and if no cache level hits, it fetches the data from main memory.

Solving cache coherence

Cache coherence problem: when computing tasks on multiple processors involve the same main-memory region, their private caches may hold different data for it. MESI (Modified, Exclusive, Shared, Invalid) is a widely used cache-coherence protocol that supports the write-back strategy. Each cache line in the CPU is tagged with one of four states (encoded in two extra bits): Modified (the line has been modified), Exclusive (the line is held by this core only), Shared (the line may be held by several cores), and Invalid (the line is stale).

Three major problems in concurrent programming

Atomicity problem

An atomic operation is "an operation, or a series of operations, that cannot be interrupted" by the thread-scheduling mechanism: once it starts, it runs to completion without any thread switch in between. The i++ operation mentioned earlier is not atomic, because at the machine level it consists of three steps:

  1. Copy the value of i from main memory into the CPU's working memory.
  2. The CPU increments the copy held in working memory.
  3. The updated value is written back from working memory to main memory.
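The lost-update effect of a non-atomic i++ can be demonstrated directly. The sketch below (the class name, field names, and iteration count are illustrative, not from the original text) races two threads on a plain counter and, for contrast, on the CAS-based AtomicInteger that solves the atomicity problem:

```java
import java.util.concurrent.atomic.AtomicInteger;

public class AtomicityDemo {
    static int plain = 0;                                     // plain++ is not atomic
    static final AtomicInteger atomic = new AtomicInteger();  // CAS-based counter

    static void runOnce() {
        Runnable task = () -> {
            for (int i = 0; i < 100_000; i++) {
                plain++;                  // read-modify-write: updates can be lost
                atomic.incrementAndGet(); // CAS retry loop: no update is ever lost
            }
        };
        Thread t1 = new Thread(task), t2 = new Thread(task);
        t1.start(); t2.start();
        try {
            t1.join(); t2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        runOnce();
        // plain frequently ends up below 200000; atomic is always exactly 200000
        System.out.println("plain=" + plain + " atomic=" + atomic.get());
    }
}
```

Each `plain++` expands to the three steps above, and two threads can interleave between step 1 and step 3, losing one of the two increments.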
Visibility issues

Memory visibility means that when one thread modifies a shared variable, other threads can immediately see the new value. The JMM (Java Memory Model) specifies that all variables are stored in a shared main memory; when a thread uses a variable, it copies it from main memory into its own working memory (private memory), and all of the thread's reads and writes operate on that copy. A visibility problem therefore requires two ingredients: a shared variable and multiple threads. Its root cause is caching: each thread holds a copy of the shared variable and cannot perceive updates made by other threads, so the value it reads may be stale. A final variable, however, is immutable, so even with caching it can never exhibit a visibility problem.

In the example below, the main thread's modification of run is invisible to thread t, so thread t fails to stop:

```java
static boolean run = true; // fix: declare this field volatile

public static void main(String[] args) throws InterruptedException {
    Thread t = new Thread(() -> {
        while (run) {
            // busy loop
        }
    });
    t.start();
    Thread.sleep(1000);
    run = false; // thread t does not stop as expected
}
```

The reason:

  • Initially, thread t reads the value of run from main memory into its working memory.

  • Because thread t reads run in a tight loop, the JIT compiler caches the value of run in the thread's working memory (a register or CPU cache), reducing accesses to main memory and improving efficiency.

  • One second later, the main thread sets run to false and flushes the write to main memory, but thread t keeps reading the cached copy in its own working memory and always sees the old value.


Ordering problem

Program ordering means that a program executes in the order its code is written. An ordering problem arises when the actual execution order differs from the code order and produces an incorrect result. To improve performance, the CPU may optimize the instruction stream: it does not guarantee that each statement executes in code order, only that the final result matches what sequential execution would produce. Two instructions may be reordered as long as there is no data dependency between them. Reordering must obey the as-if-serial rule: however instructions are reordered, the code must still run correctly in a single thread.

**Why reorder instructions?** Because some instructions are very time-consuming. Provided single-threaded correctness is preserved and no data dependency exists, the CPU can execute later instructions first and come back to the slow one. In addition, the instruction pipeline mechanism relies on reordering.
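The classic pattern at risk of reordering can be sketched as follows (the class and method names are illustrative, not from the article). Run single-threaded, as-if-serial guarantees the expected result; but because the two writes in writer() have no data dependency, a second thread could observe flag == true while x is still stale:

```java
public class ReorderDemo {
    int x = 0;
    boolean flag = false; // not volatile: no ordering guarantee across threads

    void writer() {
        x = 42;       // (1) ordinary write
        flag = true;  // (2) no data dependency on (1): CPU/JIT may reorder them
    }

    int reader() {
        // In another thread, seeing flag == true does NOT guarantee seeing x == 42
        return flag ? x : -1;
    }

    public static void main(String[] args) {
        ReorderDemo d = new ReorderDemo();
        d.writer();
        System.out.println(d.reader()); // single-threaded: as-if-serial guarantees 42
    }
}
```

Declaring flag as volatile would forbid reordering (2) before (1) and make the publication safe.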

The principle of volatile

As mentioned above, caches were added to the CPU to bridge the gap between CPU speed and main-memory access speed, but they introduce visibility problems. Java's volatile keyword guarantees main-memory visibility of a shared variable: a changed value is immediately flushed back to main memory. Under normal circumstances the system does not verify cache coherence for a shared variable; only when the variable is declared volatile is the cache line holding it required to undergo cache-coherence verification. Characteristics of volatile:

  • Guarantees visibility
  • Does not guarantee atomicity
  • Guarantees ordering (prevents instruction reordering)
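A classic consequence of the ordering guarantee is double-checked locking. This is the standard textbook illustration rather than code from the original article:

```java
public class Singleton {
    // volatile forbids reordering inside "instance = new Singleton()":
    // without it, the reference could be published before the constructor
    // finishes, and another thread could observe a half-constructed object.
    private static volatile Singleton instance;

    private Singleton() {}

    public static Singleton getInstance() {
        if (instance == null) {                    // first check, no lock
            synchronized (Singleton.class) {
                if (instance == null) {            // second check, under lock
                    instance = new Singleton();
                }
            }
        }
        return instance;
    }
}
```

The volatile write to instance inserts the barriers that keep the object's field initialization from being reordered after the publication of the reference.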

Performance: reading a volatile variable costs about the same as reading an ordinary variable, but writing is somewhat slower, because memory barriers must be inserted into the generated native code to prevent out-of-order execution. The overhead is still lower than that of a lock.

Synchronized cannot prohibit instruction reordering or processor optimization, so why can it still guarantee ordering and visibility?

  • After locking, only one thread can hold the lock and the threads that cannot acquire it are blocked, so the protected code runs effectively single-threaded. Within a single thread, data dependencies ensure that any reordering cannot change the result, so ordering is preserved.
  • Before a thread acquires the lock, it clears the values of shared variables from its working memory, so using a shared variable forces a fresh read from main memory; before the thread releases the lock, it must flush the latest values of shared variables back to main memory (discussed in the JMM memory-interaction chapter).
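A minimal sketch of this lock-based visibility (names are mine, not from the article): the stop-flag example from earlier, solved with a synchronized getter/setter instead of volatile. Every call to isRun() crosses a lock boundary, which forces a fresh read of run from main memory:

```java
public class SyncStopDemo {
    private boolean run = true; // guarded by the synchronized methods below

    synchronized boolean isRun() { return run; } // lock entry: re-read from main memory
    synchronized void stop() { run = false; }    // lock exit: flush to main memory

    static boolean runDemo() {
        SyncStopDemo demo = new SyncStopDemo();
        Thread t = new Thread(() -> {
            while (demo.isRun()) {
                // busy loop; the lock boundary in isRun() prevents the JIT
                // from caching the value of run in a register forever
            }
        });
        t.start();
        try {
            Thread.sleep(100);
            demo.stop();
            t.join(1000); // give t up to 1s to observe run == false
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return !t.isAlive(); // true if t actually stopped
    }

    public static void main(String[] args) {
        System.out.println("stopped=" + runDemo());
    }
}
```

In practice a plain volatile flag is cheaper for this pattern; the sketch only illustrates that lock acquire/release provides the same visibility guarantee.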
Ordering and memory barriers

A memory barrier is a class of CPU instructions whose main purpose is to enforce the execution order of specific operations and thus preserve ordering under concurrent execution. Since both the compiler and the CPU perform reordering optimizations, a memory barrier instruction can be inserted between two instructions to tell the compiler and the CPU that reordering across the barrier is forbidden.

  • Write barrier: inserted after a store instruction, it flushes the latest data in registers and caches to main memory so that other threads can see it, and it forbids reordering the writes before it to after the barrier.
  • Read barrier: inserted before a load instruction, it invalidates the corresponding data in the cache and forces the latest data to be reloaded from main memory, and it forbids reordering the reads after it to before the barrier.
  • Full barrier: an all-round barrier that combines the effects of a read barrier and a write barrier.
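Since Java 9, these barriers can be issued explicitly through the static fence methods on java.lang.invoke.VarHandle. The sketch below is an illustration of the write-barrier/read-barrier pairing described above, not code from the article; ordinary code should prefer volatile or java.util.concurrent:

```java
import java.lang.invoke.VarHandle;

public class FenceDemo {
    int data = 0;
    boolean ready = false;

    void publish() {
        data = 42;
        VarHandle.releaseFence(); // write barrier: the data write above
                                  // may not be reordered below this point
        ready = true;
    }

    int consume() {
        boolean r = ready;
        VarHandle.acquireFence(); // read barrier: the data read below
                                  // may not be reordered above this point
        return r ? data : -1;
    }
}
```

If consume() on another thread sees ready == true, the paired fences guarantee it also sees data == 42.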

**The memory barrier mechanism guarantees visibility and ordering:** a volatile read compiles to a load instruction; to preserve its semantics, the ordinary reads and writes that follow it must not be reordered before it. A volatile write compiles to a store instruction; to guarantee its visibility, the ordinary reads and writes that precede it must not be reordered after it, and it must not be reordered with the volatile accesses around it.

JMM (Java Memory Model, Java Memory Model)

The JMM provides a disciplined way to disable caching and prohibit reordering, so its core value lies in solving visibility and ordering. It is an abstract specification that does not exist as a physical component; the variables it describes physically live in main memory and the CPU caches.

  • Main memory: stores Java instance objects. Instance objects created by any thread live in main memory, whether they are member variables or are referenced from local variables in a method, along with shared class information, constants, and static variables. Because it is a shared data area, multiple threads accessing the same variable may run into thread-safety issues. Main memory corresponds directly to physical hardware memory.
  • Working memory: stores the local variables of the currently executing method, together with copies of the main-memory variables the thread uses. Each thread can access only its own working memory: local variables in one thread are invisible to other threads, and even if two threads execute the same piece of code, each creates its own local variables in its own working memory. Because working memory is private to its thread, the data stored there has no thread-safety issues. Working memory corresponds to registers and CPU caches.

What JMM does:

  • Shield the memory access differences of various hardware and operating systems, so that Java programs can achieve consistent memory access effects on various platforms
  • Specifies some relationships between threads and memory

The Java memory model stipulates: 1) all variables are stored in main memory; 2) each thread has its own working memory, and all of its operations on variables take place in working memory; 3) threads cannot directly access variables in each other's working memory; values can only be passed through main memory.

The JMM stores all variables in the shared main memory. When a thread uses a variable, it copies it from main memory into its own working memory (private memory), and the thread's reads and writes operate on that copy. The JMM must therefore address both code reordering and cache visibility, and it does so by providing means to disable caching and prohibit reordering: the familiar volatile, synchronized, final, and so on. The interaction protocol between main memory and working memory consists of eight operations: lock, unlock, read, load, use, assign, store, and write.

How the JMM solves ordering problems

The JMM defines its own memory-barrier instructions, which the JVM compiler is required to implement, prohibiting particular kinds of compiler and processor reordering: the LoadLoad, StoreStore, LoadStore, and StoreLoad barriers.

Introduction to the Happens-Before rule

  • Program-order rule (as-if-serial): within a single thread, operations with data dependencies execute in order; the earlier operation happens before the later one.
  • Volatile rule: a write to a volatile variable happens before every subsequent read of that variable.
  • Transitivity rule: if operation A happens before operation B, and operation B happens before operation C, then operation A happens before operation C.
  • Monitor lock rule: unlocking a monitor happens before every subsequent lock of that same monitor.
  • Join rule: if thread A executes threadB.join() and it returns successfully, then every operation in thread B happens before thread A's return from threadB.join().
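The join rule can be demonstrated with a short sketch (class and field names are illustrative): the result field is not volatile, yet the read after join() is guaranteed to see thread B's write because of the happens-before edge that join() establishes:

```java
public class JoinRuleDemo {
    static int result = 0; // not volatile: visibility comes from the join rule

    static int compute() {
        Thread b = new Thread(() -> result = 42); // write happens in thread B
        b.start();
        try {
            b.join(); // all of B's actions happen-before join() returning
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return result; // guaranteed to observe 42, no volatile needed
    }

    public static void main(String[] args) {
        System.out.println(compute());
    }
}
```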