Analyzing the Java Memory Model (JMM) and the three major features of concurrency from the operating-system level

1. Parallelism and concurrency

The purpose of both is to maximize CPU utilization.

Parallelism: multiple instructions execute simultaneously on multiple processors at the same instant. Whether viewed from a micro or macro perspective, the operations truly run together.

Concurrency: only one instruction executes at any given instant, but instructions from multiple processes are rotated rapidly. At the macro level this gives the effect of multiple processes running at the same time; at the micro level they do not run simultaneously — time is divided into slices so that multiple processes execute alternately and quickly.

Parallelism exists only in multi-processor systems, while concurrency can exist in both single-processor and multi-processor systems. Concurrency can exist on a single processor because concurrency is an illusion of parallelism: parallelism requires that the program actually execute multiple operations at the same time, whereas concurrency only requires that the program appear to do so (one operation per small time slice, with operations switching rapidly).

Another example: concurrency is like everyone having a phone that is out of battery but there being only one power bank — only one phone can charge at a time, and everyone takes turns. Parallelism is everyone bringing their own power bank and charging simultaneously, with no one disturbing anyone else. That is the difference between concurrency and parallelism.

2. Three major characteristics of concurrency

The three major characteristics of concurrent programming are visibility, ordering, and atomicity.

2.1 Visibility

When one thread modifies the value of a shared variable, other threads can see the modified value. The Java memory model achieves visibility by synchronizing the new value back to main memory after a variable is modified, and by refreshing the variable's value from main memory before it is read; main memory serves as the transfer medium.

Ways to ensure visibility:

  • Ensure visibility through the volatile keyword.
  • Ensure visibility through memory barriers.
  • Ensure visibility through the synchronized keyword.
  • Ensure visibility through Lock.
  • Ensure visibility through the final keyword.

The following code demonstrates visibility issues in concurrency and ways to ensure visibility.

public class VisibilityTest {

    private boolean flag = true;
    private int count = 0;

    public void refresh() {
        flag = false;
        System.out.println(Thread.currentThread().getName() + "Modify flag:" + flag);
    }

    public void load() {
        System.out.println(Thread.currentThread().getName() + "Start execution....");
        while (flag) {
            count++;
        }
        System.out.println(Thread.currentThread().getName() + "Out of the loop: count=" + count);
    }

    public static void main(String[] args) throws InterruptedException {
        VisibilityTest test = new VisibilityTest();

        // Thread threadA simulates data loading scenario
        Thread threadA = new Thread(() -> test.load(), "threadA");
        threadA.start();

        // Let threadA execute for a while
        Thread.sleep(1000);
        // Thread threadB controls the execution time of threadA through flag
        Thread threadB = new Thread(() -> test.refresh(), "threadB");
        threadB.start();

    }

}

Running result: threadA keeps looping and never prints the "Out of the loop" line.

Here flag is a shared variable. Thread A does its business processing by checking the value of flag, and thread B changes flag from true to false. If thread B's write to flag were visible to thread A, thread A would exit the loop; but as the result shows, it does not. This is the visibility problem, one of the three major features of concurrency.

The following code shows several ways to solve the visibility problem:

public class VisibilityTest {
    // Method 1: add the volatile keyword to flag
    private volatile boolean flag = true;
    // Method 2: add volatile to the shared variable (count) that is accessed after the flag check
    private int count = 0;

    // Method 8: use the wrapper type Integer instead of int
    //private Integer count = 0;

    public void refresh() {
        flag = false;
        System.out.println(Thread.currentThread().getName() + "Modify flag:" + flag);
    }

    public void load() {
        System.out.println(Thread.currentThread().getName() + "Start execution....");
        while (flag) {
            //TODO business logic
            count++;

            // Method 3: insert a memory barrier
            //UnsafeFactory.getUnsafe().storeFence();
            // Method 4: yield the time slice; the context switch reloads the context, re-reading flag
            //Thread.yield();
            // Method 5: use the synchronized keyword (the bottom layer still uses memory barriers)
            //System.out.println(count);
            // Method 6: the bottom layer still uses memory barriers
            //LockSupport.unpark(Thread.currentThread());
            // Method 7: the bottom layer still uses memory barriers
            //try {
            //    Thread.sleep(1);
            //} catch (InterruptedException e) {
            //    e.printStackTrace();
            //}

            

        }
        System.out.println(Thread.currentThread().getName() + "Out of the loop: count=" + count);
    }

    public static void main(String[] args) throws InterruptedException {
        VisibilityTest test = new VisibilityTest();

        // Thread threadA simulates data loading scenario
        Thread threadA = new Thread(() -> test.load(), "threadA");
        threadA.start();

        // Let threadA execute for a while
        Thread.sleep(1000);
        // Thread threadB controls the execution time of threadA through flag
        Thread threadB = new Thread(() -> test.refresh(), "threadB");
        threadB.start();

    }
}

public class UnsafeFactory {

    /**
     * Get the Unsafe object
     * @return
     */
    public static Unsafe getUnsafe() {
        try {
            Field field = Unsafe.class.getDeclaredField("theUnsafe");
            field.setAccessible(true);
            return (Unsafe) field.get(null);
        } catch (Exception e) {
            e.printStackTrace();
        }
        return null;
    }

    /**
     * Get the memory offset of the field
     * @param unsafe
     * @param clazz
     * @param fieldName
     * @return
     */
    public static long getFieldOffset(Unsafe unsafe, Class clazz, String fieldName) {
        try {
            return unsafe.objectFieldOffset(clazz.getDeclaredField(fieldName));
        } catch (NoSuchFieldException e) {
            throw new Error(e);
        }
    }


}

To understand the visibility problem in the code above, consider the following illustration:

1. First, thread A obtains the value of flag (true) from main memory via the read operation.

2. The load operation creates a copy of the flag variable in thread A's local memory.

3. The use operation reads the flag value into a CPU register.

4. After 1 s, thread B starts executing.

5. Thread B obtains the value of flag (true) from main memory via the read operation.

6. The load operation creates a copy of the flag variable in thread B's local memory.

7. The use operation reads the flag value into a CPU register; the method then changes flag to false.

8. Thread B writes the new value back to its local memory via the assign operation.

9. The store and write operations then flush the value back to main memory, so flag in main memory becomes false.

10. But thread A keeps reading the flag value from its own local memory (this is related to optimization: the while loop iterates so quickly and reads flag at such short intervals that the value is served directly from local memory and never refreshed from main memory). Thread A therefore has no way of knowing that flag changed in main memory, and it loops forever. For more on these operations, see the JMM memory model section below.

Two questions can be extended from here:

1. When does local memory become invalid (so that the value is re-fetched from main memory)?

If a shared variable is not used for a period of time, local memory evicts the copy of the shared variable and obtains it from main memory again.

2. When are variables in local memory flushed back to main memory?

They are definitely flushed back before the thread ends.

To summarize, at the bottom layer Java solves the visibility problem in two main ways (the underlying mechanisms are explained later in the section on the volatile implementation):

1. Through the StoreLoad memory barrier.

2. Through a context switch, which re-reads the value from main memory.

2.2 Ordering

That is, programs should execute in the order the code is written; however, the JVM performs instruction reordering, which gives rise to ordering problems.

How to ensure ordering:

  • Ensure ordering through the volatile keyword.
  • Ensure ordering through memory barriers.
  • Ensure ordering through the synchronized keyword.
  • Ensure ordering through Lock.

Let's first look at the most common DCL (double-checked locking) code for creating a singleton:
public class SingletonFactory {

    private volatile static SingletonFactory myInstance;

    public static SingletonFactory getMyInstance() {
        if (myInstance == null) {
            synchronized (SingletonFactory.class) {
                if (myInstance == null) {
                    myInstance = new SingletonFactory();
                }
            }
        }
        return myInstance;
    }

    public static void main(String[] args) {
        SingletonFactory.getMyInstance();
    }
}

Think about a question: Why does DCL use volatile?

This is because myInstance = new SingletonFactory() is not an atomic operation. When we create a new object, the underlying system splits the work into three steps:

1. Allocate memory space

2. Initialize the object

3. Point myInstance to the address of the memory space

However, because of instruction reordering, steps 2 and 3 may happen in reverse order. In a multi-threaded situation, the object created by the first thread may not yet be initialized when a second thread finds the reference non-null; using that half-constructed object directly is clearly problematic.
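The steps above and the reordering hazard can be sketched as pseudocode (a conceptual sketch, not actual JVM bytecode):

```
memory = allocate();     // step 1: allocate space for the object
ctorInstance(memory);    // step 2: initialize the object
myInstance = memory;     // step 3: point the reference at the memory

// With reordering, steps 2 and 3 may swap:
memory = allocate();     // step 1
myInstance = memory;     // step 3: reference is non-null, object not yet initialized
ctorInstance(memory);    // step 2
// A second thread entering getMyInstance() between steps 3 and 2 sees a non-null,
// half-constructed object and returns it without ever entering the synchronized block.
// Declaring myInstance volatile forbids this reordering.
```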

2.3 Atomicity

One or more operations either all execute completely, without being interrupted by any factor, or do not execute at all. In Java, reads and assignments of variables of basic data types are atomic operations (on 64-bit processors). A compound operation such as increment (i++), which carries no atomicity guarantee, is not atomic.

How to ensure atomicity:

  • Atomicity is guaranteed through the synchronized keyword.
  • Atomicity is guaranteed through Lock.
  • Atomicity is guaranteed through CAS.

Start ten threads that each increment sum:

public class Test {

    private volatile static int sum = 0;
    public static void main(String[] args) {

        for (int i = 0; i < 10; i++) {
            Thread thread = new Thread(() -> {
                for (int j = 0; j < 10000; j++) {
                    sum++;
                }
            });
            });
            thread.start();
        }

        try {
            Thread.sleep(3000);
        } catch (InterruptedException e) {
            e.printStackTrace();
        }

        System.out.println(sum);

    }

}

Running result: a value below 100000, varying between runs.

If the increment were an atomic operation, the result should be 100000 (10 threads × 10,000 increments each). Repeated experiments produce results below 100000, which shows that the volatile keyword cannot guarantee atomicity.
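As a fix, replacing sum with a java.util.concurrent.atomic.AtomicInteger — one of the CAS-based approaches listed above — makes the increment atomic. A minimal sketch (the class name AtomicSumTest is illustrative):

```java
import java.util.concurrent.atomic.AtomicInteger;

public class AtomicSumTest {

    private static final AtomicInteger sum = new AtomicInteger(0);

    public static void main(String[] args) throws InterruptedException {
        Thread[] threads = new Thread[10];
        for (int i = 0; i < 10; i++) {
            threads[i] = new Thread(() -> {
                for (int j = 0; j < 10000; j++) {
                    sum.incrementAndGet(); // CAS-based atomic increment
                }
            });
            threads[i].start();
        }
        for (Thread t : threads) {
            t.join(); // wait for every thread instead of sleeping an arbitrary time
        }
        System.out.println(sum.get()); // always prints 100000
    }
}
```

Note that join() also replaces the original Thread.sleep(3000), which only happened to be long enough for the workers to finish.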

3. Java Memory Model (JMM)

3.1 JMM definition

The Java Memory Model (JMM) is defined in the Java Virtual Machine Specification. It shields the memory-access differences of various hardware and operating systems so that Java programs achieve consistent concurrent behavior on all platforms. The JMM specifies how the Java virtual machine and computer memory work together: it stipulates how and when one thread can see the value of a shared variable modified by other threads, and how to access shared variables synchronously when necessary. The JMM is an abstract concept, a set of rules, through which access to the variables in a program's shared and private data areas is controlled. The JMM is centered around atomicity, ordering, and visibility.

3.2 The relationship between JMM and hardware memory architecture

There are differences between the Java memory model and the hardware memory architecture. The hardware memory architecture does not distinguish between thread stacks and heaps: for the hardware, both reside in main memory, and parts of them may at times appear in CPU caches and CPU registers. As shown in the figure below, the Java memory model and the computer hardware memory architecture overlap:

Memory Interaction:

Regarding the specific interaction protocol between main memory and working memory — that is, how a variable is copied from main memory to working memory and how it is synchronized from working memory back to main memory — the Java memory model defines the following eight operations:

  • lock: acts on a variable in main memory; marks the variable as exclusively owned by one thread.
  • unlock: acts on a variable in main memory; releases a variable that is in the locked state, so that it can be locked by other threads.
  • read: acts on a variable in main memory; transfers the variable's value from main memory to the thread's working memory for the subsequent load action.
  • load: acts on a variable in working memory; puts the value obtained by the read operation into the working-memory copy of the variable.
  • use: acts on a variable in working memory; passes the value of a working-memory variable to the execution engine. The virtual machine performs this operation whenever it encounters a bytecode instruction that needs the variable's value.
  • assign: acts on a variable in working memory; assigns a value received from the execution engine to a working-memory variable. The virtual machine performs this operation whenever it encounters a bytecode instruction that assigns to the variable.
  • store: acts on a variable in working memory; transfers the value of a working-memory variable to main memory for the subsequent write operation.
  • write: acts on a variable in main memory; puts the value transferred by the store operation into the main-memory variable.

The Java memory model also stipulates that when performing the above eight basic operations, the following rules must be met:

  • To copy a variable from main memory to working memory, the read and load operations must be performed in order; to synchronize a variable from working memory back to main memory, the store and write operations must be performed in order. The Java memory model only requires that these operations execute in order, not that they execute consecutively.
  • Neither the read/load pair nor the store/write pair is allowed to appear with only one of its operations.
  • A thread is not allowed to discard its most recent assign operation; that is, a variable changed in working memory must be synchronized back to main memory.
  • A thread is not allowed to synchronize data from working memory back to main memory for no reason (without any assign having occurred).
  • A new variable can only be born in main memory; working memory is not allowed to directly use an uninitialized (not loaded or assigned) variable. That is, before use or store is performed on a variable, load or assign must have been performed first.
  • A variable may be locked by only one thread at a time, but the lock operation may be performed multiple times by the same thread. After locking multiple times, the variable is unlocked only after the same number of unlock operations; lock and unlock must appear in pairs.
  • Performing a lock operation on a variable clears that variable's value in working memory; before the execution engine uses the variable, a load or assign operation must be re-executed to initialize its value.
  • A variable that has not been locked by a lock operation may not be unlocked, nor may a thread unlock a variable that is locked by another thread.
  • Before performing unlock on a variable, the variable must first be synchronized back to main memory (by performing store and write).
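To make the eight operations concrete, here is a conceptual trace (a sketch, not real JVM output) of how the statement `b = a + 1` on shared variables maps onto them:

```
read(a)    // main memory: transfer the value of a toward the thread
load(a)    // place that value into the working-memory copy of a
use(a)     // hand the copy to the execution engine
           // (the engine computes a + 1)
assign(b)  // put the engine's result into the working-memory copy of b
store(b)   // transfer the working-memory value of b toward main memory
write(b)   // put that value into the main-memory variable b
```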

4. Volatile keyword

4.1 Volatile memory semantics

Characteristics of volatile:

  • Visibility: a read of a volatile variable always sees the last write (by any thread) to that volatile variable.
  • Atomicity: reading or writing any single volatile variable is atomic, but compound operations such as volatile++ are not (which is why volatile is said not to guarantee atomicity). volatile only guarantees that single reads/writes are atomic, whereas a lock's mutual exclusion guarantees that an entire critical section executes atomically. For 64-bit long and double variables, as long as they are declared volatile, reads and writes are atomic.
  • Ordering: specific memory barriers are inserted before and after reads and writes of volatile-modified variables to prohibit instruction reordering, thereby ensuring ordering.

Volatile write-read memory semantics:

  • When writing a volatile variable, JMM will refresh the shared variable value in the local memory corresponding to the thread to the main memory.
  • When reading a volatile variable, JMM will invalidate the local memory corresponding to the thread, and the thread will next read the shared variable from the main memory.
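These two semantics combine into a safe hand-off between threads. The following sketch (class and field names are illustrative) publishes an ordinary write through a volatile write; the reader's volatile read then guarantees it also sees the earlier write to payload:

```java
public class VolatileHandoff {

    private volatile boolean ready = false; // volatile flag used for the hand-off
    private int payload = 0;                // plain field published via the flag

    void writer() {
        payload = 42;  // ordinary write
        ready = true;  // volatile write: local memory is flushed to main memory
    }

    void reader() {
        // volatile read: local memory is invalidated, value re-read from main memory
        while (!ready) { }
        System.out.println(payload); // guaranteed to print 42
    }

    public static void main(String[] args) throws InterruptedException {
        VolatileHandoff h = new VolatileHandoff();
        Thread r = new Thread(h::reader);
        r.start();
        Thread.sleep(100);
        h.writer();
        r.join();
    }
}
```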

4.2 Volatile implementation visibility principle

  • At the JVM level: for volatile variables, the read, load, use operations and the assign, store, write operations must be consecutive. That is, a modification must be synchronized back to main memory immediately, and every use must refresh the value from main memory, thereby guaranteeing the visibility of volatile variable operations to multiple threads.
  • At the hardware level: through the lock prefix instruction, the cache line containing the variable is locked and written back to main memory; this operation is called "cache locking". The cache-coherence mechanism prevents two or more processors from simultaneously modifying memory data that they have cached, and writing one processor's cache back to memory invalidates the copies in other processors' caches.

Implementation of volatile in hotspot

  • Bytecode interpreter implementation: the bytecode interpreter (bytecodeInterpreter) in the JVM implements JVM instructions in C++. Its advantage is that it is relatively simple and easy to understand; its disadvantage is slow execution. bytecodeInterpreter.cpp:

// Insert a memory barrier (JVM level)
inline void OrderAccess::storeload() { fence(); }

inline void OrderAccess::fence() {
  // Only needed on multi-processor systems
  if (os::is_MP()) {
    // always use locked addl since mfence is sometimes expensive
#ifdef AMD64
    // lock prefix instruction (assembly level)
    __asm__ volatile ("lock; addl $0,0(%%rsp)" : : : "cc", "memory");
#else
    __asm__ volatile ("lock; addl $0,0(%%esp)" : : : "cc", "memory");
#endif
  }
}
  • Template interpreter implementation: it writes corresponding assembly code for each instruction and binds each instruction to its assembly entry at startup, which is extremely efficient. When we assign a value to a volatile variable, the JVM executes the following code. templateTable_x86_64.cpp:
// Responsible for executing putfield or putstatic instructions
void TemplateTable::putfield_or_static(int byte_no, bool is_static, RewriteControl rc) {
    // ...
     // Check for volatile store
    __ testl(rdx, rdx);
    __ jcc(Assembler::zero, notVolatile);

    putfield_or_static_helper(byte_no, is_static, rc, obj, off, flags);
    volatile_barrier(Assembler::Membar_mask_bits(Assembler::StoreLoad |
                                                 Assembler::StoreStore));
    __ jmp(Done);
    __ bind(notVolatile);

    putfield_or_static_helper(byte_no, is_static, rc, obj, off, flags);

    __ bind(Done);
 }

//memory barrier
void TemplateTable::volatile_barrier(Assembler::Membar_mask_bits
                                     order_constraint) {
  // Helper function to insert a is-volatile test and memory barrier
  if (os::is_MP()) { // Not needed on single CPU
    __ membar(order_constraint);
  }
}

assembler_x86.hpp

// Serializes memory and blows flags
  void membar(Membar_mask_bits order_constraint) {
    // We only have to handle StoreLoad on the x86 platform
    if (order_constraint & StoreLoad) {

      int offset = -VM_Version::L1_line_size();
      if (offset < -128) {
        offset = -128;
      }

      // The following two sentences insert a lock prefix instruction: lock addl $0, $0(%rsp)
      lock(); // lock prefix instruction
      addl(Address(rsp, offset), 0); // addl $0, $0(%rsp)
    }
  }

The role of the lock prefix instruction

  • It ensures the atomicity of the instruction it prefixes. In Pentium and earlier processors, instructions with a lock prefix locked the bus during execution, temporarily preventing other processors from accessing memory through the bus; this overhead is obviously high. In newer processors, Intel uses cache locking to guarantee the atomicity of instruction execution, which greatly reduces the overhead of lock-prefixed instructions.
  • The LOCK prefix functions like a memory barrier, prohibiting the instruction from being reordered with earlier and later read and write instructions.
  • The LOCK prefix waits for all instructions before it to complete and for all buffered write operations to be written back to memory (that is, the contents of the store buffer are written to memory); according to the cache-coherence protocol, flushing the store buffer invalidates the copies in other processors' caches.