Java Concurrency 03 – Order, Visibility, Atomicity

First of all, I still want to encourage everyone to keep producing, step out of the comfort zone, and persevere. We only need to surpass 80% of people. As Teacher Chen Hao said, if you look at China’s Internet, you will find that it mostly exists to consume the public and make people ever more foolish. So in today’s China you barely have to do anything special: simply stay off the Chinese Internet and you will naturally surpass most people.

Spend less time on public WeChat accounts, Zhihu, Knowledge Planet, and Weibo, and stop following the big names every day. It will not help us personally; after all, the environments we live in are different. To put it bluntly, we watch what architects do every day, but we work on the business side, so what is the point of watching from the sidelines? Find someone who is better at business than you, or better at performance tuning than you, and learn from them. Don’t aim at a level far beyond your own.

Watch fewer Toutiao headlines, Douyin, Douyu, spoof videos and the like, to avoid falling into a black hole of time.

Don’t follow gossip, workplace drama, social hot topics, or controversies. These things have nothing to do with us; arguing with idiots every day only ruins our mood. Recently Bitcoin has become popular again, and some wallet or other has started making profits again. Stop caring about these things: if you make money you will only want more, and if you lose money you will feel terrible every day.

Knowledge should be deep, not broad. Don’t chase new technologies and new knowledge every day; what is worth learning is lasting. What feels like progress is often growth in your perception, not in your knowledge.

Most important of all, avoid fragmented learning. If you can’t connect the things you learn, what use are they? If you can’t retain what is in your head, I suggest you read more books and learn systematically.

Learn to judge an article by its title; some articles are obviously advertisements at first glance.

Also, technology needs to be put into practice. It’s not as if you can learn Dubbo or Netty today and immediately be good at them. Without a real scenario to apply them in, what you learn serves interviews, not understanding.

In this article, we will explain the three major causes of concurrency problems: ordering, visibility, and atomicity. These three issues belong to the field of concurrency in general and are not specific to any language.

First, let’s talk about what thread safety is.

Whether an object needs to be thread-safe depends on whether it is accessed by multiple threads. With a single thread, access to the object’s mutable state is inherently serialized, so it is necessarily thread-safe. With multiple threads and no coordination of access to mutable state, it is unsafe.

A class is thread-safe if it behaves correctly when accessed from multiple threads, regardless of the scheduling or interleaving of the execution of those threads by the runtime environment, and with no additional synchronization or other coordination on the part of the calling code.

– “Java Concurrency in Practice”

From the above quote, we know:

  • A single-threaded program is necessarily thread-safe
  • Stateless objects are always thread-safe

A stateless object here means that the method’s computation exists only in local variables on the thread’s stack, which can be accessed only by the currently executing thread. For example:

private Integer sum(int num) {
    return ++num;
}

Atomicity

Speaking of atomicity, everyone’s first reaction is probably i++ and ++i.

Let’s take a look at how the JVM executes i++.

public class AtomicIntegerTest {
    public static void main(String[] args) {
        AtomicIntegerTest atomicIntegerTest = new AtomicIntegerTest();
        atomicIntegerTest.sum(10);
    }

    public int sum(int i) {
        i = i++;
        return i;
    }
}

Then we use javap -v to look at the disassembled class file, as follows:

Classfile /D:/Work/math-teaching/src/test/java/base/AtomicIntegerTest.class //The current location of the Class file
  Last modified 2019-7-2; size 384 bytes // Last modified time, file size
  MD5 checksum 48f1e270d21b6836df2a88c8545dd2fd // md5 value
  Compiled from "AtomicIntegerTest.java" //Which file is compiled from
public class base.AtomicIntegerTest // Fully qualified name of the class
  minor version: 0 // jdk minor version number
  major version: 52 // jdk major version number (52 corresponds to Java 8)
  flags: ACC_PUBLIC, ACC_SUPER //It is Public type
Constant pool:
   #1 = Methodref #5.#16 // java/lang/Object."<init>":()V
   #2 = Class #17 // base/AtomicIntegerTest
   #3 = Methodref #2.#16 // base/AtomicIntegerTest."<init>":()V
   #4 = Methodref #2.#18 // base/AtomicIntegerTest.sum:(I)I
   #5 = Class #19 // java/lang/Object
   #6 = Utf8 <init>
   #7 = Utf8 ()V
   #8 = Utf8 Code
   #9 = Utf8 LineNumberTable
  #10 = Utf8 main
  #11 = Utf8 ([Ljava/lang/String;)V
  #12 = Utf8 sum
  #13 = Utf8 (I)I
  #14 = Utf8 SourceFile
  #15 = Utf8 AtomicIntegerTest.java
  #16 = NameAndType #6:#7 // "<init>":()V
  #17 = Utf8 base/AtomicIntegerTest
  #18 = NameAndType #12:#13 // sum:(I)I
  #19 = Utf8 java/lang/Object
{
  public base.AtomicIntegerTest();
    descriptor: ()V
    flags: ACC_PUBLIC
    Code:
      // stack is the maximum operand stack. The JVM will allocate the operation stack depth in the stack frame (Frame) based on this value when running. Here it is 1
      // locals: The storage space required for local variables, the unit is Slot. Slot is the smallest unit used by the virtual machine when allocating memory for local variables, which is 4 bytes in size.
      //args_size: The number of method parameters, here is 1, because each instance method will have a hidden parameter this
      stack=1, locals=1, args_size=1
         0: aload_0
         1: invokespecial #1 // Method java/lang/Object."<init>":()V
         4: return
      LineNumberTable: //The function of this attribute is to describe the correspondence between the source code line number and the bytecode line number (bytecode offset)
        line 3: 0

  public static void main(java.lang.String[]);
    descriptor: ([Ljava/lang/String;)V
    flags: ACC_PUBLIC, ACC_STATIC
    Code:
      stack=2, locals=2, args_size=1
         0: new #2 // class base/AtomicIntegerTest
         3: dup
         4: invokespecial #3 // Method "<init>":()V
         7: astore_1
         8: aload_1
         9: bipush 10
        11: invokevirtual #4 // Method sum:(I)I
        14: pop
        15: return
      LineNumberTable:
        line 6: 0
        line 7: 8
        line 8: 15

  public int sum(int);
    descriptor: (I)I
    flags: ACC_PUBLIC
    Code:
      stack=1, locals=2, args_size=2
         0: iload_1 // Push the int value of local variable 1 onto the operand stack
         1: iinc 1, 1 // Increment local variable 1 by the constant 1 (this is the increment part of i++)
         4: istore_1 // Store the top of the stack (the old value of i) back into local variable 1
         5: iload_1 // Push i again for the return
         6: ireturn // Return an int from the current method
      LineNumberTable:
        line 11: 0
        line 12: 5
}
SourceFile: "AtomicIntegerTest.java"

After reading the class file above, everyone should understand that the operation i = i++ involves three steps:

  • Load i onto the top of the operand stack (iload_1)
  • Increment the variable (iinc)
  • Store the value back into i (istore_1)

The above is fine under a single thread, but under multiple threads a simple i = i++ turns into three CPU instructions. Because of operating system time slicing, those three instructions can be interleaved across threads, producing unexpected results.

Multithreading appeared because the CPU would otherwise sit idle while the operating system waited on I/O and other blocking operations. The core of multithreading is that the CPU rotates between tasks, and the stretch of time each task runs before being switched out is the time slice (the switch itself is what we usually call task switching).

A time slice, also known as a “quantum” or “processor slice”, is the period of CPU time that a time-sharing operating system allocates to each running process (in a preemptive kernel: the time from when the process starts running until it is preempted).

– Wikipedia

In fact, to put it bluntly, modern operating systems allow multiple processes to run “at the same time”, but that simultaneity exists only in the user’s perception; the CPU is actually switching constantly, and the time slices are simply too short for us to notice.

Time slices are assigned to each process by the OS scheduler; a process in turn contains threads. The kernel allocates an initial time slice through the scheduler, and each process or thread then takes its turn executing for its allotted time. Which process or thread runs next is decided by the scheduler based on priorities and the scheduling algorithm. When a time slice is used up, the kernel recalculates and allocates a new one for each process, and so on.

Everything above boils down to one sentence: a single Java statement becomes three CPU instructions. Under a single thread that is of course fine; under multiple threads, the threads preempt each other between those instructions, so the statement cannot execute correctly. Of course, if every thread could see the result of the previous thread’s work, would it execute correctly? Yes. And that is the visibility we discuss next.
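Before moving on, here is a minimal sketch of one standard fix (the class and method names are my own): java.util.concurrent.atomic.AtomicInteger performs the load-increment-store sequence as a single atomic operation, so no update can be lost between threads.

import java.util.concurrent.atomic.AtomicInteger;

public class AtomicCounter {

    private final AtomicInteger count = new AtomicInteger(0);

    // incrementAndGet() executes read-modify-write as one atomic
    // CAS-based operation, unlike the three separate steps of i++
    public int increment() {
        return count.incrementAndGet();
    }

    public int get() {
        return count.get();
    }
}

Two threads each calling increment() 10,000 times will always leave count at exactly 20,000.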

Visibility

Over the history of computing, most programs have needed to operate on memory, and many also need I/O: file reads and writes, communication between microservices, and so on. Hardware such as CPUs and memory has therefore been continuously upgraded and tuned, but a problem arises here: the gap between devices. A machine may have an SSD or a mechanical hard drive, and the same CPU paired with different disks gives a very different experience. For example, even with 16 GB of memory and a top-end CPU, if the hard drive is an ordinary mechanical one, your program’s performance is still limited: I/O becomes the bottleneck. You cannot improve only one component in isolation.

So, in order to improve performance, the experts came up with various techniques, which is also how we ended up having to write those annoying multithreaded programs:

  • CPUs added multi-level caches to bridge the speed gap with memory
  • The OS added multi-process, multi-threading, multiplexing, and other such techniques
  • Compilers reorder instructions

The articles that follow will cover these topics in more or less detail. After all, everyone is here to learn, not to be harvested like leeks.

[Figure: CPU multi-level cache hierarchy. Picture from CSDN]

The CPU cache is divided into three levels:

L1 cache

It is built into the CPU and runs at the CPU’s own speed, which greatly improves the CPU’s efficiency. If this cache were large enough to hold all of a program’s data, everything would be rocket-fast; but the L1 cache is constrained by the CPU’s physical structure and is generally very small.

L2 cache

This cache sits between the CPU and main memory as temporary storage. Its capacity is smaller than memory, but its access speed is still very fast. The data in the L2 cache is a subset of memory: the data the CPU is about to access in the near term.

L3 cache

The L3 cache serves reads that miss the L2 cache: data not found in L2 is looked up in L3 (and only around 5% of accesses need to go to main memory). The principle is to copy data from a slower storage device into a faster one ahead of time and then read it from there when needed, similar to what we often call lazy loading in Java.

Accessing the L1 cache takes 2 to 4 clock cycles, the L2 cache about 10 clock cycles, and the L3 cache 30 to 40 clock cycles.

Today’s CPUs integrate all three cache levels on the chip. Multi-core CPUs usually give each core its own L1 and L2 caches, with the L3 cache shared among the cores.

Back to Java. Consider, for example, the following code:

public class AtomicIntegerTest {

    private static long num = 0;

    private void add() {
        for (int i = 0; i < 10000; i++) {
            num += 1;
        }
    }

    public static long calc() throws InterruptedException {
        final AtomicIntegerTest test = new AtomicIntegerTest();
        //Perform add() operation
        Thread th1 = new Thread(() -> test.add());
        Thread th2 = new Thread(() -> test.add());
        //Start thread
        th1.start();
        th2.start();
        // wait
        th1.join();
        th2.join();
        return num;
    }

    public static void main(String[] args) throws InterruptedException {
        calc();
        System.out.println(num);
    }
}

The output was 11412 on one run and 12581 on another; the result differs on every run.

To put it bluntly, th1 and th2 start executing at roughly the same time. Each first reads num into its own CPU cache (we ignore the shared L3 cache here). After executing num += 1, each cache holds the value 1, and when both write back to memory, memory also ends up as 1, not the 2 we expected. That is the cache visibility problem.

If the loop count were very small, say 10, the result might still come out correct, but as the count grows the result gets worse and worse.
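One straightforward way to make the add() in the example above correct is to put the read-modify-write under a lock. A minimal sketch, assuming both threads call add() on the same instance, exactly as calc() does above:

public class SynchronizedAddTest {

    private static long num = 0;

    // synchronized makes the 10,000 increments mutually exclusive, and
    // (per the monitor lock rule discussed below) releasing the lock
    // publishes the result to the next thread that acquires the same lock
    private synchronized void add() {
        for (int i = 0; i < 10000; i++) {
            num += 1;
        }
    }
}

With this change, calc() always returns 20000.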

Let me explain: while a thread is running, its calculations happen on values in the CPU cache, while the data itself lives in main memory; the CPU cache, as mentioned above, is a tiny on-chip store, not part of memory. When the CPU writes its cached data back to memory is entirely up to the CPU, which is why Java has the volatile keyword to force a flush to memory. This is the volatile write barrier, part of the so-called happens-before rules, which we will keep coming back to below.
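To see visibility in isolation from atomicity, a classic sketch is a stop flag. Without volatile, the worker thread may never observe the update and spin forever (whether it actually hangs depends on the JIT, so treat this as illustrative):

public class StopFlagTest {

    // remove 'volatile' and the worker may never see the update
    private static volatile boolean stop = false;

    public static void main(String[] args) throws InterruptedException {
        Thread worker = new Thread(() -> {
            while (!stop) {
                // busy-wait; each volatile read forces a fresh load of 'stop'
            }
            System.out.println("worker saw stop == true");
        });
        worker.start();
        Thread.sleep(1000);
        stop = true; // volatile write: guaranteed to become visible to the worker
    }
}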

Order

In the JVM we rely on, there is another troublesome issue: ordering, that is, instruction reordering. To optimize performance, compilers and processors sometimes reorder the statements of a high-level program. For example:

int a = 1;
int b = 2;

After compilation, it becomes

int b = 2;
int a = 1;

This merely swaps the two statements and does not change the final result of the program.

But in concurrent programming, surprises happen.

First of all, everyone should remember one thing:

The JMM allows any reordering that the happens-before rules do not forbid.

  1. Program order rule: within a thread, operations that appear earlier in program order happen-before operations that appear later;
  2. Monitor lock rule: an unlock operation happens-before every subsequent lock operation on the same lock;
  3. Volatile variable rule: a write to a volatile variable happens-before every subsequent read of that variable;
  4. Transitivity rule: if operation A happens-before operation B, and operation B happens-before operation C, then operation A happens-before operation C;
  5. Thread start rule: the start() call on a Thread object happens-before every action of the started thread;
  6. Thread interruption rule: the call to a thread’s interrupt() method happens-before the interrupted thread’s code detects the interrupt;
  7. Thread termination rule: all operations in a thread happen-before the detection of that thread’s termination, for example the return of Thread.join() or Thread.isAlive() returning false;
  8. Object finalization rule: the completion of an object’s initialization happens-before the start of its finalize() method.

– “In-Depth Understanding of the Java Virtual Machine”

So what exactly is reordering?

In many cases, accesses to program variables (object instance fields, class static fields, and array elements) may appear to execute in a different order than the program specifies. The compiler is free to reorder instructions in the name of optimization. The processor may execute instructions out of order under certain conditions. Data may be moved between registers, processor caches, and main memory in an order different from that of the program.

For example, if a thread writes to field a and then to field b, and the value of b does not depend on the value of a, the compiler is free to reorder these operations, and the cache is free to flush b to main memory before a. There are many potential sources of reordering: the compiler, the JIT, and the caches.

The compiler, runtime, and hardware conspire to preserve the illusion of as-if-serial semantics, meaning that a single-threaded program must not be able to observe the effects of reordering.

public class ReorderTest {

    int x = 0, y = 0;
    int r1, r2;

    public void writer() {
        x = 1;
        y = 2;
    }

    public void reader() {
        r1 = y;
        r2 = x;
    }

    public static void main(String[] args) throws InterruptedException {
        ReorderTest reorderTest = new ReorderTest();
        Thread th1 = new Thread(() -> reorderTest.writer());
        Thread th2 = new Thread(() -> reorderTest.reader());
        // start the threads and wait for both to finish
        th1.start();
        th2.start();
        th1.join();
        th2.join();
        System.out.println("r1 = " + reorderTest.r1);
        System.out.println("r2 = " + reorderTest.r2);
    }
}

Depending on the interleaving, the above code may print r1 = 0, r2 = 0 (the reader ran first), r1 = 2, r2 = 1 (the writer ran first), or r1 = 0, r2 = 1. But the JMM also permits r1 = 2, r2 = 0: the reader saw the write to y yet missed the earlier write to x, which can only happen if the writes or the reads were reordered. (In practice you may need to run the race many times, or use a tool such as jcstress, to observe it.)

For example, the (infamous) double-checked locking idiom (also known as the multi-threaded singleton pattern) is a technique designed to support lazy initialization while avoiding synchronization overhead.

// double-checked-locking - don't do this!

private static Something instance = null;

public Something getInstance() {
  if (instance == null) {
    synchronized (this) {
      if (instance == null)
        instance = new Something();
    }
  }
  return instance;
}

Assume two threads A and B call getInstance() at the same time. Each first checks whether instance is null; if it is, they compete for the lock. Say thread A wins the lock and performs the initialization. When it finishes and releases the lock, thread B acquires it, checks instance == null again, sees that thread A has already initialized it, and returns directly. In theory there is no problem.

In fact, the problem occurs inside new Something(). We assume the process is:

  • Allocate memory for the Something object
  • Call the Something constructor to initialize the new object’s member variables
  • Assign the reference to instance

But it may actually be:

  • Allocate memory
  • Assign the reference
  • Call the constructor

Continuing the scenario: suppose thread A has allocated the memory and assigned the reference but has not yet called the constructor. If thread B executes now, it sees that instance is not null and returns it directly, even though the constructor has not finished running. This, too, comes down to the happens-before principle.
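The standard repair is to declare the field volatile. The volatile write of the fully constructed object then happens-before any subsequent read, so no thread can observe a non-null but half-initialized instance. A sketch of the usual form:

public class Something {

    // volatile forbids reordering the reference assignment
    // ahead of the constructor call
    private static volatile Something instance = null;

    public static Something getInstance() {
        if (instance == null) {                 // first check, without the lock
            synchronized (Something.class) {
                if (instance == null) {         // second check, holding the lock
                    instance = new Something(); // volatile write of the finished object
                }
            }
        }
        return instance;
    }
}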

happens-before

[Figure: the happens-before relationship. Picture from www.logicbig.com]

Happens-before defines a partial ordering over all operations in a program. To guarantee that the thread executing operation Y can see the result of operation X (whether or not X and Y occur in different threads), there must be a happens-before relationship between X and Y. In the absence of a happens-before ordering between two operations, the JVM is free to reorder them as needed.

Happens-before is not only about the reordering of actions in “time” but also about the ordering of reads and writes to memory. Two threads performing writes and reads to memory can be consistent in terms of CPU clock time, yet may not see each other’s changes consistently (memory consistency errors) unless they are related by happens-before.

  • Single-thread rule: each action in a thread happens-before every action in the same thread that comes later in program order.

  • Monitor lock rule: unlocking a monitor lock (exiting a synchronized method/block) happens-before every subsequent acquisition of the same monitor lock.

  • Volatile variable rule: a write to a volatile field happens-before every subsequent read of the same field. Writes and reads of volatile fields have memory consistency effects similar to entering and exiting monitors (synchronized blocks around reads and writes), but without actually acquiring a monitor/lock.

  • Thread start rule: a Thread.start() call happens-before every action in the started thread. Suppose thread A spawns a new thread B by calling threadB.start(). Everything executed in thread B’s run method sees everything thread A did before the threadB.start() call as having happened before it.

  • Thread join rule: all actions in a thread happen-before any other thread successfully returns from a join() on that thread. Suppose thread A spawns a new thread B by calling threadB.start() and then calls threadB.join(). Thread A waits in the join() call until thread B’s run method completes. After join() returns, all subsequent operations in thread A see everything performed in thread B’s run method as having happened before them.

  • Transitivity: If A happens before B and B happens before C, then A happens before C.
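The volatile rule plus transitivity lets an ordinary field “piggyback” on a volatile one. In the sketch below (the names are my own), the write to data happens-before the volatile write to ready, which happens-before the volatile read of ready, which happens-before the read of data, so a reader that sees ready == true is guaranteed to see data == 42:

public class PiggybackTest {

    private int data = 0;                 // an ordinary, non-volatile field
    private volatile boolean ready = false;

    public void writer() {
        data = 42;    // (1) plain write
        ready = true; // (2) volatile write; (1) happens-before (2)
    }

    public void reader() {
        if (ready) {                      // (3) volatile read; (2) happens-before (3)
            System.out.println(data);     // (4) must print 42, by transitivity
        }
    }
}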

The memory model describes possible behaviors of a program. An implementation is free to produce any code it likes, as long as all resulting executions of a program produce a result that can be predicted by the memory model.

This provides a great deal of freedom for the implementor to perform a myriad of code transformations, including the reordering of actions and removal of unnecessary synchronization.

– “Java Language Specification” (Java SE 8)

As you can see, the specification states plainly that as long as all executions of a program produce results that the memory model can predict, the implementer is free to generate whatever code it likes, including reordering actions and removing unnecessary synchronization. The memory model is thus the contract between Java and the underlying platform, and it is also at the core of “write once, run anywhere”.