This article walks from the CPU's multi-level cache and the cache coherence protocol (MESI) up to the Java memory model.

Article directory

  • CPU Multi-Level Cache & Cache Coherence Protocol (MESI)
    • CPU multi-level cache
    • Cache Coherence Protocol (MESI)
      • Cache line
      • Four cache states
      • cache line state transition
        • Multi-core collaboration example
        • Website experience
      • MESI optimization and introduced issues
        • Store Buffers & Invalidate Queue
          • Problems caused by Store Buffers & Invalidate Queue
      • Hardware memory model
        • Read Barrier & Write Barrier
      • Think & Connect

CPU Multi-Level Cache & Cache Coherence Protocol (MESI)

CPU multi-level cache

  • Reference: Java Memory Model

Cache Coherence Protocol (MESI)

  • Multi-level caches solve the mismatch between CPU processing speed and memory access speed, but they also introduce the problem of cache inconsistency between cores. Cache coherence protocols were introduced to solve this problem. Common coherence protocols include MSI, MESI, MOSI, Synapse, Firefly, and Dragon; the rest of this article describes the MESI protocol.


Cache line

  • A cache line is the smallest unit of data that the cache manages and transfers to and from main memory (commonly 64 bytes).
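The practical consequence of line-sized transfers can be seen in traversal order. The following Java sketch (class name and array size are invented for illustration) sums the same matrix twice: row-major order reuses every loaded cache line fully, while column-major order touches a different line on almost every access, which is typically much slower even though both compute the same result.

```java
public class CacheLineDemo {
    static final int N = 1024;
    static final int[][] matrix = new int[N][N];

    static {
        // Fill the matrix with ones so both traversals sum to N * N.
        for (int[] row : matrix) java.util.Arrays.fill(row, 1);
    }

    // Row-major traversal: consecutive elements share cache lines,
    // so each loaded line is fully used before it is evicted.
    static long sumRowMajor() {
        long sum = 0;
        for (int i = 0; i < N; i++)
            for (int j = 0; j < N; j++)
                sum += matrix[i][j];
        return sum;
    }

    // Column-major traversal: each access lands on a different row's
    // cache line, wasting most of every line that is loaded.
    static long sumColumnMajor() {
        long sum = 0;
        for (int j = 0; j < N; j++)
            for (int i = 0; i < N; i++)
                sum += matrix[i][j];
        return sum;
    }

    public static void main(String[] args) {
        // Both orders compute the same result; only cache behavior differs.
        System.out.println(sumRowMajor());    // 1048576
        System.out.println(sumColumnMajor()); // 1048576
    }
}
```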

Four cache states

  • The cache line has 4 states, encoded in 2 bits:
  • E (Exclusive): the cache line is valid, the data is consistent with main memory, and the data exists only in this cache. Listening task: must monitor reads of this cache line by other caches; when such a read occurs, the state changes to S.
  • M (Modified): the cache line is valid, the data has been modified and is inconsistent with main memory, and the data exists only in this cache. Listening task: must monitor all attempts to read this cache line; such reads may proceed only after this cache writes the line back to main memory and changes the state to S.
  • S (Shared): the cache line is valid, the data is consistent with main memory, and the data exists in multiple caches. Listening task: must monitor requests by other caches to invalidate this cache line or take exclusive ownership of it, and invalidate the local copy when such a request arrives.
  • I (Invalid): the cache line is invalid. Listening task: none.
  • Note: the M and E states are always accurate; they match the real situation of the cache line. The S state may be inaccurate: if another cache invalidates its copy of a line in state S, the remaining cache may in fact be the sole holder of that line, yet it will not be promoted to E, because the invalidation is not broadcast to the other caches.

Cache line state transition

(Figure: cache line state transition diagram)

Multi-core collaboration example

(Figure: multi-core collaboration example)

  • Initial state: CPU B holds variable X in its cache, in state M.
  • CPU A issues an instruction to read X. The read goes over the bus; CPU B snoops the address conflict, writes X back, and sets its copy to state S, after which CPU A reads the line (also in state S).
  • CPU B then modifies the cached variable. Over the bus it detects the address conflict, sets CPU A's copy from S to I, and writes the data back to main memory.
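The steps above can be sketched as a toy state machine. This is an illustrative model with invented names (`MesiDemo`, `onRemoteRead`, `onRemoteWrite`), not a real coherence implementation; real MESI lives in hardware and tracks state per cache line.

```java
public class MesiDemo {
    enum State { M, E, S, I }

    // A remote read forces a dirty line to be written back and shared;
    // an exclusive clean line simply becomes shared.
    static State onRemoteRead(State s) {
        switch (s) {
            case M: /* write the line back to memory first */ return State.S;
            case E: return State.S;
            default: return s;
        }
    }

    // A remote write invalidates any local copy, whatever its state.
    static State onRemoteWrite(State s) {
        return State.I;
    }

    public static void main(String[] args) {
        State cpuA = State.I, cpuB = State.M; // initial: only CPU B holds X, modified

        // Step 1: CPU A reads X over the bus; CPU B snoops the address,
        // writes the line back, and both copies become Shared.
        cpuB = onRemoteRead(cpuB); // B: M -> S
        cpuA = State.S;            // A loads the line as Shared

        // Step 2: CPU B writes X; CPU A snoops and invalidates its copy.
        cpuA = onRemoteWrite(cpuA); // A: S -> I
        cpuB = State.M;             // B's copy is Modified again

        System.out.println(cpuA + " " + cpuB); // prints "I M"
    }
}
```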
Try it online
  • An interactive simulation of the whole coherence process: https://www.scss.tcd.ie/Jeremy.Jones/VivioJS/caches/MESIHelp.htm

MESI optimization and introduced issues

  • During the multi-core cooperation described above, passing coherence messages between caches takes far longer than executing CPU instructions. If every operation had to wait for the coordination messages to be acknowledged, processor performance would drop sharply. Store Buffers and the Invalidate Queue were therefore introduced as optimizations.
Store Buffers & Invalidate Queue
  • As the multi-core collaboration example shows, every modification of a cached element must wait for the invalidation acknowledgements (Invalidate Acknowledge) before the modified data can be written to the cache line. Waiting for these coordination messages wastes CPU cycles, so Store Buffers were introduced: the modified data is written into the store buffer without waiting for the acknowledgements to return. If a subsequent read by the same CPU finds the data in the store buffer, it is read directly from there (this is called "Store Forwarding"); the data is written back to the cache line only after responses to all the coordination messages have been received.
  • Store Buffers are limited in size, so the acknowledgements must arrive quickly for the buffer to drain. To respond to Invalidate messages faster, the receiving cache does not execute the invalidation immediately: it places the message in its Invalidate Queue, returns the Invalidate Acknowledge at once, and performs the actual invalidation at a later time.
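A rough model of the mechanism, with invented names (`StoreBufferDemo`, `store`, `load`, `flush`) and the acknowledgement traffic left out: writes land in a per-CPU store buffer, the CPU's own reads are forwarded from that buffer, and other CPUs see the new value only after the buffer is flushed.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

public class StoreBufferDemo {
    // Shared cache/memory, keyed by variable name (toy model).
    static final Map<String, Integer> cache = new HashMap<>();
    // Per-CPU store buffer; insertion order is preserved for the flush.
    final Map<String, Integer> storeBuffer = new LinkedHashMap<>();

    // A write goes into the store buffer without waiting for other
    // caches to acknowledge the invalidation.
    void store(String var, int value) {
        storeBuffer.put(var, value);
    }

    // "Store Forwarding": this CPU's own reads see its buffered
    // writes; other CPUs still see the old value in the cache.
    int load(String var) {
        Integer buffered = storeBuffer.get(var);
        return buffered != null ? buffered : cache.getOrDefault(var, 0);
    }

    // Once all Invalidate Acknowledge messages have arrived (not
    // modeled here), the buffered stores are applied to the cache.
    void flush() {
        cache.putAll(storeBuffer);
        storeBuffer.clear();
    }

    public static void main(String[] args) {
        cache.put("value", 3);
        StoreBufferDemo cpuA = new StoreBufferDemo();
        StoreBufferDemo cpuB = new StoreBufferDemo();

        cpuA.store("value", 10);
        System.out.println(cpuA.load("value")); // 10 - forwarded from A's buffer
        System.out.println(cpuB.load("value")); // 3  - B still reads the stale cache

        cpuA.flush();
        System.out.println(cpuB.load("value")); // 10 - visible after the flush
    }
}
```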


Problems caused by Store Buffers & Invalidate Queue
  • There is no guarantee of when the store buffer will be written back to the cache.
int value = 3;
boolean isFinish = false;

void exeToCPUA() {
  value = 10;
  isFinish = true;
}

void exeToCPUB() {
  if (isFinish) {
    // Is value guaranteed to be 10 here?
    // If the store buffer has not been flushed yet, CPU B may still
    // see the old value, causing data inconsistency.
    assert value == 10;
  }
}
  • There is no guarantee of when the invalidations queued behind an Invalidate Acknowledge will actually be executed.
// When a CPU reads data that is actually invalid but whose invalidation
// is still sitting in its Invalidate Queue, it reads a stale value,
// causing data inconsistency.

Hardware memory model

  • With Store Buffers and the Invalidate Queue introduced, the timing of flushing the store buffer to the cache line and of executing the queued invalidations must be chosen carefully to release as much of the CPU's processing power as possible. The CPU itself cannot know the right moment, so the task is handed to the programmer: this is what we commonly call memory barriers.
Read Barrier & Write Barrier
  • Write barrier: a Store Memory Barrier (a.k.a. ST, SMB, smp_wmb) is an instruction that tells the processor to apply all stores already sitting in the store buffer to the cache line before executing any instructions that follow the barrier.

  • Read barrier: a Load Memory Barrier (a.k.a. LD, RMB, smp_rmb) is an instruction that tells the processor to apply all invalidations already sitting in the invalidate queue before performing any subsequent loads.

void executedOnCpu0() {
    value = 10;
    // Flush all stores already in the store buffer to the cache
    // before the following store becomes visible.
    storeMemoryBarrier();
    finished = true;
}
void executedOnCpu1() {
    while(!finished);
    // Apply all queued invalidations for this data before reading.
    loadMemoryBarrier();
    assert value == 10;
}
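In Java 9+, arguably the closest public counterparts of these two pseudo-instructions are the explicit fences on java.lang.invoke.VarHandle. The sketch below (class and field names invented) mirrors the pseudocode; in ordinary code one would simply declare the flag volatile instead of using raw fences.

```java
import java.lang.invoke.VarHandle;

public class FenceDemo {
    static int value = 3;
    static boolean finished = false;

    static void executedOnCpu0() {
        value = 10;
        // Write barrier: stores before the fence cannot be reordered
        // with stores after it, so value is published before finished.
        VarHandle.storeStoreFence();
        finished = true;
    }

    static void executedOnCpu1() {
        while (!finished) {
            // Read barrier inside the loop: forces the flag to be
            // re-read and orders the subsequent load of value.
            VarHandle.loadLoadFence();
        }
        assert value == 10;
    }

    public static void main(String[] args) {
        Thread writer = new Thread(FenceDemo::executedOnCpu0);
        Thread reader = new Thread(FenceDemo::executedOnCpu1);
        reader.start();
        writer.start();
        try {
            writer.join();
            reader.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        System.out.println("value = " + value);
    }
}
```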

Think & Connect

  • Different architectures provide different memory barriers. On x86, for example: read barrier lfence, write barrier sfence, full (read-write) barrier mfence.
  • To improve performance as much as possible under the MESI cache coherence protocol, Store Buffers and the Invalidate Queue were introduced, handing control over the exact timing of invalidation and write-back to memory barriers; the JMM in turn builds its visibility guarantees on top of these memory barriers.
  • Under the hood, the volatile keyword relies on the LOCK prefix instruction. LOCK is essentially a lock (a bus lock or a cache line lock); only part of its capabilities have the same effect as a memory barrier, and it is still not the same thing as a memory barrier.
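For completeness, a minimal Java sketch (invented class and field names) of the visibility guarantee that volatile buys: the write to value happens-before the volatile write of finished, which happens-before the reader's volatile read, so the reader is guaranteed to observe value == 10.

```java
public class VolatileDemo {
    static int value = 3;
    static volatile boolean finished = false; // volatile supplies the barriers

    public static void main(String[] args) {
        Thread writer = new Thread(() -> {
            value = 10;      // ordinary store
            finished = true; // volatile store: publishes earlier writes
        });
        Thread reader = new Thread(() -> {
            while (!finished) { } // volatile load: guaranteed to see the store
            // Happens-before chain: write of value -> volatile write
            // of finished -> this thread's volatile read of finished.
            System.out.println("value = " + value); // prints "value = 10"
        });
        reader.start();
        writer.start();
        try {
            writer.join();
            reader.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```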