Implementing a spin lock with C++11 atomics


1. Spin lock

A spin lock is a basic synchronization primitive used to ensure mutually exclusive access to shared data. Unlike a mutex, a thread that fails to acquire a spin lock is not blocked; it keeps spinning, repeatedly trying to acquire the lock. While a thread waits on a spin lock its CPU does no other work: it stays in a busy-waiting polling loop. Spin locks are therefore best suited to cases where the lock is held only briefly and the thread does not want to pay the cost of being rescheduled. In fact, many other kinds of locks use spinning underneath; for example, most mutex implementations spin for a short time before putting the thread to sleep when acquisition fails. If a spin lock is used where the lock is held for a long time, the CPU is burned on meaningless busy waiting until the thread's time slice is exhausted, wasting computing resources.

2. Implementing a spin lock with a CAS operation

CAS (Compare And Swap) is a technique commonly used to implement concurrent algorithms. The operation is atomic at the hardware level (for example, via a locked bus cycle). The prototype of a CAS operation can be thought of as:

bool CAS(V, A, B)

Here V represents the variable in memory, A the expected value, and B the new value. When the value of V equals A, the values of V and B are swapped. Logically it can be represented by the following pseudocode:

bool CAS(V, A, B)
{
    if (V == A)
    {
        swap(V, B);
        return true;
    }

    return false;
}

It should be emphasized that the above operation is atomic: it either completes entirely or does not happen at all.
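As a concrete illustration of how compilers expose this hardware primitive, a CAS on a plain int matching the prototype above could be sketched with the GCC/Clang builtin __sync_bool_compare_and_swap (an assumption-laden sketch for GCC/Clang only; the portable C++11 way is shown in section 3, and this variant simply writes B into V rather than swapping the two values):

// Sketch (GCC/Clang builtin): V is the variable in memory, A the expected value, B the new value.
bool CAS(int& V, int A, int B)
{
    // Atomically: if V == A, store B into V and return true; otherwise return false.
    return __sync_bool_compare_and_swap(&V, A, B);
}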

So, given a CAS operation, how do we implement a spin lock? Recall the purpose of a spin lock: we want a thread that cannot yet enter the critical section to busy-wait, polling until it is allowed to proceed, and only then continue (enter the critical section). A natural first idea is to use a bool variable to indicate whether the critical section may be entered, as in the following pseudocode:

while (flag == true);
flag = true;
/*
do something...
*/
flag = false;

The intuition is that when flag is true, some thread is already in the critical section; a thread may enter only when flag is false, and it sets flag to true immediately upon entering. There is an obvious problem, however: checking that flag is false and setting flag to true are not a single indivisible step. An interleaving like the following can occur (assume flag is initially false):

step    thread 1                 thread 2
1       while (flag == true);
2                                while (flag == true);
3       flag = true;
4                                flag = true;
5       do something             do something
6       flag = false;
7                                flag = false;

Here "step" is a fictitious time step, and "do something" stands for a series of instructions; writing them on the same line indicates concurrent execution. Because thread 1 reads the flag and modifies the flag as two independent operations, thread 2's check can slip in between them. In the end both threads enter the critical section at the same time, contrary to our expectation. How can this be solved? If the read-and-check and the modification could be merged into one indivisible whole, such interleavings could no longer occur. What we want is an operation that reads a variable in memory and, if it equals a specific value, changes it to another value of our choosing. That is exactly the CAS operation.

Now we can revise our synchronization scheme: keep performing the CAS operation CAS(flag, false, b) (where b is true), which expects flag to be false, until it succeeds; then do the work in the critical section; and on leaving the critical section set flag back to false.

b = true;
while(!CAS(flag, false, b));
//do something
flag = false;

Now the check and the write form a single whole: once one thread's CAS succeeds, other threads are kept out of the critical section, achieving mutually exclusive access.

We can now use CAS operations to enforce mutually exclusive access to critical sections, but writing this out every time would be tedious. So we can wrap it up to make it more convenient to use; in other words, we can encapsulate it as a spin lock. We can represent it with a class that has a bool value as its data member and the CAS operation and an assignment operation as member functions: the CAS operation is the lock operation, and the subsequent assignment is the unlock operation.

3. Implementation using C++11 atomics

Following the idea above, we now use the atomic types introduced by the C++11 standard library to implement a spin lock and test it.

First, we need a bool value to represent the state of the lock. Here we use std::atomic<bool> from the standard library directly (for an introduction to C++11 atomics, see: https://www.cnblogs.com/FateTHarlaown/p/8919235.html). On my platform (Cygwin64, GCC 7.3), its member function is_lock_free() returns true, i.e. the implementation is lock-free (if it were implemented with a lock internally, it could hardly be called a spin lock). In fact, std::atomic<bool> is lock-free on most platforms. If you are not sure, you can use std::atomic_flag instead, which the C++ standard requires to be lock-free.
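To check this on your own platform, a quick probe might look like the following (a minimal sketch; the output depends on the platform and compiler):

#include <atomic>
#include <iostream>

int main()
{
    std::atomic<bool> flag(false);
    // Whether this particular std::atomic<bool> object is lock-free here.
    std::cout << "atomic<bool> is_lock_free: " << flag.is_lock_free() << std::endl;
    // ATOMIC_BOOL_LOCK_FREE is 2 if std::atomic<bool> is always lock-free on this platform.
    std::cout << "ATOMIC_BOOL_LOCK_FREE: " << ATOMIC_BOOL_LOCK_FREE << std::endl;
    return 0;
}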

Next, we need two atomic operations, CAS and store (assignment), both of which the C++11 standard library provides directly as member functions of atomic types.

//CAS
bool std::atomic<T>::compare_exchange_weak( T& expected, T desired,
                     std::memory_order order = std::memory_order_seq_cst );

bool std::atomic<T>::compare_exchange_strong( T& expected, T desired,
                     std::memory_order order = std::memory_order_seq_cst );

//assignment
void std::atomic<T>::store( T desired,
                     std::memory_order order = std::memory_order_seq_cst );

The main difference between compare_exchange_weak and compare_exchange_strong is whether the CAS is guaranteed to succeed when the value in memory equals expected: compare_exchange_weak may fail spuriously even in that case, while compare_exchange_strong will not. Therefore compare_exchange_weak must be used in a loop that retries on failure; the benefit is that it can perform better on some platforms. Since our model above already calls CAS inside a while loop, compare_exchange_weak is a natural fit. For the memory order we simply use the default std::memory_order_seq_cst, as we have no special requirements. The store operation is straightforward: it always succeeds (it is just an assignment) and has no return value.
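One detail worth noting, and the reason lock() below resets its expected value on every iteration: when either compare_exchange call fails, it writes the value actually found in memory back into expected. A minimal sketch with an assumed std::atomic<int> to show this behaviour:

#include <atomic>
#include <cassert>

int main()
{
    std::atomic<int> value(5);
    int expected = 3;                                      // does not match the stored value
    bool ok = value.compare_exchange_strong(expected, 7);  // fails because 5 != 3
    assert(!ok);
    assert(expected == 5);        // on failure, expected now holds the current value
    assert(value.load() == 5);    // value is unchanged

    ok = value.compare_exchange_strong(expected, 7);       // expected == 5, so this succeeds
    assert(ok);
    assert(value.load() == 7);
    return 0;
}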
The implementation code is very short; the source is below:

#include <atomic>

class SpinLock {
public:
    SpinLock() : flag_(false)
    {}

    void lock()
    {
        bool expect = false;
        while (!flag_.compare_exchange_weak(expect, true))
        {
            // expect must be reset to false here: on failure,
            // compare_exchange_weak overwrites it with the current value of flag_.
            expect = false;
        }
    }

    void unlock()
    {
        flag_.store(false);
    }

private:
    std::atomic<bool> flag_;
};

As described above, the lock operation keeps retrying the CAS until it succeeds, and the unlock operation resets the bool flag. Usage:

SpinLock myLock;
myLock.lock();

//do something

myLock.unlock();
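Because the class provides lock() and unlock(), it satisfies the BasicLockable requirements, so it can also be used with std::lock_guard to make the unlock automatic. A small sketch (Example and sharedValue are illustrative names, not part of the code above):

#include <mutex>   // for std::lock_guard

void Example(SpinLock& myLock, int& sharedValue)
{
    std::lock_guard<SpinLock> guard(myLock);   // calls myLock.lock()
    //do something
    ++sharedValue;
}   // guard goes out of scope and calls myLock.unlock()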

Next, we conduct a correctness test, taking the classic i++ problem as an example:

#include <atomic>
#include <iostream>
#include <thread>
#include <vector>

//Spin lock class definition
class SpinLock {
public:
    SpinLock() : flag_(false)
    {}

    void lock()
    {
        bool expect = false;
        while (!flag_.compare_exchange_weak(expect, true))
        {
            // reset expect: on failure it is overwritten with the current value of flag_
            expect = false;
        }
    }

    void unlock()
    {
        flag_.store(false);
    }

private:
    std::atomic<bool> flag_;
};

//The number of increments performed by each thread
const int kIncNum = 1000000;
//The number of worker threads
const int kWorkerNum = 10;
//Shared counter
int count = 0;
//spin lock
SpinLock spinLock;

//Work function for each thread
void IncCounter()
{
    for (int i = 0; i < kIncNum; ++i)
    {
        spinLock.lock();
        count++;
        spinLock.unlock();
    }
}

int main()
{
    std::vector<std::thread> workers;
    std::cout << "SpinLock inc MyTest start" << std::endl;
    count = 0;

    std::cout << "start " << kWorkerNum << " workers_" << "every worker inc " << kIncNum << std::endl;
    std::cout << "count_: " << count << std::endl;
    //Create 10 worker threads that increment the counter
    for (int i = 0; i < kWorkerNum; ++i)
        workers.push_back(std::thread(IncCounter));

    for (auto it = workers.begin(); it != workers.end(); it++)
        it->join();

    std::cout << "workers_ end" << std::endl;
    std::cout << "count_: " << count << std::endl;
    //Verify the result
    if (count == kIncNum * kWorkerNum)
    {
        std::cout << "SpinLock inc MyTest passed" << std::endl;
        return 0;
    }
    else
    {
        std::cout << "SpinLock inc MyTest failed" << std::endl;
        return 1;
    }
}

In the code above, 10 threads are created, each incrementing the shared global variable count one million times, and the result is then checked for correctness. The final output is:

SpinLock inc MyTest start
start 10 workers_every worker inc 1000000
count_: 0
workers_end
count_: 10000000
SpinLock inc MyTest passed

The results show that our spin lock protects the critical section (here, count++): the final value of count equals the total number of increments performed by all threads. For comparison, you can remove the lock and unlock calls in IncCounter:

void IncCounter()
{
    for (int i = 0; i < kIncNum; ++i)
    {
        //spinLock.lock();
        count++;
        //spinLock.unlock();
    }
}

The output after execution is:

SpinLock inc MyTest start
start 10 workers_every worker inc 1000000
count_: 0
workers_end
count_: 7254522
SpinLock inc MyTest failed

The result is incorrect because multiple threads execute count++ at the same time without protection.

With that, we have implemented a simple spin lock using C++11 atomics. This is only a small application of C++ atomics; there is much more worth exploring, both in spin locks themselves and in atomics.
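For instance, the std::atomic_flag mentioned earlier, which the standard guarantees to be lock-free, can express the same idea through test_and_set() and clear(). A minimal sketch of that variant (using the default memory order, as above):

#include <atomic>

class SpinLockFlag {
public:
    void lock()
    {
        // test_and_set atomically sets the flag and returns its previous value;
        // spin while the previous value was already set, i.e. the lock is held.
        while (flag_.test_and_set())
            ;
    }

    void unlock()
    {
        flag_.clear();
    }

private:
    std::atomic_flag flag_ = ATOMIC_FLAG_INIT;
};

Either version spins on a single atomic flag; the std::atomic<bool> version was used above because its compare_exchange call maps directly onto the CAS discussion.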
