Linux driver development——– Kernel synchronization and mutual exclusion

Interview: Please tell me, what are the synchronization methods between threads?

Multiple threads in the same process share the same address space. In order to avoid the confusion caused by multiple threads accessing data at the same time, it is necessary to consider the synchronization problem between threads. The so-called synchronization refers to collaborative pace and accessing shared resources in a predetermined order to avoid Cause confusion.

There are six main ways to implement thread synchronization: mutex locks, spin locks, read-write locks, condition variables, barriers, and semaphores.

1. Mutex lock. The mutex locks the mutex before accessing the shared resource, and releases the mutex to unlock it after the access is completed. After the mutex is locked, any other thread that attempts to lock the mutex again will be blocked until the current thread releases the mutex.

2. Spin lock. A spin lock is similar to a mutex, but it does not cause the thread to enter a blocking state. Instead, it occupies the CPU and is in a busy waiting spin state before acquiring the lock. Spin locks are suitable for situations where the lock is held for a short time and the thread does not want to spend too much cost on rescheduling.

3. Read-write lock. The read-write lock has three states: read mode lock, write mode lock and no lock. Only one thread can occupy the read-write lock in write mode at a time, but multiple threads can occupy the read-write lock in read mode at the same time. Lock. Read-write locks are very suitable for situations where the number of reads to the data structure is much greater than the number of writes.

4. Condition variables. Condition variables allow a thread to sleep until a certain condition is met. When the condition is met, a signal can be sent to the thread to notify and wake up the thread. Condition variables are often used in conjunction with mutexes. The condition variable is protected by a mutex. The thread must first lock the mutex before changing the condition state. Other threads will not notice the change in the condition before acquiring the mutex because it must lock the mutex before it can Calculate whether the conditions have changed.

5. Barrier. Barrier is a synchronization mechanism for users to coordinate the parallel work of multiple threads. Barriers allow each thread to wait until all cooperating threads have reached a certain point, and then continue execution from that point.

6. Semaphore. Asemaphore is essentially a counter that is used to provide multiple processes with access to a shared data object. When programming, you can determine whether you have access permission to the public resource based on the result of operating the semaphore value. When the semaphore value is greater than 0, you can access it, otherwise it will be blocked. The PV primitive is an operation on the semaphore. A P operation decreases the semaphore by 1, and a V operation increases the semaphore by 1.

Mutual exclusion in the kernel

1. Kernel race conditions and concurrency

【1】Concurrency in the kernel

  • When multiple execution paths in the kernel access the same shared resource at the same time, a race condition will occur.
  • Common shared resources include global variables, static variables, hardware registers, and the same segment of dynamically allocated memory used together.
  • The root cause of the race condition is that the code in the kernel has concurrent (simultaneous) access to shared resources.

【2】What are the concurrency situations in the kernel

  • Hardware interrupt: When the processor enables interrupts, a kernel execution path may be interrupted by an external interrupt at any time
  • Soft interrupts and tasklets: The kernel can execute soft interrupts and tasklets before any hard interrupt is about to return. It may also wake up the soft interrupt processing thread and execute the tasklet.
  • Ordinary multi-process environment: When a process is temporarily unavailable because the resource it is waiting for is temporarily unavailable, it will actively give up the CPU, and the kernel will schedule another process for execution.
  • Multi-process environment that preempts the kernel: If a system call occurs during execution of a process and enters the kernel, the kernel will complete the corresponding operation on behalf of the process. At this time, if a higher priority process is ready Ready, the kernel determines that the current process can be preempted if the preemptible conditions are met, and then executes a higher priority process.
  • Multi-processor or multi-core CPU: At the same time, multiple programs can be executed concurrently on multiple processors. This is true concurrency.

Mutual exclusion methods provided by the kernel

Concurrent access to shared resources will cause a race condition

How to resolve the race? : Mutually exclusive. That is, when one kernel execution path accesses shared resources, other kernel execution paths are not allowed to access the shared resources.

Shared resources are sometimes called Critical resources, and the code that accesses shared resources is also called Critical code segment or critical section.

Some concepts: Critical section, mutex lock, deadlock

Critical section: A code section that may cause problems or concurrency. Add mutual exclusion before and after this code section to protect it.

Mutex lock: Until the thread releases the resource and changes the status of the resource to “unlocked”, other threads can lock the resource again. The mutex lock ensures that only one thread performs writing operations at a time, thereby ensuring the correctness of data in multi-threaded situations.

Deadlock: Deadlock refers to a blocking phenomenon caused by two or more processes competing for resources or communicating with each other during execution. Without external force, they all It will be impossible to proceed. At this time, the system is said to be in a deadlock state or the system has a deadlock. These processes that are always waiting for each other are called deadlock processes.

1. Interrupt masking

【1】Key Points of Interrupt Masking

  • If you mask interrupts before accessing shared resources, then access shared resources, and then re-enable interrupts after accessing shared resources, you can avoid race conditions.
  • It should be noted that if we know clearly which interrupt will cause a race condition, we should usually only block the corresponding interrupt, rather than the global interrupt of the local CPU, so that other interrupts can be executed as usual.
  • If you have to shield the local CPU interrupt, you should try to use the macros local_irq_save and local_irg_restore, because if the interrupt itself is masked before shielding, then local_irq_enable Will mistakenly enable interrupts that are originally masked, thus causing inconsistency in the interrupt enable status.
  • The code between interrupt masking and interrupt re-enabling should not be too long, otherwise the interrupt masking time will affect the performance of the system.

【2】For example, you can mask the interrupt before counting i++, and then re-enable the interrupt. The code is as follows:

unsigned long flags;

i + + ;

【3】Summary when using interrupt mask to do mutual exclusion

  • It is simple and efficient to solve the race condition caused by concurrency caused by interruption.
  • You should try to use local_irq_save and local_irg_restore to mask and enable interrupts
  • Interrupt masking time should not be too long
  • Only local CPU interrupts can be shielded. For multi-CPU systems, interrupts may also be generated on other CPUs.

2. Atomic variables (the operation cannot be interrupted until it is completed.)

【1】Definition of atomic variables
If the operation of a variable is atomic, that is, it cannot be divided, just like in assembly-level code, it can be completed with only one assembly instruction, then access to such a variable does not need to consider the impact of concurrency at all. Therefore, the kernel specifically provides a data type atomic_t, and the variables defined with it are atomic variables, and their type is defined as follows

/*include/linux/types. h */
 typedef struct
int counter;
} atomic_t;

【2】API interface for operating atomic variables
It can be seen from the above that the atomic variable is actually an integer variable. For integer variables, some processors specifically provide some instructions to implement atomic operations (such as the swp instruction in ARM processors). The kernel will use these instructions to access atomic variables, but some The processor does not have such instructions, and the kernel will use other means to ensure the atomicity of access to it, such as interrupt masking.

  • What interfaces does the kernel provide to operate atomic variables. The main APIs are now listed as follows:
atomic_read(v); //Read the value of atomic variable V
atomic_set(v, i); //Set the value of the atomic variable to i

int atomic_add_return(int i,atomic_t *v); //Add i to the atomic variable, _return means that the modified value will also be returned
int atomic_sub_return(int i, atomic_t *v); //Subtract i from the atomic variable, _return means returning the modified value
int atomic_add_negative(int i, atomic_t *v); //Add i to the atomic variable, _negative means returning true when the result is negative

void atomic_add(int i, atomic_t *v); //Add i to the atomic variable
void atomic_sub(int i, atomic_t *v); //Subtract i from the atomic variable
void atomic_inc(atomic_t *v); //increment the atomic variable by 1
void atomic_dec(atomic_t *v); //decrement the atomic variable by 1

atomic_dec_return(v); //_return means returning the modified value
atomic_sub_and_test(i, v); //_test means returning true when the result is 0
atomic_xchg(ptr, v); //Exchange the data pointed to by V and prt pointers
atomic_cmpxchg(v, old, new); //If the value of V is equal to old, set the value of V to NEW and return the original value

void atomic_clear_mask(unsigned long mask, atomic_t *v); //Clear the corresponding bit in V where mask is 1
void atomic_set_mask(unsigned int mask, atomic_t *v); //Set the corresponding position where the mask in V is 1 to 1
void set_bit(int nr, volatile unsigned long *addr); //nr position 1
void clear_bit(int nr, volatile unsigned long *addr); //nr position cleared
void change_bit(int nr, volatile unsigned long *addr); //nr position reversed

int test_and_set_bit(int nr, volatile unsigned long adar); //test_ means to return the original value
int test_and_clear_bit(int nr, volatile unsigned long * addr);
int test_and_change_bit(int nr, volatile unsigned long addr);

【3】Use the following code to ensure atomicity of access to i++

atomic_t i = ATOMIC_INIT(5);
atomic_inc( & amp;i);

【4】This method cannot be used for non-integer variables (such as the entire structure). Atomic variables have less overhead than locking mechanisms.

3. Spin lock (will not hand over the CPU)

【1】Explanation of spin lock

For example, A and B use a public toilet. A reaches the toilet first and gets the spin lock. However, when using the shared resource – the toilet, a call is made to A. A will keep holding the lock for a short period of time. It will be released and will continue to use shared resources until conditions are mature. Just let B keep waiting.

【2】The type of spin lock in the kernel is spinlock_t, and the related API is as follows:

spin_lock_init(_lock); //Initialize the spin lock, which must be initialized before using the spin lock
void spin_lock(spinlock_t *lock); //Acquire the spin lock. If the spin lock cannot be acquired, busy wait
void spin_lock_irq(spinlock_t *lock); //Acquire the spin lock and compete for interrupts
spin_lock_irqsave(lock, flags); //Acquire the spin lock and compete for interrupts, save the interrupt status to flags
void spin_lock_bh(spinlock_t *lock); //Acquire the spin lock and compete for the lower half
int spin_trylock(spinlock_t *lock); //Try to acquire the spin lock. Even if it cannot be acquired, it will return immediately. A return value of 0 indicates that the spin lock was successfully acquired, otherwise the table did not acquire the spin lock.
int spin_trylock_bh(spinlock_t * lock); //Other variants have the same meaning as spin_lock variants.
int spin_trylock_irq(spinlock t *lock);

void spin_unlock(spinlock t *lock); //Release the spin lock, other unlock versions can judge its role based on the previous explanation
void spin_unlock_irq(spinlock_t *lock);
void spin_unlock_irgrestore(spinlock t *lock, unsigned long flags);
void spin_unlock_bh(spinlock_t *lock);

【3】In the example of i++, we can use the following code to make the spin lock mutually exclusive of i operations

int i=5;
/*Define spin lock*/
spinlock_t lock;
/*Variables used to save interrupt mask status*/
unsigned long flags;
/*The spin lock must be initialized before using the spin lock*/
spin_lock init( & amp;lock);
/*Obtain a spin lock before accessing shared resources, disable interrupts, and save the previous interrupt masking status in the f1ags variable*/
spin_lock_irqsave( & amp;lock, flags);
/*To access shared resources*/
i + + ;
/*Release the spin lock after the shared resource access is completed, and use the value of f1ags to restore the interrupt mask state*/
spin_unlock_irgrestore( & amp;lock, flags);

【4】Some important features and usage precautions about spin locks are summarized as follows

  • The execution time of the critical code section to obtain the spin lock should not be too long, because it is a busy waiting lock. If the execution time of the critical code section is too long, it means that other kernel execution paths that want to obtain the lock will be busy waiting for a long time. Will affect the efficiency of the system.
  • The core requirement of a spin lock is: the code that owns the spin lock must not sleep and must hold the CPU until the spin lock is released.
  • During the period of acquiring the lock, you cannot call functions that may cause process switching, because this will increase the time of holding the lock, causing other code that wants to acquire the lock to wait longer. What is worse is that if the newly scheduled The process also needs to acquire the same spin lock, which will lead to deadlock.
  • Spin locks are non-recursive, that is, they cannot acquire spin locks after acquiring the lock, otherwise they will lock themselves waiting for a lock that cannot be acquired.
  • Spin locks can be used in interrupt contexts (semaphores cannot, because they may sleep), but local interrupts must be disabled before acquiring spin locks in interrupt contexts. (Code with spin lock must not sleep, so it cannot be interrupted by other interrupts)

Semaphore: (When the semaphore cannot be obtained, the current process should sleep)

struct semaphore
raw_spinlock_t lock;
unsigned int count;
struct list_head wait listi;
  • As you can see, it has a count member, which is used to record the status of the semaphore resource. When the value of count is not 0, the semaphore can be obtained. When the value of count is 0, the semaphore cannot be obtained. ,This also shows that the semaphore can be held by ,multiple processes at the same time.
  • We also saw a wait_list member. It is not difficult to guess that when the semaphore cannot be obtained, the current process should sleep. Finally, the lock member reminds us that the semaphore actually uses a spin lock mechanism at the bottom level.

The most commonly used API interfaces for semaphores are as follows:

void sema_init(struct semaphore *sem, int val); //Used to initialize the semaphore, val is the initial value assigned to the count member, so that val processes can obtain the semaphore at the same time.
void down(struct semaphore "sem"); //Get the semaphore (the value of the semaphore is minus 1). When the value of the semaphore is not 0, you can get the semaphore immediately, otherwise the process sleeps.
int down_interruptible(struct semaphore *sem); //Same as down, but can be awakened by signals
int down_trylock(struct semaphore *sem); //Just try to obtain the semaphore. If it cannot be obtained, it will return immediately. Returning 0 indicates successful acquisition, and returning 1 indicates acquisition failure.
int down_timeout (struct semaphore *sem,sem, long jiffies); //Same as down, but if the semaphore has not been obtained after jies clock cycles, it will time out and return. Returning 0 means successfully obtaining the semaphore, and returning a negative value means timeout.
void up(struct semaphore *sem); //Release the semaphore (the value of the semaphore is increased by 1). If there are processes waiting for the semaphore, wake up these processes

【4】When assigning an initial value of 1 to a semaphore, it means that only one process can obtain the semaphore at the same time. This semaphore is called a binary semaphore. Mutual exclusion can be achieved by using this semaphore. Typical application code is as follows.

/*Define semaphore*/
struct semaphore sem;

/*Initialize the semaphore, assign the initial value to 1, for mutual exclusion*/
sema_init( & amp;sem, 1);

/*Get the semaphore. If it is awakened by the signal, return an ERESTARTSYS*/
if (down_interruptible( & amp;sem))

/*Access shared resources and perform some time-consuming operations or operations that may cause process scheduling*/

/*After the shared resource access is completed, release the semaphore*/
up(& amp;sem);

【5】The characteristics of semaphores and other usage precautions are summarized as follows:

  • The semaphore can be held by multiple processes at the same time. When the initial value 1 is assigned to the semaphore, the semaphore becomes a binary semaphore, also called a mutually exclusive semaphore, which can be used for mutual exclusion.
  • If the semaphore cannot be obtained, the process will sleep and other processes will be scheduled for execution without busy waiting.
  • Because obtaining a semaphore may cause process switching, it cannot be used in an interrupt context. If it must be used, only down_trylock can be used. However, you can use up in the interrupt context to release the semaphore to wake up other processes.
  • The scheduler can be called while the semaphore is being held, but special attention needs to be paid to whether a deadlock will occur.
  • The overhead of semaphore is relatively large. If the usage rules of spin lock are not violated, spin lock should be used first.

Mutex lock:

【1】Introduction of mutex

  • In addition to not being able to be used in interrupt contexts, semaphores also have the disadvantage that they are not very smart. In the code to obtain the semaphore, as long as the value of the semaphore is 0, the process will sleep immediately.

  • A more general situation is that the semaphore can be obtained immediately after not waiting for too long, so the operation of the semaphore will go through a long process of making the process sleep first and then wake up.

  • When the semaphore cannot be obtained, wait patiently for a short period of time. If the semaphore can be obtained during this period, the operation of obtaining the semaphore can return immediately, otherwise it is not too late to sleep the process.

  • In order to implement this more intelligent semaphore, the kernel provides another high-efficiency semaphore specifically for mutual exclusion, which is a mutex, also called a mutex, and its type is struct_mutex.

【2】Mutex related API

void mutex_lock(struct mutex *lock);
int mutex_lock_interruptible(struct mutex *lock);
int mutex_trylock(struct mutex *lock);
void mutex_unlock(struct mutex *lock);

【3】With the foundation of the previous mutual exclusion operation, it is easy to implement mutual exclusion using a mutex. The sample code is as follows:

int i = 5

/*Define mutex*/
struct mutex lock;

/*Initialize the mutex before use*/
mutex_init( & amp;lock);

/*Obtain a mutex before accessing shared resources*/
mutex_lock( & amp;lock);
i + + ;

/*Release the mutex after accessing the shared resource*/
mutex_unlock( & amp;lock);

【4】The use of mutex is relatively simple, but it has some more restrictions and characteristics. The key points are summarized as follows:

  • The mutex must be locked and unlocked in the same context. For example, it cannot be locked in the reading process and then unlocked in the writing process.
  • Like spin locks, mutex locks cannot be recursed.
  • A process cannot exit while a mutex is held
  • Cannot be used in interrupt context, not even mutex_trylock.
  • While the mutex is held, functions that may cause process switching can be called.
  • Spin locks should be used first when the usage rules of spin locks are not violated.
  • When spin locks cannot be used but do not violate the rules for using mutexes, mutexes should be used first instead of semaphores.

Synchronization in the kernel

1. Amount of completion

【1】After discussing mutual exclusion in the kernel, let’s take a look at synchronization in the kernel

  • Synchronization means that the execution paths in the kernel need to be carried out in a certain order. For example, if execution path A wants to continue execution, execution path B must be executed to a certain point.
  • Taking an ADC device as an example, assuming that one execution path in a driver performs certain conversion operations on the data collected by the ADC (such as averaging several sampling results), and the other execution path is specifically responsible for ADC sampling, then do The execution path of the conversion operation waits for the execution path of the sampling.
  • Synchronization can be achieved using semaphores. Taking the above ADC driver as an example, you can first initialize a semaphore with a value of 0. The execution path of the conversion operation first uses down to obtain the semaphore. If it has not been collected before, data, then the path doing the conversion operation will sleep and wait.
  • When the sampling path completes sampling and calls the release semaphore, the execution path that performs the conversion operation will be awakened. This ensures that sampling occurs before conversion, thus completing synchronization between sampling and conversion.

【2】However, the kernel specifically provides a completion amount to implement this operation. The structure type of the completion amount is defined as follows:

struct completion
unsigned int done; //The status of whether done is completed is a count value, 0 means not completed
wait_queue_head_t wait; //wait is a waiting queue head
  • Recalling the previous knowledge of blocking operations, it is not difficult to think of the working principle of completion volume.
  • When done is 0, the process is blocked. When other execution paths of the kernel make the value of done greater than 0, it is responsible for waking up the process blocked on this completion amount.

【3】The main APs for completion are as follows:

void init_completion(struct completion *x);
wait_for_completion(struct completion *);
wait_for_completion_interruptible(struct completion *x);
unsigned long wait_for_completion_timeout(struct completion *x, unsigned longtimeout);
long wait_for_completion_interruptible_timeout(struct completion *x, unsignedlong timeout);
bool try_wait_for_completion(struct completion *x);
void complete(struct completion *);
void complete_all(struct completion *);
  • With the accumulation of previous knowledge, the functions of the above functions and the meanings of the parameters can be understood clearly, so I will not go into details here.
  • It just needs to be explained that complete wakes up a process, while complete_al wakes up all sleeping processes.

【4】Examples of usage of completion amount are as follows:

/Define completion amount*/
struct completion comp;

/*Initialization completion amount before use*/
init_completion( & amp;comp);

/*Wait for other tasks to complete an operation*/
wait_for_completion( & amp;comp);

/*After an operation is completed, wake up the waiting task*/
complete( & amp;comp); © 2021 All Rights Reserved.