The principles of Go coroutines (goroutines) and the relationship between coroutines and threads! (Very important)

Go's coroutines are something unique compared with other languages, and they are also very important. Often we want to understand their underlying principles but cannot find suitable material online. For example, if you ask GPT, it will answer you something like this:

Coroutines and threads are both technologies for executing tasks concurrently, but their implementations and characteristics differ.

Threads are a concurrent execution mechanism at the operating-system level. Each thread has its own stack and program counter, and switching between threads requires the participation of the operating system. The advantage of threads is that they can use multi-core CPUs to execute tasks in parallel; the disadvantage is that thread switching is relatively expensive and prone to deadlocks, races, and other problems.

Coroutines are a user-level concurrent execution mechanism. Switching between coroutines does not require the participation of the operating system; it is controlled by the program itself. The advantage of coroutines is that switching costs are low, avoiding the overhead of thread switching. In practice, coroutines and threads can be combined: the cheap switching of coroutines improves the concurrency of threads, and the multi-core parallelism of threads improves the throughput of coroutines. For example, you can use a thread pool to manage multiple coroutines, letting each coroutine execute on a different thread and thereby taking advantage of multi-core CPUs.

After reading that I felt vaguely enlightened, but it hadn't really said anything. My mind was still foggy, and once I looked away from the screen I could not explain any of it. It is impossible to understand the principles of coroutines from just those few sentences.

Next, I will explain Go coroutines in a question-and-answer format. The points I consider most important are highlighted.

Why are coroutines more performant than threads?

A Go coroutine is, in essence, a way of squeezing more out of the CPU. The process is the smallest unit of resource allocation; the thread is the smallest unit of scheduling and execution; the coroutine can then be understood as the smallest unit of an execution flow (representing the transmission, processing, and computation of data).

The operating system divides memory into two spaces: user space (memory used by the user's own programs) and kernel space (memory reserved for the operating system itself).

In Java, `new Thread().start()` (creating a thread) is written in user mode, but when it runs it ultimately traps into kernel mode, so it occupies operating-system resources.

So what is actually used is the kernel, and each mode switch consumes resources and is expensive. If the code is highly concurrent and switches modes constantly, a large amount of resources will be consumed.

So how do coroutines deal with the high resource cost of thread switching?

Coroutines are essentially user-level threads (note that user-level threads are different from ordinary threads, which are kernel threads by default) with a coroutine scheduler added in the middle. One kernel thread can be bound to multiple coroutines through the coroutine scheduler, and each coroutine is bound to one task. This way, when a task blocks on a resource, there is no need to bother the kernel: the coroutine scheduler handles it directly. Previously, switching between two threads went through the operating system and consumed kernel-mode resources; switching between coroutines goes through the coroutine scheduler and consumes only user-mode resources, and the switch points can be controlled explicitly by the programmer.

In short: you only need to add a coroutine scheduler between the coroutines and the kernel thread.

A coroutine's resources come from its thread or process. Coroutines are created and managed inside a thread or process, and they share the resources of that thread or process.

What are the characteristics of Go coroutines?

1) Have independent stack space

2) Shared program heap space

3) Scheduling is controlled by the user

4) Coroutines are lightweight threads

The first two characteristics mean that each coroutine can have its own local variables and execution stack without consuming large amounts of system resources, which improves the program's running efficiency. At the same time, because heap space is shared, coroutines can communicate and cooperate by allocating memory on the heap, achieving efficient concurrency. Tens of thousands of coroutines can be launched in Go with ease, and they remain very stable.

The latter two mean that switching between coroutines happens at the user level, so it does not consume much operating-system resource. The creation, destruction, and switching overhead of coroutines is small, and they occupy little memory.
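To get a feel for how lightweight goroutines are, here is a minimal sketch (the function name `countWithGoroutines` is my own, not from the article) that launches 50,000 goroutines, each with its own small stack, all safely updating a shared counter that lives on the heap:

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// countWithGoroutines launches n goroutines that each add 1 to a shared
// atomic counter, then waits for all of them to finish. Launching tens of
// thousands of goroutines like this is cheap because each one starts with
// only a few kilobytes of stack.
func countWithGoroutines(n int) int64 {
	var counter int64
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			atomic.AddInt64(&counter, 1) // shared heap data, updated atomically
		}()
	}
	wg.Wait()
	return counter
}

func main() {
	fmt.Println(countWithGoroutines(50000)) // prints 50000
}
```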

We all know that a key advantage of coroutines and threads is parallelism. So to understand the relationship between the two, you first need to understand: what is concurrency? What is parallelism?

Concurrency means multiple tasks execute alternately within the same time period, for example via time-slice rotation. On one CPU, A executes for 0.001 s, then B executes for 0.001 s, alternating. To our eyes the two seem to run together, but at any given moment only one of A and B is actually executing.

Parallelism means multiple tasks run at the same time, each on an independent processor, without affecting each other. At a given moment, A and B can both be executing.

In other words, as long as two processes run simultaneously on two processors, they can be called parallel.

It can also be said that to execute tasks in parallel on a multi-core CPU, you need multiple execution flows running at once; in Go this can be achieved with goroutines, channels, concurrency models, concurrency primitives, or concurrency libraries.

One aside: concurrent programming involves access to shared data, and multiple coroutines running at the same time may trigger concurrency-safety issues. It is best to protect shared data with mutex locks, atomic operations, or other concurrency primitives.
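To illustrate the mutex approach just mentioned, here is a minimal sketch (the `safeCounter` type is a hypothetical example of mine) using `sync.Mutex` to protect a shared map from concurrent access:

```go
package main

import (
	"fmt"
	"sync"
)

// safeCounter protects its map with a mutex so that many goroutines can
// update it concurrently without a data race.
type safeCounter struct {
	mu sync.Mutex
	m  map[string]int
}

func (c *safeCounter) inc(key string) {
	c.mu.Lock()
	defer c.mu.Unlock() // released even if the body panics
	c.m[key]++
}

func (c *safeCounter) value(key string) int {
	c.mu.Lock()
	defer c.mu.Unlock()
	return c.m[key]
}

func main() {
	c := safeCounter{m: make(map[string]int)}
	var wg sync.WaitGroup
	for i := 0; i < 1000; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			c.inc("hits") // safe: the mutex serializes map writes
		}()
	}
	wg.Wait()
	fmt.Println(c.value("hits")) // prints 1000
}
```

Without the mutex, the Go race detector (`go run -race`) would flag these concurrent map writes.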

Here is a typical example in Go: using the `go` keyword plus a function to start two coroutines that execute two functions concurrently.

package main

import (
	"fmt"
	"time"
)

func printNumbers() {
	for i := 1; i <= 5; i++ {
		fmt.Println("Printing number:", i)
		time.Sleep(500 * time.Millisecond)
	}
}

func printLetters() {
	for i := 'a'; i <= 'e'; i++ {
		fmt.Println("Printing letter:", string(i))
		time.Sleep(500 * time.Millisecond)
	}
}

func main() {
	go printNumbers() // start a coroutine to execute printNumbers concurrently
	go printLetters() // start a coroutine to execute printLetters concurrently

	// The main goroutine waits a while so we can observe the other goroutines' output
	time.Sleep(3 * time.Second)

	fmt.Println("Main goroutine exiting.")
}

Question: In the programs we write, go func(){} starts coroutines inside a single program; we are not starting coroutines across multiple processes, but multiple coroutines inside one process. Can one program with multiple coroutines still achieve parallelism?

In fact this is not a problem. A program can be understood as a process. These are indeed multiple coroutines written inside one program, and they do achieve parallel execution. The fundamental reason is another important point: the Go runtime system (Go runtime). It is responsible for scheduling and managing coroutines: it schedules multiple coroutines in one process to execute concurrently on multiple operating-system threads, and those threads in turn can run on multiple processors, which is exactly parallelism.

Question: The cost of switching between coroutines is relatively low, but if coroutines are spread across multiple CPUs in parallel, can they still share resources?

Yes. When coroutines execute in parallel on different CPU cores, resources can still be shared between them. In Go, coroutines can communicate and share data through mechanisms such as channels, shared memory, semaphores, and message queues. However, we must still pay attention to safety, using mutex locks, condition variables, atomic operations, and so on.
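As a sketch of sharing results between possibly-parallel coroutines through channels (the `sum` helper here is illustrative, not from the article), two goroutines each sum half of a slice and send their partial results back to the main goroutine:

```go
package main

import "fmt"

// sum sends the sum of a slice segment into a channel. Channels let
// goroutines share results without touching shared memory directly:
// "share memory by communicating".
func sum(nums []int, out chan<- int) {
	total := 0
	for _, v := range nums {
		total += v
	}
	out <- total
}

func main() {
	nums := []int{1, 2, 3, 4, 5, 6}
	out := make(chan int)
	go sum(nums[:3], out) // may run on a different CPU core
	go sum(nums[3:], out)
	a, b := <-out, <-out // receive both partial sums (order is not guaranteed)
	fmt.Println(a + b)   // prints 21
}
```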

Question: What is the Go runtime system? Where does it live? When does it start running? Is it invoked by go func(){}?

The Go language’s Runtime system (Go runtime) is part of the executable file generated by the Go language compiler, which is loaded and run by the operating system when the program starts. The runtime system is responsible for managing underlying tasks such as Goroutine scheduling, memory allocation, and garbage collection to support concurrent and parallel execution models.

The initialization of the runtime system is done automatically by the runtime library, and no explicit code is required to start it. When the program starts, the operating system will load the executable file and transfer control to the runtime system, and then the runtime system will complete the initialization work, including setting up the runtime environment, creating the main Goroutine, etc.

The scheduler is part of the runtime system and is responsible for the scheduling and management of coroutines. The scheduler will decide which coroutine gets the opportunity to execute based on the scheduling algorithm, and switch the execution of the coroutine when necessary.

It should be noted that the behavior of the scheduler is automatically managed by the runtime system, and we do not need to explicitly call or control the operation of the scheduler. The runtime system will automatically schedule and manage according to the needs of the program and the running environment.

In other words, although coroutines are user-level, their scheduling is not chosen by the user directly; it is done by the runtime library linked into the Go program. Since the Go runtime itself runs at the user level, the coroutine switches it performs do not require the operating system and do not consume many system resources.
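To see this user-level runtime from inside a program, here is a small sketch that queries it via the standard `runtime` package (the `schedulerInfo` helper is my own naming):

```go
package main

import (
	"fmt"
	"runtime"
)

// schedulerInfo queries the runtime for: the number of logical processors
// (P) the scheduler may use, the number of CPU cores, and the number of
// goroutines currently alive.
func schedulerInfo() (procs, cpus, goroutines int) {
	// GOMAXPROCS with an argument of 0 only queries the current setting;
	// a positive argument would change it.
	return runtime.GOMAXPROCS(0), runtime.NumCPU(), runtime.NumGoroutine()
}

func main() {
	p, c, g := schedulerInfo()
	fmt.Println("logical processors (P):", p)
	fmt.Println("CPU cores:", c)
	fmt.Println("live goroutines:", g)
}
```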

Question: How is it scheduled? What algorithm is used?

The scheduler of the Go language uses a mechanism called the G-M-P model to manage the scheduling and execution of coroutines.

  • G (Goroutine): represents the execution context of a coroutine, including its stack, instruction pointer, and other information; that is, the goroutine itself.
  • M (Machine): represents an operating-system thread and is responsible for actually executing coroutines. One M corresponds to one operating-system thread.
  • P (Processor): represents a logical processor, responsible for scheduling and managing the execution of coroutines. A P can be associated with multiple M's. In this sense, the Go runtime can be understood as the scheduler between threads and coroutines.

The scheduler allocates multiple coroutines (G) to multiple logical processors (P); each logical processor then dispatches coroutines onto the corresponding operating-system threads (M) for execution. The scheduler switches between coroutines to achieve parallelism.

The scheduler uses a technique called "work stealing": when a coroutine on one operating-system thread blocks or takes too long to execute, logical processors on other operating-system threads can steal queued coroutines from its run queue, making full use of system resources.

I will also include pictures below to facilitate understanding

Without a synchronization mechanism, two threads contending for the same resource can easily deadlock the system, and locking itself wastes system resources. Coroutines can deliver asynchronous performance with synchronous-looking code logic, which makes it convenient for programmers to work with I/O components.

During the waiting period of an I/O operation (i.e. resource blocking, the usual moment to switch threads or coroutines), another coroutine can be scheduled to perform other tasks; after all, coroutine switching costs little. When the I/O completes, the coroutine acquires resources again and resumes its unfinished task through explicit restoration. As the picture shows, whichever coroutine is ready gets scheduled.

As the picture shows, the CPU cannot see coroutines, nor the coroutine scheduler, which is why a coroutine can be called a user-level thread. A coroutine's execution capacity is provided by the kernel thread it is attached to.

GMP Scheduling Model: how the Go language schedules coroutines

G is the coroutine task we want to process, bound one-to-one to a coroutine. P is used to manage and schedule coroutines, but what actually executes is not P: it is the lowest-level kernel thread (M). P only plays a scheduling role.

The CPU can only see kernel threads and can only schedule coroutines by scheduling kernel threads. The numbers of P, M, and G change according to need.

The global queue is a shared resource, so it is locked. Normally an M prioritizes its local queue; only if the local queue is empty does it go to the global queue. In addition, the scheduler checks the global queue once every 61 scheduling ticks, to ensure that Gs sitting in the global queue do not get stuck there forever (to prevent starvation).

There are two strategies. 1) The work-stealing mechanism:

Suppose the local queue on the right is empty (its M is idle). The M first goes to the global queue to find a G; if the global queue is also empty, it can only steal from the P on the left. This way CPU resources are not wasted, the M can keep working, and the program finishes faster.

2) The hand-off mechanism:

If G1 blocks while running on M1, M1 is detached from its P: a new M3 is created (or an idle M reused) to take over P1 and keep running, while M1 stays bound to the blocked G1 (each coroutine has its own independent stack space!). When G1 unblocks, it is transferred to another P to continue being scheduled and run.

Last: the core of coroutine implementation is the jump (coroutine switching).

No matter how a coroutine is created, the underlying implementation must allocate an execution stack and control information for it.

When a coroutine gives up execution, its execution context must be saved for later restoration.

Each coroutine has its own execution stack and can save its own execution context.

Coroutines can be created on demand by user programs.

When a coroutine "voluntarily gives up" execution, its execution context is saved and control switches to another coroutine.

When the coroutine resumes, it restores the previously saved context, returns to the state before the interruption, and continues executing. In this way, coroutines realize a lightweight, flexible multi-task model scheduled in user mode.
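The save-and-resume behavior described above can be sketched with a channel-based generator: each send is a voluntary yield point where the goroutine's context (its locals and position in the loop) is preserved until the consumer asks for the next value. The `fib` generator here is an illustrative example of mine, not from the article:

```go
package main

import "fmt"

// fib returns a channel that yields the first n Fibonacci numbers. The
// generator goroutine "gives up" execution each time it sends: its stack
// (a, b, and the loop position) is saved, and it resumes exactly where it
// left off when the consumer receives the next value.
func fib(n int) <-chan int {
	ch := make(chan int)
	go func() {
		a, b := 0, 1
		for i := 0; i < n; i++ {
			ch <- a // blocks here until the receiver is ready
			a, b = b, a+b
		}
		close(ch)
	}()
	return ch
}

func main() {
	for v := range fib(8) {
		fmt.Print(v, " ") // prints: 0 1 1 2 3 5 8 13
	}
	fmt.Println()
}
```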

Finally, here is a paragraph from my notes. I believe you will be able to understand it after reading the introduction above.

  1. Threads: Coroutines are usually implemented based on threads, but unlike traditional threads, coroutines can switch to perform different tasks in the same thread without the need for thread context switching.
  2. Stack: Each coroutine has its own stack (execution stack), which is used to save local variables and function call information when the coroutine is executed. When a coroutine switches, the stack state of the current coroutine will be saved and the stack state of the next coroutine will be restored.
  3. Coroutines are a lightweight concurrent programming model that implements switching of multiple execution streams at the code level and does not rely on thread switching of the operating system. A coroutine can be regarded as a user-mode thread, which is controlled by the programmer and can explicitly switch execution contexts in the code at a low cost.

The relationship between coroutines and threads is collaborative. In some cases, coroutines can be used together with threads to leverage the strengths of each. For example, multiple coroutines can execute in one thread (created by the system), with concurrency achieved by switching coroutines; or multiple coroutines can be distributed across multiple threads to achieve true parallel execution.

The thread is the execution body of the process, and the coroutine is the execution body of the thread! ! !

Coroutines also look like subroutines, but during execution they can be interrupted inside the subroutine, switch to executing other subroutines, and return to continue at the appropriate time. So a coroutine can be loosely understood as an interruptible subroutine.

A coroutine checks whether a resource is available and proceeds if it is; if not, it blocks. There is no need to waste resources on locks.
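This block-instead-of-lock idea can be sketched with a buffered channel used as a resource pool (the `runWorkers` helper is hypothetical): acquiring a resource is a channel receive, releasing it is a send, and a goroutine that finds no free resource simply blocks on the receive with no explicit lock in sight:

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// runWorkers starts nWorkers goroutines competing for nTokens resources
// held in a buffered channel. A worker that finds the pool empty blocks
// on the receive until another worker returns a token. It reports how
// many workers completed.
func runWorkers(nWorkers, nTokens int) int64 {
	pool := make(chan int, nTokens)
	for i := 0; i < nTokens; i++ {
		pool <- i // fill the pool with resource tokens
	}

	var done int64
	var wg sync.WaitGroup
	for w := 0; w < nWorkers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			token := <-pool // blocks if no resource is free
			atomic.AddInt64(&done, 1)
			pool <- token // return the resource to the pool
		}()
	}
	wg.Wait()
	return done
}

func main() {
	// Four workers share two resources; all four eventually complete.
	fmt.Println(runWorkers(4, 2)) // prints 4
}
```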

Finally, one more point. From what we learned above, coroutines are controlled by the scheduler, so a coroutine's performance cost is closely related to the scheduler, and the scheduler's work is carried out by threads. Coroutines are therefore indirectly controlled by threads, and switching coroutines does consume thread resources. So then:

Why does switching coroutines consume much less resources than switching threads?

Let’s talk about thread switching first

Code that a thread executes in user space is called user-mode code: it runs in the application's context and can access the application's memory and other resources. Thread switching and scheduling, however, happen in kernel space (triggered by system calls, access to system resources, hardware interrupts, etc.), which means crossing from user space into kernel space. The overhead of a thread switch includes saving and restoring context, switching CPU mode, flushing the TLB (Translation Lookaside Buffer), and so on. After the kernel finishes its work, the operating system switches control back to user mode and restores the thread's context. The whole process is very expensive.

Let’s talk about coroutine switching

1) Coroutine switching is cooperative. In a multi-threaded environment, thread switching is decided and performed by the operating system's scheduler: preemptive scheduling (as taught in operating systems courses). Coroutine switching is collaborative: execution is yielded voluntarily at explicit switch points inside the coroutine, giving other coroutines a chance to run. This cooperative scheduling avoids unnecessary context switches and scheduling overhead.

2) Coroutines are very lightweight and their context is small: a switch only needs to save and restore a small amount of register state, and each coroutine has its own independent stack space. A thread's context is larger: more register state, stack, and other information must be saved and restored.
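To get a rough feel for how cheap goroutine creation and switching is, here is a small sketch (timings vary by machine; `spawnAndWait` is my own helper) that starts 100,000 near-empty goroutines and measures the total time:

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// spawnAndWait starts n goroutines that do almost nothing and waits for
// all of them, returning the elapsed time. Even for n = 100000 this
// typically finishes in well under a second, because each goroutine
// starts with a tiny stack and scheduling happens in user space.
func spawnAndWait(n int) time.Duration {
	start := time.Now()
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() { wg.Done() }()
	}
	wg.Wait()
	return time.Since(start)
}

func main() {
	fmt.Println("100000 goroutines took:", spawnAndWait(100000))
}
```

Trying the same experiment with 100,000 OS threads would exhaust memory on most systems, which is the practical content of this whole section.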

Thank you for the like