[Multi-threading] The concept of threads {heap area management in the Linux kernel; conversion of virtual addresses to physical addresses; pages, page frames, page tables, and the MMU memory management unit; Linux thread concepts; lightweight processes; threads sharing process resources; advantages and disadvantages of threads; purpose of threads}

1. Supplementary content

1.1 Heap area management in Linux kernel

vm_area_struct (VMA) is a data structure in the Linux kernel that represents one contiguous region of a process's virtual address space; the heap is one such region. It tracks the properties of the region, such as its start and end addresses, permissions, flags, and any associated file or device.

The following is a simplified version of the vm_area_struct definition from the Linux kernel source code (abridged; recent kernels organize VMAs differently, but these fields convey the idea):

struct vm_area_struct {
    struct mm_struct *vm_mm;                   /* the mm_struct this VMA belongs to */
    unsigned long vm_start;                    /* start address (inclusive) */
    unsigned long vm_end;                      /* end address (exclusive) */
    unsigned long vm_flags;                    /* VMA flags: readable, writable, executable, ... */
    struct rb_node vm_rb;                      /* node in the mm's red-black tree */
    struct vm_area_struct *vm_next, *vm_prev;  /* neighbors in the mm's sorted VMA list */
    pgprot_t vm_page_prot;                     /* page protection bits */
    const struct vm_operations_struct *vm_ops; /* operations on this VMA */
    unsigned long vm_pgoff;                    /* offset within the mapped file/device, in pages */
    struct file *vm_file;                      /* backing file, if any */
    void *vm_private_data;                     /* private data of the VMA */
};

In the virtual memory space of a process, the heap is usually a contiguous region, but the kernel may divide it into one or more vm_area_struct (VMA) structures for management.

When a process allocates memory with functions such as malloc(), the kernel creates or extends vm_area_struct structures as needed to manage the allocated regions. Each vm_area_struct corresponds to one memory segment in the heap area and records that segment's start address, end address, flags and other information.

This way of dividing the heap area improves the flexibility and efficiency of memory management. For example, when a process releases part of its memory, the affected segment can be shrunk simply by adjusting the vm_start and vm_end fields of its VMA to reflect the released space; there is no need to remap the entire heap area. If the freed block borders a segment boundary, adjacent VMAs with compatible attributes may be merged, reducing management overhead and fragmentation.

Through the vm_area_struct structure, the kernel can effectively manage the heap area of the process, including allocating and releasing memory, protecting memory areas, processing memory mapping and other operations.

Supplement: when a new vm_area_struct is created, the kernel inserts it into the mm_rb red-black tree so that the VMA covering a given address can be found quickly, and also links it into the mmap doubly linked list, which is kept sorted by start address so that VMAs can be traversed in address order.

1.2 Conversion of virtual address to physical address

1.2.1 ELF file format

  • ELF (Executable and Linkable Format) is the standard executable format on Linux and many other UNIX-like operating systems.
  • The executable itself is laid out in terms of virtual (logical) addresses. The ELF file divides the program into multiple segments (code segment, data segment, symbol table, etc.), each with its own attributes and corresponding memory region.
  • When the program is run, the loader maps these segments into memory; what actually gets loaded is the contents of the ELF file.

1.2.2 Pages, page frames and page tables

Page, Page Frame and Page Table are basic concepts related to memory management.

  1. Page: a page is the basic unit of memory management, a fixed-size contiguous block. Common page sizes are 4KB, 8KB or larger. The virtual address space of a process is divided into a series of pages.
  2. Page Frame: a page frame is the basic unit of physical memory and is the same size as a page. Page frames are the division units of physical memory and store the contents of pages. The operating system divides physical memory into a series of page frames, and each page frame has a unique physical address (the starting address of the frame).
  3. Block: the division unit of the executable program's storage space (disk space), also the same size as a page; in fact it is the size of a data block in the file system. The basic unit of I/O between the operating system and the disk is typically 4KB, and when source code is compiled, the binary executable is laid out in 4KB units in a specific format (ELF).
  4. Page Table: the page table is a per-process data structure that manages the mapping between virtual addresses and physical memory. It does not record a one-to-one mapping between individual virtual and physical addresses; instead, multi-level page tables map each virtual page number to a physical page frame, and the in-page offset is then appended to the frame's starting address to form the physical address.

hint:

  1. In the Linux kernel, struct page is the data structure used to describe a physical page frame. Each physical page frame corresponds to one struct page object, which manages and tracks the frame's status and properties: whether the frame is occupied, its reference count, the address space it belongs to, and so on. struct page is the metadata structure of physical memory!
  2. The struct list_head lru field in struct page links the page frame into an LRU (Least Recently Used) list to support the page replacement algorithm. Through the lru field, a page frame can be placed on the active list or the inactive list.

1.2.3 MMU memory management unit

MMU is the abbreviation of Memory Management Unit, a hardware component in the computer system. The MMU is responsible for translating virtual addresses into physical addresses and for enforcing memory access permissions.

The MMU is usually integrated into the CPU or provided as a separate chip, and it is an essential part of the operating system's memory management. Its main functions include:

  1. Address translation: MMU searches the page table based on the high-order part of the virtual address (page table index) and converts the virtual address into a physical address. This process usually includes page table lookup and calculation of intra-page offset.

  2. Memory protection: MMU checks the permissions of the accessed virtual address based on the permission bits in the page table. If the access rights do not meet the requirements, the MMU will generate an exception and interrupt the execution of the program.

  3. Page fault handling: when the virtual page being accessed is not present in physical memory, the MMU raises a page fault exception. The operating system's exception handler then reads the missing page from disk into physical memory and updates the page table mapping.

  4. Page replacement: when physical memory runs low, the operating system (the MMU itself only reports faults and records access information) evicts some pages from physical memory to disk according to a page replacement algorithm such as LRU, freeing up memory space.

The existence of MMU allows the operating system to map the virtual address space to physical memory, providing a larger address space and higher flexibility. At the same time, the MMU also plays a role in protecting the memory and preventing programs from accessing the memory out of bounds or illegally.

1.2.4 Conversion of virtual address to physical address

The mapping of virtual addresses to physical memory is performed by the memory management unit (MMU), using page tables set up by the operating system. The MMU is responsible for converting virtual addresses into physical addresses.

The mapping process from virtual address to physical address is as follows:

  1. When a program accesses a virtual address, the CPU sends the virtual address to the MMU.
  2. The MMU looks up the page table (page table) based on the high-order part of the virtual address (page table index).
  3. The page table stores the mapping relationship between virtual page numbers and physical page frame numbers. The MMU finds the corresponding physical page frame number based on the virtual page number.
  4. The MMU combines the physical page frame number with the low-order part of the virtual address (intra-page offset) to obtain the physical address.
  5. The MMU sends the physical address to the memory controller to read or write data from the physical memory.

It should be noted that the virtual address space can be larger than physical memory. In that case, a page replacement algorithm (such as LRU) evicts some virtual pages to disk to free physical memory. When a program accesses a virtual page that has been evicted to disk, a page fault exception is triggered; the operating system reads the page back from disk into physical memory and updates the page table mapping.

Tip: with 4KB pages, the low part of the virtual address (the in-page offset) is exactly 12 bits; 2^12 = 4096, so the offset can address every byte of a page frame.
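The five translation steps above can be sketched as a toy single-level page table. This is only an illustration under stated assumptions (4KB pages; real MMUs walk multi-level tables in hardware, and ToyPageTable is a name invented here):

```cpp
#include <cstdint>
#include <map>

constexpr unsigned PAGE_SHIFT = 12;   // 4KB pages assumed

// Toy single-level page table: virtual page number -> physical frame number.
// It only illustrates the split / lookup / recombine steps listed above.
struct ToyPageTable {
    std::map<uint64_t, uint64_t> entries;

    // Returns true and fills *paddr on a hit; false means a page fault,
    // which the operating system would handle by loading the page.
    bool translate(uint64_t vaddr, uint64_t *paddr) const
    {
        uint64_t vpn = vaddr >> PAGE_SHIFT;            // high bits: page table index
        auto it = entries.find(vpn);                   // look up the frame number
        if (it == entries.end())
            return false;                              // page fault
        uint64_t off = vaddr & ((1ull << PAGE_SHIFT) - 1);
        *paddr = (it->second << PAGE_SHIFT) | off;     // frame base + in-page offset
        return true;
    }
};
```

Mapping VPN 0x12 to frame 0x80 makes translate(0x12345, &pa) yield 0x80345, while an unmapped address reports a fault, mirroring steps 2 through 4 above.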

The supplementary concepts above show that everything from application-level development to compiler principles to the operating system is strongly related: all of it is carefully designed to work together.

2. Linux thread concept

2.1 From the user’s perspective

  • A process is a running instance of a program. Each process has its own address space, file descriptors and other resources (kernel data structures plus physical memory), as well as all of its threads. The process is the basic unit of resource allocation by the operating system. Processes are independent of each other and exchange data and collaborate through inter-process communication (IPC) mechanisms.

  • A thread is an execution flow (execution unit) within a process and is the actual unit of work in the process. A process has at least one thread and may contain many; they share the process's resources, such as memory and open files. The thread is the basic unit of operating system scheduling (CPU execution). Threads can run concurrently, improving the concurrency and responsiveness of a program, and they communicate and synchronize through shared memory.

The main differences between processes and threads are as follows:

  1. Resource overhead: Switching between processes is more expensive, and the context information of the entire process needs to be saved and restored; while switching between threads is less expensive, and only the context information of the thread needs to be saved and restored.
  2. Independence: A process is an independent execution entity with independent address space and resources; while a thread is a subset of a process and shares the resources of the process.
  3. Communication and synchronization: Communication and synchronization between processes require the use of inter-process communication (IPC) mechanisms, such as pipes, message queues, shared memory, etc.; while communication and synchronization between threads can be achieved directly through shared memory, which is more convenient and efficient.
  4. Creation and management: In user space, processes and threads can be created and managed through different APIs (such as fork() and pthread_create()).

Tip: We have been writing single-threaded processes before, and now we are going to study multi-threaded processes!

2.2 From the perspective of Linux kernel

In Linux, threads are part of the process, also known as lightweight processes (LWP, Lightweight Process). Linux uses a model called “multi-threads sharing the same process address space”, that is, multiple threads share the resources of the same process, such as memory, file descriptors, etc.

In Linux systems, the structures of processes and threads are the same. In the kernel, processes and threads are represented by task_struct structures.

The task_struct structure contains the attributes and status information of a process or thread, such as the process ID (PID), parent process ID (PPID), process state, priority, address space, file descriptor table, the process's thread group, and its signal handling structures. It also contains pointers linking it to related data structures, such as the process's child process list and its thread list.

In Linux, a thread is part of a process, and multiple threads share the resources of the same process, including address space, file descriptors, etc. At the same time, from the perspective of the kernel, the structures of processes and threads are the same, and are represented by the task_struct structure. Therefore, threads in Linux systems are also called lightweight processes (LWP).

In Linux, the difference between threads and processes is relatively small, so the kernel does not provide thread-specific system calls directly; instead it offers a unified lightweight-process interface (the clone() system call). To reduce the difficulty of using and learning this interface, a complete multi-threading solution is wrapped at user level and provided to users in the form of a library.

In Linux, the pthread library (also called the POSIX thread library or the native thread library) can be used to create and manage threads.

Tip: In Windows systems, processes and threads are two different concepts, and there are some structural differences. Each process has an independent Process Control Block (PCB), and each thread also has an independent Thread Control Block (TCB).

2.3 Lightweight Process

Linux threads are also called lightweight processes for the following reasons:

  • Create lightweight

    • When creating a thread, only a task_struct structure (and a stack) needs to be created. There is no need to build an address space, page tables or a file descriptor table, or to load the program into memory: those resources were created and populated when the process was created, and all threads of the process share them.
  • Lightweight during scheduling

    1. The cost of thread switching is low because threads share the resources of the same process, including the address space and page tables. Compared with a process switch, a thread switch does not need to swap the address space, page tables and related context, so the overhead is smaller.

    2. Another important reason thread switching is cheap: while a program runs, the CPU pre-reads code and data from memory into the CPU caches (L1 to L3) based on the principle of locality. On a process switch, the cached contents immediately become useless and the hot data must be re-cached. On a thread switch, the cache hit rate stays high and there is no need to re-cache the data.

  • Lightweight when deleting

    • When a thread is deleted, it only needs to delete its task_struct structure and does not need to release the resources of the process. The release and recycling of process resources are performed when the process exits.

2.4 Test program

The following is a simple sample code that demonstrates how to use the pthread_create() function to create a thread (thread control is discussed in detail in the next chapter):

#include <cstdio>
#include <cstdlib>
#include <unistd.h>
#include <pthread.h>

void *ThreadRun(void *name)
{
    // Print the PID of the new thread
    printf("%s: pid:%d\n", (char *)name, getpid());
    while (1)
        sleep(1);
    return nullptr;
}

int main()
{
    // Print the PID of the main thread
    printf("%s: pid:%d\n", "main thread", getpid());
    pthread_t tid[5];
    for (int i = 0; i < 5; ++i)
    {
        // Give each thread its own name buffer: a single shared buffer
        // could be overwritten before the new thread reads it
        char *name = (char *)malloc(50);
        snprintf(name, 50, "%s-%d", "thread", i);
        // Create the new threads in a loop
        pthread_create(tid + i, nullptr, ThreadRun, name);
        sleep(1);
    }
    while (1)
        sleep(1);

    return 0;
}

  1. The PID of the main thread and the new threads are the same, which shows that a thread is part of a process: an execution flow (execution unit) of the process.
  2. Only one mythread process appears in the process monitoring window (ps axj), proving that these six threads belong to the same process.
  3. A total of 6 threads appear in the lightweight process monitoring window (ps -aL): 1 main thread (whose PID and LWP are the same) and 5 new threads (whose PID and LWP differ).
  4. Sending signal 9 to the process terminates all threads. A process is a running instance of a program and the basic unit of resource allocation by the OS; all threads share the process's resources, so when the process exits, its threads must exit with it.

3. Threads share process resources

Each thread shares the address space of the process, including:

  1. Code area data (define a function that can be called in each thread)
  2. Static area data (define a global variable that can be accessed in each thread)
  3. Heap area data (the pointer of the heap space can be passed between threads, or private heap space can be selected)
  4. Shared area data (dynamic library and shared memory communication)
  5. Command line parameters and environment variables

Each thread also shares the resources and environment of the following processes:

  1. file descriptor table
  2. The disposition of each signal (SIG_IGN, SIG_DFL or a custom signal handling function)
  3. current working directory
  4. user id and group id

At the same time, the thread also has its own part of data:

  1. Thread ID (thread attribute structure)
  2. Independent stack structure (thread attribute structure): an important basis for independent execution of threads
  3. errno error code (thread local variable, thread attribute structure)
  4. Thread context (a set of registers, PCB data): an important basis for independent thread scheduling
  5. Signal mask word (PCB data)
  6. Scheduling priority (PCB data)

hint:

  • Two important private data: thread context data (thread scheduling) and stack structure data (calling functions, opening up stack frame space, storing temporary data), they reflect the dynamic attributes of the thread.
  • Regarding the thread attribute structure, it will be explained in the “Thread ID” section of the next chapter “Thread Control”.

4. Advantages and Disadvantages of Threads

Advantages of threads

  1. Creating a new thread is much less expensive than creating a new process
  2. Compared with switching between processes, switching between threads requires the operating system to do much less work.
  3. Threads occupy much fewer resources than processes
  4. It can make full use of the parallelism of multiple processors; for CPU-bound work, the number of threads a process creates is often matched to the number of CPU cores.
  5. While waiting for the slow I/O operation to complete, the program can perform other computing tasks
  6. For computationally intensive applications, in order to run on a multi-processor system, the calculations are broken down into multiple threads.
  7. In I/O-intensive applications, in order to improve performance, I/O operations are overlapped. Threads can wait for different I/O operations at the same time.

Disadvantages of threads

  1. Performance penalty: a compute-intensive thread that is rarely blocked by external events often cannot share a processor with other threads. If the number of compute-intensive threads exceeds the available processors, there may be a large performance loss: extra synchronization and scheduling overhead is added while the available computing resources stay unchanged.

  2. Reduced robustness: writing multi-threaded programs requires more thorough and careful design. In a multi-threaded program, subtle deviations in time allocation, or the sharing of variables that should not be shared, can very easily cause adverse effects; in other words, threads lack protection from one another.

  3. Lack of access control: In multi-threaded programming, there are challenges with access control. Since multiple threads can access shared data and resources simultaneously, appropriate measures need to be taken to ensure safe access between threads.

  4. Increased programming difficulty: writing and debugging a multi-threaded program is much harder than writing a single-threaded one.

5. Purpose of threads

  1. Reasonable use of multi-threading can improve the execution efficiency of compute-intensive programs.
  2. Reasonable use of multi-threading can improve the user experience of I/O-intensive programs (for example, playing media while it is still downloading is a typical multi-threaded behavior).