User mode: using the fork function to create a process

We usually start a program from the Shell command line, and the Shell first creates a child process to run it. Because a real Shell is a relatively complicated program, we simplify it to the following short piece of code, which shows how to create a child process in user mode.

#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

int main(int argc, char *argv[])
{
    pid_t pid;
    /* fork another process */
    pid = fork();
    if (pid < 0)
    {
        /* error occurred */
        perror("fork");
        exit(EXIT_FAILURE);
    }
    else if (pid == 0)
    {
        /* child process: fork returned 0 */
    }
    else
    {
        /* parent process: fork returned the pid of the child */
    }
    return 0;
}

Library function fork

The library function fork is the user-mode API for the system call that creates a child process. The checks on the return value of fork can be confusing at first: after fork executes normally, the if (pid < 0) error-handling branch is not executed, yet both the else if (pid == 0) branch and the else branch appear to be executed.

In fact, the fork system call copies the current process into a child process, so one process becomes two, and both execute the same code. What differs is the return value of fork: it returns 0 in the child process and the pid of the child in the parent process. The if statement therefore runs once in each of the two processes, and because the condition evaluates differently in each, each process takes a different branch. Neither the parent nor the child breaks the if/else structure. When both branches print to the same Shell, it only looks as if one process executed both, while in fact there are two processes behind the output. After fork, the execution order of the parent and the child depends on the scheduler, so running the program several times can show that the order is not deterministic.
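The two return values can be observed directly. The following is a minimal sketch, not part of the original program: the helper name run_fork_demo and the exit status 42 are assumptions chosen for illustration. The child takes the pid == 0 branch and exits with a known status, and the parent takes the else branch and collects that status with waitpid.

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: forks once and returns the child's exit status
 * as seen by the parent, or -1 on error. */
int run_fork_demo(void)
{
    fflush(stdout); /* avoid duplicating already-buffered output in the child */

    pid_t pid = fork();
    if (pid < 0) {
        perror("fork");
        return -1;
    } else if (pid == 0) {
        /* Child process: fork returned 0 here. */
        printf("child:  fork returned 0, my pid is %d\n", (int)getpid());
        fflush(stdout);
        _exit(42); /* arbitrary status so the parent can recognize the child */
    }

    /* Parent process: fork returned the pid of the child. */
    printf("parent: fork returned %d\n", (int)pid);

    int status = 0;
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Calling run_fork_demo() from main prints one line from each process. Which line appears first may change between runs, since it depends on how the scheduler orders the parent and the child.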

With the fork program above, we can create a child process in user mode by invoking the fork system call.

Let’s start by reviewing how system calls work, and discuss how creating a process differs from other common system calls.

Review of system calls

For a normally triggered system call on x86 Linux, the user-mode program executes an int $0x80 or syscall instruction, and the CPU jumps to the assembly code at the system call entry. The int $0x80 instruction enters at entry_INT80_32 and returns from the system call with iret; the syscall instruction enters at entry_SYSCALL_64 and returns with sysret or iret.

On ARM64 Linux, the user-mode program executes the svc instruction to trigger the system call. The CPU jumps to the exception vector table (vectors) and then to the exception handling entry, that is, to el0_sync and then el0_svc. After the system call completes, the kernel returns to user mode with the eret instruction.

When a system call traps from user mode into kernel mode, the function call stack switches from the user-mode stack to the kernel-mode stack, and the key CPU context (the stack pointer register, the instruction pointer register, the flags register, and so on) is saved on the kernel stack. The entry assembly code then uses the system call number to dispatch to the corresponding kernel handler function. Finally the context is restored: the saved stack pointer, instruction pointer, flags, and so on are loaded from the kernel stack back into the registers, and execution continues in user mode at the instruction following int $0x80/syscall or svc (the system call return address).
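The role of the system call number can be seen from user space. The sketch below is an illustration under assumptions, not code from the course: it uses the generic syscall() wrapper from the C library on Linux, and the helper name raw_getpid is hypothetical. The wrapper takes the system call number explicitly and executes the trap instruction for the current architecture, so its result should match the ordinary getpid() library call.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* makes syscall() visible in <unistd.h> on glibc */
#endif
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: invoke getpid by system call number.
 * SYS_getpid is the number the kernel entry code uses to select
 * the handler function. */
long raw_getpid(void)
{
    return syscall(SYS_getpid);
}
```

getpid is used here instead of fork because it has no side effects; the trap into the kernel, the dispatch by number, and the return to the next user-mode instruction follow the same path.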

fork system call

fork is also a system call, and its execution is roughly the same as the general system call process described above. From the perspective of the parent process in particular, fork behaves exactly as described. The difference is that fork creates a child process that copies the process information of the parent, including the kernel stack and the process descriptor, and the child is then scheduled as an independent process. So when the child process first gets the CPU, where does it start running?

From the perspective of user space, it starts at the instruction following the fork system call. But the fork system call also returns in the child process; that is, inside the kernel one fork call becomes two processes, parent and child, each of which returns to user mode. So for the child process, where in the kernel handler does execution begin? From which line of code does a newly created child process start executing? This is the key question. Let’s keep it in mind as we analyze how the kernel handles the fork system call; answering it should give a deeper understanding of the Linux kernel source code.

The main process of process creation

Let’s first look at the overall framework for creating a process. As we learned earlier, creating a process means copying the information of the current process, and this is done by the _do_fork function. Most of the information in the parent and the child is identical, but some must differ, such as the pid and the kernel stack. The new process must also be linked into various lists. In addition, the thread data structure records the key information of the process execution context, including where the process will resume execution, and it must be set up correctly for the child; otherwise the child cannot run properly.

We can therefore expect the following framework. When the parent creates a child, there should be a place where the process descriptor of the parent (the task_struct structure) is copied, followed by many places where the copy is modified, because parent and child are largely independent. The child should also modify data in its kernel stack: much of that data is copied from the parent, but since the fork system call returns to user mode separately in the parent and in the child, some of it cannot be identical. Finally there is thread: based on the kernel stack the child copied from the parent, the instruction pointer and the stack pointer must be set, that is, the position where the child starts executing must be set.

Note that when fork copies the resources of the parent, it uses the copy-on-write technique: resources that are not modified remain shared by the parent and the child, and a private copy is made only when one of them writes.
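Copy-on-write itself is not directly visible from user space, but its effect is: a write in the child never changes what the parent sees. The following sketch illustrates that isolation; the variable and function names are hypothetical and chosen only for this example.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* A global that both processes see with the same value right after fork. */
static int value_after_fork = 1;

/* Hypothetical helper: the child overwrites the global, then the parent
 * reports its own copy. Returns the parent's value, or -1 on error. */
int parent_value_after_child_write(void)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* Child: this write lands in the child's own copy of the page. */
        value_after_fork = 99;
        _exit(value_after_fork == 99 ? 0 : 1);
    }

    /* Parent: wait for the child to finish its write, then read our copy. */
    int status = 0;
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    if (!WIFEXITED(status) || WEXITSTATUS(status) != 0)
        return -1;
    return value_after_fork; /* unchanged by the child's write */
}
```

The parent still reads 1 after the child has written 99, which is the behavior copy-on-write has to preserve while deferring the actual page copy until the write happens.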

With this framework in mind, we can trace the actual code and look for the parts the framework predicts. To avoid repetition, we will not go through how the fork system call is triggered again; instead we start tracing directly from the _do_fork function, which is defined in kernel/fork.c.

_do_fork function

The _do_fork function mainly calls copy_process() to copy the parent process, obtains the pid of the child, and calls wake_up_new_task to add the child process to the ready queue, where it waits to be scheduled.

long _do_fork(struct kernel_clone_args *args)
{
    ...
    // Copy the process descriptor and the other data structures needed for execution
    p = copy_process(NULL, trace, NUMA_NO_NODE, args);
    ...
    wake_up_new_task(p); // Add the child process to the ready queue
    ...
    return nr; // Return the pid of the child (in the parent, fork returns the pid of the child)
}
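The three steps of _do_fork (copy, enqueue, return the pid of the child) can be modeled in ordinary user-space C. The sketch below is a toy model under stated assumptions, not kernel code: every toy_ name, the fixed-size arrays, and the starting pid of 100 are invented for illustration.

```c
#include <stddef.h>

#define TOY_MAX_TASKS 8

enum toy_state { TOY_NEW, TOY_RUNNING };

/* Toy stand-in for a process descriptor. */
struct toy_task {
    int pid;
    int parent_pid;
    enum toy_state state;
};

static struct toy_task toy_tasks[TOY_MAX_TASKS];
static struct toy_task *toy_ready_queue[TOY_MAX_TASKS];
static size_t toy_task_count;
static size_t toy_ready_count;
static int toy_next_pid = 100; /* arbitrary first pid for the model */

/* Copy the parent's descriptor, then fix up the fields that must differ. */
static struct toy_task *toy_copy_process(const struct toy_task *parent)
{
    if (toy_task_count == TOY_MAX_TASKS)
        return NULL;
    struct toy_task *child = &toy_tasks[toy_task_count++];
    *child = *parent;
    child->pid = toy_next_pid++;
    child->parent_pid = parent->pid;
    child->state = TOY_NEW;
    return child;
}

/* Mark the child runnable and append it to the ready queue. */
static void toy_wake_up_new_task(struct toy_task *task)
{
    task->state = TOY_RUNNING;
    toy_ready_queue[toy_ready_count++] = task;
}

/* Mirrors the shape of _do_fork: copy, enqueue, return the child's pid. */
long toy_do_fork(const struct toy_task *parent)
{
    struct toy_task *child = toy_copy_process(parent);
    if (child == NULL)
        return -1;
    long nr = child->pid;
    toy_wake_up_new_task(child);
    return nr;
}
```

In this model the value returned by toy_do_fork plays the role of the return value of fork in the parent, while the child only sits in the ready queue until something schedules it.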

How the copy_process() function copies the parent process

copy_process() is the main code for creating a process. The listing below is abridged and annotated with comments; see kernel/fork.c for the complete code.

static __latent_entropy struct task_struct *copy_process(
                    struct pid *pid,
                    int trace,
                    int node,
                    struct kernel_clone_args *args)
{
    //Copy process descriptor task_struct, create kernel stack, etc.
    p = dup_task_struct(current, node);

    /* copy all the process information */
    shm_init_task(p);
    …
    // Initialize the child process kernel stack and thread
    retval = copy_thread_tls(clone_flags, args->stack, args->stack_size, p,
                 args->tls);
    …
    return p;//Return the created child process descriptor pointer
}

The copy_process function calls dup_task_struct to copy the descriptor (task_struct) of the current process (the parent), performs checks and initialization, sets the state of the child (the earlier 3.18.6 code sets TASK_RUNNING here, which puts the child in the ready state; in 5.4.34 the child is marked TASK_NEW at this point and becomes TASK_RUNNING in wake_up_new_task), copies the other process resources one by one using copy-on-write, calls copy_thread_tls to initialize the kernel stack of the child, sets the pid of the child, and so on. The two most critical steps are dup_task_struct, which copies the descriptor of the current process, and copy_thread_tls, which initializes the kernel stack of the child. Next, we look at dup_task_struct and copy_thread_tls in detail.

dup_task_struct: copying the current process (parent process) descriptor task_struct

static struct task_struct *dup_task_struct(struct task_struct *orig, int node)
{
        ...
        // Actually copy the process descriptor; the method is *tsk = *orig
        err = arch_dup_task_struct(tsk, orig);
        ...
        // Point the child at its own newly allocated kernel stack
        tsk->stack = stack;
        ...
        // Copy thread_info to the new stack (a no-op when thread_info lives inside task_struct)
        setup_thread_stack(tsk, orig);
        clear_user_return_notifier(tsk);
        clear_tsk_need_resched(tsk);
        set_task_stack_end_magic(tsk);
        ...
        return tsk;
}
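The pattern in dup_task_struct, a whole-structure assignment followed by replacing the stack pointer, can be sketched in user-space C. This is a simplified illustration under assumptions: struct mini_task, its fields, and the helper names are hypothetical, and malloc stands in for the kernel allocators.

```c
#include <stdlib.h>

#define MINI_STACK_SIZE 256

/* Hypothetical, heavily reduced stand-in for task_struct. */
struct mini_task {
    int pid;
    int priority;
    unsigned char *stack; /* each task needs its own stack */
};

/* Duplicate a descriptor: copy every field, then give the copy its own stack. */
struct mini_task *mini_dup_task(const struct mini_task *orig)
{
    struct mini_task *tsk = malloc(sizeof(*tsk));
    unsigned char *stack = malloc(MINI_STACK_SIZE);
    if (tsk == NULL || stack == NULL) {
        free(tsk);
        free(stack);
        return NULL;
    }
    *tsk = *orig;       /* same idea as *tsk = *orig in arch_dup_task_struct */
    tsk->stack = stack; /* overwrite the copied pointer with the new stack */
    return tsk;
}

void mini_free_task(struct mini_task *tsk)
{
    if (tsk != NULL) {
        free(tsk->stack);
        free(tsk);
    }
}
```

The structure assignment alone would leave both descriptors pointing at the same stack, which is why the pointer is replaced immediately after the copy.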

copy_thread_tls is also key. In the earlier 3.18.6 kernel this function is called copy_thread. It is responsible for constructing the kernel stack of the child for the fork system call. fork returns once in the parent and once in the child: in the parent, the handling is the same as for any other system call, but in the child the kernel function call stack has to be specially constructed to prepare the context in which the child will run. The newer function also adds support for thread-local storage (TLS) for multi-threaded programming, which we will not go into here.

Before looking at copy_thread_tls, we need to focus on the kernel stack of the forked child and on the last member of the process descriptor, struct thread_struct thread.

The task_struct data structure ends with the key data structure thread, which saves the CPU-related state of the process context. It is declared at the end of the process descriptor as follows:

    /* CPU-specific state of this task: */
    struct thread_struct thread;

struct thread_struct contains many fields, the most critical of which are sp and ip. In the 32-bit x86 Linux kernel 3.18.6, sp saves the ESP register of the process context and ip saves the EIP register; the structure also holds many other CPU-related states.

Note that in the 5.4.34 code, struct thread_struct no longer has an ip member; the instruction pointer is saved on the kernel stack instead. For example, the kernel stack of a child process created by fork contains a ret_addr.

Now that we understand the kernel stack of the forked child and the last member of the process descriptor, struct thread_struct thread, we can look at copy_thread_tls (using 5.4.34 as the example) and copy_thread (using 3.18.6 as the example).

copy_thread_tls vs. copy_thread

/* linux-5.4.34, x86-64 (abridged) */
int copy_thread_tls(unsigned long clone_flags, unsigned long sp,
        unsigned long arg, struct task_struct *p, unsigned long tls)
{
    ...
    // The child starts executing at ret_from_fork; the address is kept on the kernel stack
    frame->ret_addr = (unsigned long) ret_from_fork;
    p->thread.sp = (unsigned long) fork_frame;
    ...
    // Copy the register state the parent saved on entering the system call
    *childregs = *current_pt_regs();

    childregs->ax = 0; // fork returns 0 in the child
    ...
    /*
     * Set a new TLS for the child thread?
     */
    if (clone_flags & CLONE_SETTLS) {
            err = do_arch_prctl_64(p, ARCH_SET_FS, tls);
            ...
    }
    ...
}

/* linux-3.18.6, 32-bit x86 (abridged) */
int copy_thread(unsigned long clone_flags, unsigned long sp,
    unsigned long arg, struct task_struct *p)
{
    ...
    p->thread.sp = (unsigned long) childregs;

    // Copy the kernel stack (the register state of the parent, that is, what the int instruction and SAVE_ALL pushed onto the stack)
    *childregs = *current_pt_regs();

    childregs->ax = 0; // Set eax of the child to 0, so fork returns 0 in the child
    ...
    // ip points to ret_from_fork; the child process starts executing from here
    p->thread.ip = (unsigned long) ret_from_fork;
    ...
}

Once the process descriptor, kernel stack, and so on of the child have been created, wake_up_new_task(p) adds the child to the ready queue so that it has a chance to be scheduled. Process creation is then complete, and when the child is scheduled it starts executing from the ret_from_fork address set here.

Note that linux-5.4.34 handles the key context values ip and sp differently from the earlier version: in 3.18.6 the instruction pointer is stored in thread.ip, while in 5.4.34 it is stored directly on the kernel stack through frame->ret_addr.
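Why one fork call yields two different return values can be modeled with a few lines of user-space C. This is a toy simulation under assumptions, not kernel code: the sim_ types and functions are hypothetical, an enum stands in for the resume address, and the register values in any caller are made up.

```c
/* Where a task resumes when it is next scheduled (stands in for
 * thread.ip in 3.18.6 or frame->ret_addr in 5.4.34). */
enum sim_resume { SIM_SYSCALL_RETURN, SIM_RET_FROM_FORK };

/* Stand-in for the pt_regs saved on the kernel stack. */
struct sim_regs {
    long ax;          /* carries the system call return value */
    unsigned long ip; /* user-mode return address */
    unsigned long sp; /* user-mode stack pointer */
};

struct sim_task {
    struct sim_regs saved_regs;
    enum sim_resume resume_at;
};

/* Mirrors the shape of copy_thread: copy the parent's saved registers,
 * force the child's return value to 0, and set where the child resumes. */
void sim_copy_thread(struct sim_task *child, const struct sim_task *parent)
{
    child->saved_regs = parent->saved_regs; /* like *childregs = *current_pt_regs() */
    child->saved_regs.ax = 0;               /* like childregs->ax = 0 */
    child->resume_at = SIM_RET_FROM_FORK;
}

/* The value user mode sees when the task returns from the system call. */
long sim_return_value(const struct sim_task *task)
{
    return task->saved_regs.ax;
}
```

Because ip and sp are copied unchanged, both tasks return to the same user-mode instruction with the same stack contents; only ax differs, which is exactly the difference the if/else on pid tests.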

Summary of _do_fork

To sum up, process creation works roughly as follows. The parent process enters the _do_fork function in the kernel through the fork system call, copies the process descriptor and related process resources (using copy-on-write), allocates the kernel stack of the child, and initializes the key context of the child, such as the kernel stack and thread. Finally the child is put on the ready queue and the fork system call returns in the parent. When the child is later scheduled, it starts executing according to the context that was set up in its kernel stack and thread.

The above content is my after-class summary of the course “Linux Operating System Analysis” at the School of Software, University of Science and Technology of China. Thanks to Mr. Meng Ning for his dedicated teaching; his lectures are excellent (^_^)

Reference: “Analysis of Paodingjieniu Linux Kernel” edited by Meng Ning
