[Linux] Process Concept III: fork function analysis

Hello, this is Ppeua. I regularly post about C, C++, data structures and algorithms… If you are interested, please follow me! You won’t be disappointed.

Navigation of this article

  • 0. Create process
  • 1. Understand the fork function
  • 2. Use the fork function
  • 3. Questions about fork
    • 3.1 How does a function return twice? What exactly does fork do?
    • 3.2 Why is 0 returned to the child process and the child’s PID returned to the parent process?
    • 3.3 Why does a variable have two different contents?
  • 4. Bash and child processes


0. Create process

Earlier we introduced the idea that every program loaded into memory and running is called a process. So how do we create a process ourselves?

So far, we have run a program with ./xxx, which loads the program into memory and turns it into a process. This is process creation at the command level.

For example:

Create a program named proce and run it, then enter:

ps -ajx | grep proce

This shows information about the running process. In this run, its PID is 58014 and its PPID is 57120.

In code, we can call getpid() and getppid() to get the PID and PPID of the current process. Both are declared in <unistd.h>, and both return a value of type pid_t.

Both functions are documented in section 2 of the man pages, which indicates that they are system calls.

Common section numbers include:

  • 1: User command
  • 2: System call
  • 3: C library function
  • 4: Devices and special files
  • 5: File formats and conventions
  • 6: Games and Demos
  • 7: Miscellaneous
  • 8: System management commands

So how do we create a process in code?

1. Understand the fork function

At the code level, we can use fork to create a child process.

Let’s first take a brief look at this function’s interface. It is documented in section 2 of the man pages, which shows that fork is a system call.

man fork

Function interface:

pid_t fork(void);

Header files:

#include <sys/types.h>
#include <unistd.h>

Return value:

On success, fork returns the child’s PID to the parent process and 0 to the child process. On failure, it returns -1 to the parent, no child process is created, and errno is set.

Let’s try using the fork function.

2. Use the fork function

Let’s start with a simple program:

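#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

int main(void)
{
    printf("before:\n");
    pid_t id = fork();   // from here on, parent and child both keep running
    printf("after:\n");  // executed once by each process
    sleep(1);
    return 0;
}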

After compiling and running it, we find, perhaps surprisingly, that "before" is printed once and "after" is printed twice.

We can draw a conclusion: the code after the fork call is executed twice, once by the parent process and once by the child process.

Now let’s look at a second program:

#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

int main(void)
{
    printf("pid: %d\n", getpid());
    pid_t id = fork();
    if (id > 0)
    {
        // Parent: fork returned the child's PID
        while (1)
        {
            printf("my pid : %d , my parent :%d,id:%d\n", getpid(), getppid(), id);
            sleep(1);
        }
    }
    else if (id == 0)
    {
        // Child: fork returned 0
        while (1)
        {
            printf("my pid : %d , my parent :%d,id:%d\n", getpid(), getppid(), id);
            sleep(1);
        }
    }
    return 0;
}

  • Both the if branch and the else if branch are executed in a single run of the program. This seems to mean that id is both > 0 and == 0 at the same time.

That would be impossible in any single-process program we have written before.

  • As described above, fork() returns the child’s PID to the parent process and 0 to the child process.

    So in this run, 249101 is the parent process and 249102 is the child process.

You probably have the following questions now:

  1. How can a function return twice?
  2. Why is 0 returned to the child process and the child’s PID returned to the parent process?
  3. Why does a variable have two different contents?
  4. What exactly does fork do?

3. Questions about fork

Before answering these questions, let us first establish one concept: when a child process is created, it initially shares the code and data in memory with the parent process.

It is convenient to picture the parent and the child as each having their own code and data, but right after fork both actually refer to the same contents at the same addresses.

3.1 How does a function return twice? What exactly does fork do?

Let’s look at these two questions together.

Suppose the fork function looks roughly like this pseudocode:

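pid_t fork(void)
{
    // 1. Create a task_struct (PCB) for the child process
    // 2. Fill in the child's PCB with the corresponding content
    // 3. Make the parent and child processes point to the same code
    // 4. Mark the child as ready so the CPU can schedule and run it
    // At this point the child process has been created

    return ret;  // reached by both the parent and the child
}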

Pay attention to the last step, return ret. By the time execution reaches it, the child process already exists and shares the same code, including the return ret statement. So the parent and the child each execute return ret once, and that is why fork appears to return twice.

We can make a bold guess about how the two return values differ:

  • **ret starts out as 0. While the child’s PCB is being filled in, the child’s PID becomes known, and that PID is the value assigned to ret in the parent.** The child never receives that assignment.

    The child therefore returns 0 while the parent returns the child’s PID. Roughly speaking, a real kernel achieves this by writing a different return value into each process’s saved context, but the simplified model is enough here.

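3.2 Why is 0 returned to the child process and the child’s PID returned to the parent process?

This makes process management easier.

As in the code above, we usually want the parent process and the child process to do different things, so the return value lets each one take a different branch. A parent can have many children and needs each child’s PID to manage it, while a child has exactly one parent and can always find it with getppid().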

3.3 Why does a variable have two different contents?

For now, the short answer is copy-on-write.

The parent and child processes initially share both code and data, but as soon as either process modifies a variable, the system makes a private copy of that data for the process doing the write. The variable id is written when fork returns, so the parent and the child each end up with their own copy holding a different value.

This will be explained in detail later when we discuss the process address space.

4. Bash and child processes

After running the program above, we checked with ps and found that the PPID of the parent process (that is, the parent of our parent process) was 80738. What is this process?

It turns out to be zsh, a command-line shell like bash.

Normally, when we enter a command on the command line, the shell (bash, or zsh in this case) interprets the command and starts a new child process to run it.