Operating system processes, part 2: PCB members and fork

Last time we learned what a process is and how the operating system manages processes. Today let’s get to know some members of the PCB (process control block).

In Linux, running the ps command with the ajx options (ps ajx) lists all processes in the current system.

Let’s start with pid and ppid.

Article directory

  • 1. pid and ppid in the process
  • 2. Simple understanding of parent process and child process
  • 3. System call function
    • 1). getpid, getppid
    • 2). fork
      • a. Simple understanding of fork
      • b. Usage of fork
      • c. Principle of fork

1. pid and ppid in the process

The PCB of a process has two members we care about here: pid and ppid. pid is the id of the process in the operating system, and ppid is the id of its parent process.
Roughly, it looks like this:

struct pcb {
  pid_t pid;
  pid_t ppid;
  // various other attributes
};

pid_t is a signed integer type (on Linux it is typically an int).

2. Simple understanding of parent process and child process

For parent and child processes, the ppid of the child records the id of its parent. In addition, the child inherits (copies) some of the attributes stored in the parent’s PCB.
When we compile a program and run it, it shows up in the process list.

Let’s run this program multiple times.

We find that the pid of this program changes on every run, but the ppid does not, which shows that its parent process is always the same. Let’s look up that parent process.

It turns out to be -bash, our command-line interpreter. Processes started from the command line are all child processes of bash.
So besides the ps command, how else can we view the processes in the system?
Linux has a /proc directory that contains one subdirectory per running process, named after its pid. Various data about the process are recorded in that subdirectory. Because processes are constantly being created and destroyed, the contents of /proc are constantly changing too.

We run our own program and see what its directory under /proc says about it.

Here are a few entries we need to know:

One is exe, a link to the executable file on disk that the process was started from.
Another is cwd, which records the current working directory of the process.
Current working directory: when we did file I/O in C, we met the concepts of absolute path and current working directory. The current working directory is where a file is created by default when we do not give an absolute path. A process inherits it from its parent, so for a program started from the shell it is the directory you launched it from, which is often (but not always) the directory that contains the executable.
There is also a system call, chdir, which changes the current working directory of the calling process.

3. System call function

Here we look at three of the system call functions that Linux provides: getpid, getppid and fork.

1). getpid, getppid


The man page says that these calls never fail, and it lists the required header files. Now let’s try the two functions.

#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

int main()
{
  pid_t pid = getpid();
  pid_t ppid = getppid();
  printf("I am a process, my id is: %d, my parent process id is: %d\n", pid, ppid);
  return 0;
}

2). fork

a. Simple understanding of fork

A process in Linux can be created from the command line or from code. The third system call interface to know, fork, is how code does it.

fork creates a child of the calling process, that is, of the process formed by running our executable. The PCB of the child contains some attributes copied directly from the parent.
fork appears to return two values: it returns the pid of the child to the parent, and 0 to the child. If process creation fails, no child is created and -1 is returned to the parent.
Let’s demonstrate it:

#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

int main()
{
  printf("I am a process and my id is: %d\n", getpid());
  fork();
  printf("hello world\n");
  return 0;
}


We see something new. The first printf is executed once, but the printf after fork is executed twice. We have not met this situation before.
Let’s modify the code slightly:

#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

int main()
{
  printf("I am a process and my id is: %d\n", getpid());
  fork();
  printf("hello world, my id is: %d, my parent id is: %d\n", getpid(), getppid());
  return 0;
}

In the output, process 3078 is bash. We also know that fork creates a new process, and the output shows that this new process is a child of our program. The child created by fork executes the code after the fork call together with the parent, while the code before the fork was executed only by the parent. This works because the child gets a copy of the parent’s register context, including the program counter (eip on 32-bit x86), which records the address of the next instruction to execute.
fork also has a return value. Let’s modify the code above slightly to look at it.

#include <stdio.h>
#include <unistd.h>
#include <sys/types.h>

int main()
{
  printf("I am a process and my id is: %d\n", getpid());
  pid_t id = fork();
  printf("hello world, my id is: %d, my parent id is: %d, my return value is: %d\n", getpid(), getppid(), id);
  return 0;
}


The same variable shows different values in the two lines of output, so fork really does seem to have two return values: it returns the pid of the child to the parent, and 0 to the child. This clearly contradicts our previous understanding of functions. Also, if fork fails, it returns -1. We will explain why later. First, what can the return value be used for?

b. Usage of fork

We know that fork creates a process from code, as a child of the calling process, and that it returns different values in the two processes. We create a child process because a single process cannot conveniently complete the task, so we want two processes working on it, for example downloading and playing at the same time. How do we make sure that each process does its own job? Don’t forget that we have two return values. Let’s demonstrate it with code:

#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

int main()
{
  printf("I am a process, my id is: %d, my parent process is: %d\n", getpid(), getppid());

  pid_t id = fork();
  if(id < 0)
    return -1;
  else if(id == 0)
  {
    while(1)
    {
      printf("I am a child process. My ID is: %d. My parent process ID is: %d. I am performing a download task\n", getpid(), getppid());
      sleep(1);
    }
  }
  else
  {
    while(1)
    {
      printf("I am a parent process. My ID is: %d. My parent process ID is: %d. I am performing a playback task\n", getpid(), getppid());
      sleep(2);
    }
  }
  return 0;
}

c. Principle of fork

Now that we have met fork, one big question remains: how can a function have two return values? Next, let’s explain the principle behind fork in detail.

What did fork do?

We know that when a program is run, it is first loaded into memory; the operating system then creates the corresponding PCB structure and links it into the run queue. If the code calls fork, another process is created, which means that the operating system creates another PCB and links it into the run queue too. There are then two PCBs pointing to the same code segment.

We also know that process = code + PCB, where "code" covers both the code segment and the data produced while the code runs. To be more precise: process = code segment + data + PCB. Two processes may have the same code, but their data is not necessarily the same. So after fork, the parent and the child must each have their own copy of the data. The operating system achieves this with copy-on-write: the two processes initially share the same memory, and a private copy of a piece of data is made only when one of them writes to it. If you are interested in the details, you can read my other blog: Using the idea of copy-on-write.

Then why does fork return the pid of the child process to the parent process?

This is because the parent-child relationship is one-to-many. A parent can have many children and needs a way to tell them apart, so it must be given the pid of each child. A child has only one parent, and it can always get its own id with getpid and its parent’s id with getppid.

Why does fork return two values?

Consider an ordinary function: what does it mean when it reaches return? It means that the function has finished its work and only has to hand back its result. The same is true for fork, which is also a function. By the time it executes return, its task of creating the child process is already complete, so two processes exist. After fork the code is shared, which means that the return statement itself is executed by both the parent and the child. So fork does not return two values from one call: return runs once in each process, and each process receives its own value.

Why does a variable store two different values?

Some people will ask: the code after fork is executed twice, once per process, but the variable name is the same in both. When the code later uses the variable that received the return value, how does the system know which value to give it? Let’s take a look at the address of this variable:



The address is actually the same in both processes!

We will discuss this issue later.