Linux: a file descriptor pitfall when controlling multiple processes through multiple pipes

Introduction

I recently learned how Linux uses anonymous pipes for inter-process communication, so I wrote some code to consolidate that knowledge together with file descriptors and process control. The general framework is that a parent process creates a batch of pipes and child processes and uses the pipes to control the children: the parent writes to a pipe to issue a task, and the child reads from the pipe and executes it. In the code, a for loop creates 5 anonymous pipes, one for each of 5 child processes. To reclaim a child at the end, the parent closes the write end of that child's pipe; the child's read() then returns 0, the child leaves its task loop, closes its read end, and exits, and the parent reclaims it with waitpid(). Because there are several children, the parent does this in a loop. With this scheme I ran into a bug: the children could not be reclaimed. (To keep the code short, the listings below do not reclaim the child processes; they only show how to build the ideal model of multiple pipes controlling multiple processes.) As a novice, I am recording the problem in this blog post for later review. There may be mistakes in the text; corrections are welcome.
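Before moving to five pipes, the mechanism can be sketched with a single pipe and a single child. This is only a minimal sketch: the function name run_single_worker and the one-byte task codes are my own assumptions for illustration, not part of the original program. The child counts the tasks it reads and reports the count through its exit status.

```cpp
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical helper: starts one worker, sends it `count` one-byte task
// codes, then closes the write end and reaps the worker.
// Returns the number of tasks the worker handled, or -1 on error.
int run_single_worker(int count)
{
    int fd[2];
    if (pipe(fd) < 0) return -1;

    pid_t id = fork();
    if (id < 0) return -1;

    if (id == 0)
    {
        // Child: it only reads, so it closes the write end.
        close(fd[1]);
        int handled = 0;
        char task;
        // read() returns 0 once every write end of the pipe is closed.
        while (read(fd[0], &task, 1) == 1)
            ++handled;
        close(fd[0]);
        _exit(handled);
    }

    // Parent: it only writes, so it closes the read end.
    close(fd[0]);
    for (int i = 0; i < count; ++i)
    {
        char task = 'A' + i % 26;   // made-up task code
        if (write(fd[1], &task, 1) != 1) break;
    }
    close(fd[1]);                   // the worker now sees end-of-file

    int status = 0;
    if (waitpid(id, &status, 0) != id) return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

With one pipe this works as expected, because the parent holds the only write end. Since the count travels in the exit status, the sketch is only meaningful for counts up to 255.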

Analysis of the cause of the problem

My goal is to create multiple pipes, one per child process, and use each pipe to control its child. The write-end file descriptors of all pipes are stored in the file descriptor table of the parent process, and the read-end file descriptor of each pipe is stored in the file descriptor table of the corresponding child process. The structure is shown in the figure below (only the file descriptors related to the pipes are listed; the file descriptors for standard input, standard output, and standard error are omitted):

Figure 1. Ideal model of multiple pipes controlling multiple processes

#include <iostream>
#include <unistd.h>
#include <cassert>
#include <stdio.h>
#include <stdlib.h>

int main()
{
    // The parent process writes and the child processes read
    for (int i = 1; i <= 5; i++)
    {
        int fd[2] = {0};
        // Create the pipe
        if (pipe(fd) < 0)
        {
            perror("pipe");
            exit(1);
        }
        // Create the child process
        pid_t id = fork();
        assert(id >= 0);
        if (id == 0)
        {
            // Child process: close the write end, which it does not need
            close(fd[1]);
            printf("I am child process %d, pid:%d, ppid:%d, I have been created successfully!\n", i, getpid(), getppid());
            while (1)
            {
                // Execute the target task
                // (placeholder: a real worker would read tasks from fd[0] here)
                sleep(1);
            }
            close(fd[0]);
            exit(0);
        }
        // Parent process: close the read end, which it does not need
        close(fd[0]);
        sleep(1);
    }
    while (1)
    {
        // Issue tasks
        std::cout << "I am the parent process, pid:" << getpid() << ", all pipes are created, and the task is started..." << std::endl;
        sleep(10);
    }
    // Reclaim the child processes (omitted)
    return 0;
}

Figure 2. ctrl_process related processes

Figure 3. Running results

Figure 2 and Figure 3 show that the for loop has successfully created the pipes and the child processes.

Figure 4. The file descriptor table corresponding to each process

However, after printing the file descriptor table of each process, as shown in Figure 4 (the first table belongs to the parent process and the following ones to child processes 1 to 5), I realized that I had overlooked a very important point: a child process inherits the file descriptor table of its parent at the moment it is created. After child 1 has been created and its pipe set up, the parent still holds the write end of pipe 1. When the second iteration of the for loop creates child 2, child 2 inherits the parent's table, so its table also contains a file descriptor pointing to the write end of pipe 1. In general, the later a child is created, the more file descriptors its table contains, and the extra ones point to the write ends of all previously created pipes. The model that the for loop actually creates is therefore the one shown in Figure 5:

Figure 5. The actual model of multiple pipes controlling multiple processes created by the for loop

The code shows that in each iteration the parent first creates the pipe and then creates the child. The child closes the write end of that pipe, and the parent closes the read end. File descriptors are allocated from the lowest free slot, so when the parent creates pipe 1, its file descriptors 3 and 4 become the read end and the write end of pipe 1. Child 1 is then created and inherits the parent's file descriptor table, so file descriptors 3 and 4 of child 1 also refer to the read and write ends of pipe 1. Child 1 closes the write end of pipe 1, which frees file descriptor 4 in child 1. The parent closes the read end of pipe 1, which frees file descriptor 3 in the parent.

When the parent creates pipe 2, the lowest free slots in its table are 3 and 5, so the read end of pipe 2 is assigned to file descriptor 3 and the write end to file descriptor 5. Child 2 is then created and inherits the parent's table, including file descriptor 4, which still points to the write end of pipe 1. Child 2 closes the write end of pipe 2, which frees file descriptor 5 in child 2, and the parent closes the read end of pipe 2, which frees file descriptor 3 in the parent again. Continuing in the same way gives the structure actually created by the code, as shown in Figure 5.
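The slot reuse described above can be checked inside a single process, without fork(). The following is a minimal sketch; the function name read_end_slot_is_reused is made up for illustration, and it assumes a single-threaded process on Linux, where pipe() hands out the two lowest free descriptors with the read end first.

```cpp
#include <unistd.h>

// Hypothetical helper: returns true if the second pipe's read end reuses the
// slot freed by closing the first pipe's read end, and its write end lands
// above the first pipe's write end (3/4 then 3/5 in the walkthrough above).
bool read_end_slot_is_reused()
{
    int first[2];
    if (pipe(first) < 0) return false;
    close(first[0]);                 // like the parent, keep only the write end

    int second[2];
    if (pipe(second) < 0)
    {
        close(first[1]);
        return false;
    }

    // The lowest free slot is the one that was just closed.
    bool reused = (second[0] == first[0]);
    // The next free slot lies above the write end that is still open.
    bool write_end_above = (second[1] > first[1]);

    close(first[1]);
    close(second[0]);
    close(second[1]);
    return reused && write_end_above;
}
```

The exact numbers depend on which descriptors are already open when the function is called, which is why the sketch compares the descriptors with each other instead of checking for 3, 4, and 5.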

Therefore, the method described in the introduction cannot reclaim the children normally. The parent closes the write end of a pipe and expects the child's read() to return 0, so that the child leaves its loop and exits. But several processes hold file descriptors pointing to the write end of the same pipe, and read() returns 0 only after every file descriptor pointing to that write end has been closed. Until then the write end is not truly closed, the child stays blocked in read(), and the parent's waitpid() for that child cannot return.
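This rule can be observed in one process by using dup() as a stand-in for the copy of the write end that a later child inherits. This is a sketch under that assumption; the function name is hypothetical, and the read end is made non-blocking only so that the demonstration cannot hang.

```cpp
#include <fcntl.h>
#include <unistd.h>
#include <cerrno>

// Hypothetical helper: returns true if read() reports "no data yet" (EAGAIN)
// while a duplicate write end is still open, and reports end-of-file (0)
// only after the duplicate has been closed as well.
bool eof_needs_all_writers_closed()
{
    int fd[2];
    if (pipe(fd) < 0) return false;

    // Stand-in for the write end inherited by another child.
    int extra_writer = dup(fd[1]);
    if (extra_writer < 0)
    {
        close(fd[0]);
        close(fd[1]);
        return false;
    }

    // Non-blocking read end, so a missing end-of-file shows up as EAGAIN.
    int flags = fcntl(fd[0], F_GETFL);
    fcntl(fd[0], F_SETFL, flags | O_NONBLOCK);

    char byte;
    close(fd[1]);                               // the "parent" closes its write end
    ssize_t n1 = read(fd[0], &byte, 1);
    bool still_open = (n1 == -1 && errno == EAGAIN);

    close(extra_writer);                        // the last write end goes away
    ssize_t n2 = read(fd[0], &byte, 1);
    bool eof = (n2 == 0);

    close(fd[0]);
    return still_open && eof;
}
```

With a blocking read end, as in the program in this post, the first read() would simply never return, which is the hang seen when trying to reclaim the children.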

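So how do we solve this problem? Based on the model in Figure 5 that we actually created, we can reclaim the children in reverse order, starting from the last one created. The write end of the last pipe is held only by the parent, so closing it lets the last child exit, and that child's exit closes its inherited copies of the earlier write ends. But this approach is a bit of a trick, and reclaiming resources is not the ultimate goal. The ultimate goal is to create the ideal model of Figure 1.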
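For completeness, here is roughly what reverse-order reclamation could look like. It is a sketch only: reclaim_in_reverse and its parameter are names I made up, the children deliberately keep the inherited write ends open to reproduce the Figure 5 layout, and error handling is reduced to returning -1.

```cpp
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>
#include <vector>

// Hypothetical helper: builds the Figure 5 layout with `n` workers, then
// reclaims them from the last one created back to the first.
// Returns the number of workers reaped, or -1 on error.
int reclaim_in_reverse(int n)
{
    std::vector<int> write_fds;
    std::vector<pid_t> pids;

    for (int i = 0; i < n; ++i)
    {
        int fd[2];
        if (pipe(fd) < 0) return -1;
        pid_t id = fork();
        if (id < 0) return -1;
        if (id == 0)
        {
            close(fd[1]);       // earlier write ends deliberately stay open
            char task;
            while (read(fd[0], &task, 1) > 0)
            {
                // a real worker would run the task here
            }
            _exit(0);           // exiting closes every inherited descriptor
        }
        close(fd[0]);
        write_fds.push_back(fd[1]);
        pids.push_back(id);
    }

    int reaped = 0;
    for (int i = n - 1; i >= 0; --i)
    {
        // At this point only the parent still holds this write end:
        // every later child has already exited.
        close(write_fds[i]);
        if (waitpid(pids[i], nullptr, 0) == pids[i]) ++reaped;
    }
    return reaped;
}
```

Running the same loop from i = 0 upwards would block in the first waitpid(), because children 2 to n still hold the write end of pipe 1.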

In this case, how do we create an ideal model like the one in Figure 1? Notice that in every process, including the children, the write end of a given pipe has the same file descriptor value. So the parent can keep the write-end file descriptors it holds in a vector, and each newly forked child, as its first step, closes every write-end file descriptor it inherited from the parent. In this way, an ideal model like Figure 1 can be created. Let's look at the code and the running results.

#include <iostream>
#include <unistd.h>
#include <cassert>
#include <stdio.h>
#include <stdlib.h>
#include <vector>

int main()
{
    // Write-end file descriptors held by the parent process
    std::vector<int> fds;

    // The parent process writes and the child processes read
    for (int i = 1; i <= 5; i++)
    {
        int pipefd[2] = {0};
        // Create the pipe
        if (pipe(pipefd) < 0)
        {
            perror("pipe");
            exit(1);
        }
        // Create the child process
        pid_t id = fork();
        assert(id >= 0);
        if (id == 0)
        {
            // Child process: first close the write ends of earlier pipes inherited from the parent
            for (auto fd : fds) close(fd);
            // Then close the write end of its own pipe, which it does not need
            close(pipefd[1]);
            printf("I am child process %d, pid:%d, ppid:%d, I have been created successfully!\n", i, getpid(), getppid());
            while (1)
            {
                // Execute the target task
                // (placeholder: a real worker would read tasks from pipefd[0] here)
                sleep(1);
            }
            close(pipefd[0]);
            exit(0);
        }
        // Parent process: close the read end, which it does not need
        close(pipefd[0]);
        // Keep the write end in fds so that later children can close their inherited copy
        fds.push_back(pipefd[1]);
        sleep(1);
    }
    while (1)
    {
        // Issue tasks
        std::cout << "I am the parent process, pid:" << getpid() << ", all pipes are created, and the task is started..." << std::endl;
        sleep(10);
    }
    // Reclaim the child processes (omitted)
    return 0;
}

Figure 6. ctrl_process related processes (after improving the code)

Figure 7. The file descriptor table corresponding to each process (after improving the code)
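The listing above leaves out task dispatch and reclamation. As a rough sketch of how they might be added on top of the same structure, the function below sends one task to each child and then reclaims the children in creation order. The name run_workers, the one-byte task 'T', and the use of the exit status to report the number of handled tasks are assumptions made for illustration.

```cpp
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>
#include <vector>

// Hypothetical helper: builds the Figure 1 layout with `n` workers, sends
// each one a single one-byte task, then reclaims them in creation order.
// Returns the total number of tasks the workers handled, or -1 on error.
int run_workers(int n)
{
    std::vector<int> write_fds;
    std::vector<pid_t> pids;

    for (int i = 0; i < n; ++i)
    {
        int pipefd[2];
        if (pipe(pipefd) < 0) return -1;
        pid_t id = fork();
        if (id < 0) return -1;
        if (id == 0)
        {
            for (int fd : write_fds) close(fd);   // drop inherited write ends
            close(pipefd[1]);
            int handled = 0;
            char task;
            while (read(pipefd[0], &task, 1) == 1)
                ++handled;                        // a real worker would run the task
            close(pipefd[0]);
            _exit(handled);
        }
        close(pipefd[0]);
        write_fds.push_back(pipefd[1]);
        pids.push_back(id);
    }

    // Issue one task to every worker.
    for (int fd : write_fds)
    {
        char task = 'T';
        if (write(fd, &task, 1) != 1) return -1;
    }

    // Forward-order reclamation works because the parent holds the only
    // write end of each pipe.
    int total = 0;
    for (int i = 0; i < n; ++i)
    {
        close(write_fds[i]);
        int status = 0;
        if (waitpid(pids[i], &status, 0) != pids[i]) return -1;
        if (WIFEXITED(status)) total += WEXITSTATUS(status);
    }
    return total;
}
```

The only difference from the leaky layout is the loop at the top of the child branch, and that one loop is what makes closing a write end in the parent sufficient to end the matching child.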

Figures 6 and 7 show that the ideal model of multiple pipes controlling multiple processes has now been established. It is worth mentioning that this method raised a question for me. For the child to close the write ends recorded in fds, the file descriptors it inherits must not only point to the same files as in the parent but also have exactly the same values. If the child had other open files, then by the allocation rule (the lowest free file descriptor is assigned first), might the inherited file descriptors end up with different values?

To answer this, start from the creation of the child process. When a child is created, the kernel creates its PCB. The PCB holds the attributes of the process; a few of them, such as the PID, are unique to the child, and most of the others are inherited from the parent. The PCB also contains the file descriptor table of the process, so the child inherits the parent's file descriptor table at the moment it is created. The child cannot have opened any file before that point; any other file must be opened after the table has been inherited and therefore takes a slot that is still free. Consequently, the file descriptors in the table the child inherits have the same values as in the parent and point to the same files.
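A small experiment along these lines, as a sketch: the child uses the numbers stored in fd[] by the parent, writes through the inherited write end, and then opens /dev/null to see where a file opened after fork() lands. The function name is hypothetical, and the "higher number" check assumes a single-threaded process in which pipe() returned the two lowest free descriptors.

```cpp
#include <fcntl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical helper: returns true if the child finds both pipe ends at the
// parent's descriptor numbers, can reach the parent through them, and gets a
// higher number for a file it opens after fork().
bool child_sees_same_numbers()
{
    int fd[2];
    if (pipe(fd) < 0) return false;

    pid_t id = fork();
    if (id < 0)
    {
        close(fd[0]);
        close(fd[1]);
        return false;
    }

    if (id == 0)
    {
        // fd[0] and fd[1] still hold the numbers the parent got from pipe().
        bool inherited = fcntl(fd[0], F_GETFD) != -1 && fcntl(fd[1], F_GETFD) != -1;
        // Same number, same pipe: this byte must arrive in the parent.
        bool sent = write(fd[1], "x", 1) == 1;
        // Opened after fork(): takes the lowest slot that is still free.
        int later = open("/dev/null", O_RDONLY);
        bool above = later > fd[1];
        _exit(inherited && sent && above ? 0 : 1);
    }

    close(fd[1]);
    char in = 0;
    bool received = read(fd[0], &in, 1) == 1 && in == 'x';
    close(fd[0]);

    int status = 0;
    if (waitpid(id, &status, 0) != id) return false;
    return received && WIFEXITED(status) && WEXITSTATUS(status) == 0;
}
```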

Conclusion

The file descriptors that a child process inherits from its parent have the same values as in the parent and point to the same files. Right after the child is created its table is a copy of the parent's, so the inherited file descriptors occupy the same slots as in the parent. After the child has been created, the two file descriptor tables are independent, and each process may open or close files on its own. If the parent and the child then each open the same file, the file descriptors they are assigned are not necessarily the same, because each process gets the lowest free slot in its own table. The values may differ, but they still refer to the same file.

The allocation rule for file descriptors is that the lowest-numbered free slot in the process's file descriptor table is assigned first. In two processes, the file descriptors that point to the same open file may be different or may be the same; this has to be analyzed case by case.

The file descriptor table is an attribute of the process maintained by the kernel, not part of the process's data, so the child inherits it by having it copied from the parent immediately when the child is created, rather than by copy-on-write.
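User-space code cannot observe when the kernel performs the copy, but it can observe the consequence that matters here: after the child is created, the two tables are independent. The sketch below (function name made up for illustration) lets the child close both pipe ends and then checks that the parent's descriptors still work.

```cpp
#include <fcntl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical helper: returns true if closing both pipe ends in the child
// frees the slots there while the parent's descriptors stay usable.
bool parent_table_is_independent()
{
    int fd[2];
    if (pipe(fd) < 0) return false;

    pid_t id = fork();
    if (id < 0)
    {
        close(fd[0]);
        close(fd[1]);
        return false;
    }

    if (id == 0)
    {
        close(fd[0]);
        close(fd[1]);
        // In the child, both slots are now free.
        bool closed_here = fcntl(fd[0], F_GETFD) == -1 && fcntl(fd[1], F_GETFD) == -1;
        _exit(closed_here ? 0 : 1);
    }

    int status = 0;
    if (waitpid(id, &status, 0) != id) return false;
    bool child_ok = WIFEXITED(status) && WEXITSTATUS(status) == 0;

    // The child's close() calls did not touch the parent's table:
    // a byte written to the pipe can still be read back.
    char out = 'x';
    char in = 0;
    bool still_usable = write(fd[1], &out, 1) == 1 && read(fd[0], &in, 1) == 1 && in == 'x';

    close(fd[0]);
    close(fd[1]);
    return child_ok && still_usable;
}
```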

A file descriptor is essentially an index into an array, namely the process's file descriptor table. Under normal circumstances, a process starts with standard input, standard output, and standard error already open as file descriptors 0, 1, and 2.
