Six states of Linux kernel process management

Process concept

1) The program being executed

2) An instance of the program being executed on the computer

3) An entity that can allocate a processor and be executed by the processor

The two basic elements of a process are the program code and the data set associated with that code. Linux is a multi-user, multi-tasking system that can run many programs for many users at the same time, so many processes inevitably exist at once, each in its own state. This is where the notion of ‘process state’ comes in: a process in memory moves through various states according to scheduling policy and its own needs.

Process status under Linux

static const char * const task_state_array[] = {
"R (running)", /* 0 */
"S (sleeping)", /* 1 */
"D (disk sleep)", /* 2 */
"T (stopped)", /* 4 */
"t (tracing stop)", /* 8 */
"X (dead)", /* 16 */
"Z (zombie)", /* 32 */
};
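The numbers in the comments above are bit values, not array indices: each state is a single bit (with 0 meaning runnable), and the position of that bit selects the string. The lookup can be sketched as follows, loosely modeled on how older kernels walk this array. The helper name state_name is hypothetical, and real kernels differ in detail from version to version.

```c
#include <stddef.h>

static const char *const task_state_array[] = {
    "R (running)",      /* 0  */
    "S (sleeping)",     /* 1  */
    "D (disk sleep)",   /* 2  */
    "T (stopped)",      /* 4  */
    "t (tracing stop)", /* 8  */
    "X (dead)",         /* 16 */
    "Z (zombie)",       /* 32 */
};

/* Hypothetical helper: map a single-bit state value to its display string.
 * Each right shift of the state value advances one slot in the array. */
const char *state_name(unsigned int state) {
    size_t index = 0;
    while (state) {
        index++;
        state >>= 1;
    }
    if (index >= sizeof(task_state_array) / sizeof(task_state_array[0]))
        return "?";   /* value outside the table */
    return task_state_array[index];
}
```

For example, state_name(4) selects the fourth slot after "R", which is "T (stopped)".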

(1) Linux process status: R (TASK_RUNNING), executable status:

Only processes in this state may run on the CPU. At the same time, there may be multiple processes in the executable state, and the task_struct structure (process control block) of these processes is put into the executable queue of the corresponding CPU (a process can only appear in the executable queue of one CPU at most). The task of the process scheduler is to select a process from the executable queue of each CPU to run on the CPU.

Many operating system textbooks define the process being executed on the CPU as the RUNNING state, and define the executable but not yet scheduled process as the READY state. These two states are unified as the TASK_RUNNING state under Linux.

(2) Linux process state: S (TASK_INTERRUPTIBLE), interruptible sleep state:

The process in this state is suspended because it is waiting for a certain event to occur (such as waiting for a socket connection, waiting for a semaphore). The task_struct structures of these processes are put into the waiting queue for corresponding events. When these events occur (triggered by external interrupts, or by other processes), one or more processes in the corresponding waiting queue will be woken up.

In the output of the ps command, most processes in the list are normally in the TASK_INTERRUPTIBLE state (unless the machine is under very high load). After all, there are only a few CPUs but dozens or hundreds of processes; if most of them were not sleeping, the CPUs could not keep up.
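The state letter that ps prints can also be read directly from /proc/<pid>/stat, whose line starts with "pid (comm) state". The following is a minimal sketch of that; the helper names state_from_stat_line and read_process_state are made up for illustration, and error handling is reduced to returning '?'.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: extract the state letter from one line in the
 * /proc/<pid>/stat format, "pid (comm) state ...". The command name may
 * itself contain spaces or parentheses, so search for the LAST ')'. */
char state_from_stat_line(const char *line) {
    const char *close = strrchr(line, ')');
    if (close == NULL || close[1] != ' ' || close[2] == '\0')
        return '?';
    return close[2];
}

/* Hypothetical helper: read the state letter of a process from /proc.
 * Returns '?' if the file cannot be opened or parsed. */
char read_process_state(long pid) {
    char path[64], line[1024];
    snprintf(path, sizeof(path), "/proc/%ld/stat", pid);
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return '?';
    char state = '?';
    if (fgets(line, sizeof(line), f) != NULL)
        state = state_from_stat_line(line);
    fclose(f);
    return state;
}
```

Calling read_process_state on the pids of a few idle daemons would typically return 'S', in line with the observation above.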

(3) Linux process state: D (TASK_UNINTERRUPTIBLE), uninterruptible sleep state:

Similar to the TASK_INTERRUPTIBLE state, the process is sleeping, but it cannot be interrupted for the moment. Uninterruptible does not mean that the CPU ignores external hardware interrupts; it means that the process does not respond to asynchronous signals.

In most cases, a process should be able to respond to asynchronous signals while it sleeps. Otherwise, you would be surprised to find that kill -9 cannot kill a sleeping process! This also explains why ps almost never shows processes in the TASK_UNINTERRUPTIBLE state and almost always shows them in the TASK_INTERRUPTIBLE state.

The TASK_UNINTERRUPTIBLE state exists because certain kernel operations must not be interrupted. Responding to an asynchronous signal inserts a signal-handling path into the program's flow of execution (this inserted path may exist only in kernel mode, or may extend into user mode), so the original flow is interrupted. (See “Linux Kernel Asynchronous Interrupt Analysis”.) When a process operates on some hardware (for example, it calls the read system call on a device file, and read ends up executing the corresponding device driver code and interacting with the physical device), the TASK_UNINTERRUPTIBLE state may be needed to protect the process, so that the interaction with the device is not interrupted and the device is not left in an uncontrollable state. In this case the TASK_UNINTERRUPTIBLE state is very short-lived and is practically impossible to catch with the ps command.

There is also a TASK_UNINTERRUPTIBLE state in Linux that is easy to catch. After the vfork system call, the parent process enters the TASK_UNINTERRUPTIBLE state until the child process calls exit or exec (see “Magic vfork”).

A process in the TASK_UNINTERRUPTIBLE state can be produced with the following code:

#include <unistd.h>

int main(void) {
    if (!vfork()) { sleep(100); _exit(0); } /* child sleeps, parent blocks */
    return 0;
}

Compile and run, and then ps:

kouu@kouu-one:~/test$ ps -ax | grep a.out
4371 pts/0 D+ 0:00 ./a.out
4372 pts/0 S+ 0:00 ./a.out
4374 pts/1 S+ 0:00 grep a.out

Then we can test the power of the TASK_UNINTERRUPTIBLE state: whether we send kill or kill -9, the parent process in the TASK_UNINTERRUPTIBLE state stays alive. (This reflects the kernel the test was run on; newer kernels put the vfork parent in a killable wait, so kill -9 may succeed there.)

(4) Linux process status: T (TASK_STOPPED or TASK_TRACED), stopped or traced status:

Send a SIGSTOP signal to a process and it enters the TASK_STOPPED state in response (unless the process is in the TASK_UNINTERRUPTIBLE state and does not respond to signals). (Like SIGKILL, SIGSTOP cannot be overridden: a user process is not allowed to install its own handler for it through the signal family of system calls.) Sending a SIGCONT signal to the process returns it from the TASK_STOPPED state to the TASK_RUNNING state.

When a process is being traced, it is in the special state TASK_TRACED. “Being traced” means that the process is paused, waiting for the process that traces it to operate on it. For example, if a breakpoint is set on the traced process in gdb, the process is in the TASK_TRACED state while it is stopped at the breakpoint. At other times, the traced process is in one of the states mentioned above. From the point of view of the process itself, the TASK_STOPPED and TASK_TRACED states are very similar: both mean that the process is suspended.

The TASK_TRACED state adds a layer of protection on top of TASK_STOPPED: a process in the TASK_TRACED state cannot be woken up by the SIGCONT signal. The traced process returns to the TASK_RUNNING state only when the debugging process performs an operation such as PTRACE_CONT or PTRACE_DETACH through the ptrace system call (the operation is specified by the arguments to ptrace), or when the debugging process exits.

(5) Linux process status: Z (TASK_DEAD – EXIT_ZOMBIE), exit status, the process becomes a zombie process:

A process is in the TASK_DEAD state while it is exiting. During exit, all resources held by the process are reclaimed except for the task_struct structure (and a few other resources). All that remains of the process is the empty shell of its task_struct, which is why it is called a zombie. The task_struct is kept because it stores the exit code of the process and some statistics, and the parent process is likely to care about this information. For example, in the shell, the $? variable holds the exit code of the last foreground process that exited, and this exit code is often used as the condition of an if statement.

Of course, the kernel could store this information elsewhere and release the task_struct structure to save some space. But using the task_struct is more convenient, because the kernel already maintains the lookup from pid to task_struct as well as the parent-child relationships between processes. If the task_struct were released, new data structures would be needed so that the parent process could find the exit information of its children.

The parent process can wait for one or more child processes to exit through the wait family of system calls (such as wait4 and waitid) and obtain their exit information. These system calls then also release the corpse (the task_struct) of the child process.
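Collecting the exit code through the wait family can be sketched as follows. The function name reap_child_exit_code is hypothetical; waitpid is used here as one member of the family, alongside the wait4 and waitid calls mentioned above.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork a child that exits with `code`, then reap it and return the exit
 * code the parent observed, or -1 on failure. */
int reap_child_exit_code(int code) {
    pid_t child = fork();
    if (child < 0)
        return -1;
    if (child == 0)
        _exit(code);            /* child: terminate immediately */

    int status;
    /* Once the child has exited, it stays a zombie until this call
     * collects its exit information. */
    if (waitpid(child, &status, 0) != child)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

This is the same mechanism a shell relies on to fill in $? after a foreground command finishes.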

During the exit process of the child process, the kernel will send a signal to its parent process to notify the parent process to “collect the corpse”. This signal is SIGCHLD by default, but this signal can be set when creating a child process through the clone system call.

A process in the EXIT_ZOMBIE state can be created by the following code:

#include <unistd.h>

int main(void) {
    if (fork()) while (1) sleep(100); /* parent never waits */
    return 0;
}

Compile and run, then ps:

kouu@kouu-one:~/test$ ps -ax | grep a.out
10410 pts/0 S+ 0:00 ./a.out
10411 pts/0 Z+ 0:00 [a.out]
10413 pts/1 S+ 0:00 grep a.out

As long as the parent process does not exit, the zombie child continues to exist. So if the parent process exits, who will “collect the corpse” of the child? When a process exits, all of its children are handed over to another process and become that process's children. Which one? It may be another process in the exiting process's thread group (if one exists), or process number 1. So every process has a parent at every moment, except process number 1. Process number 1, the process with pid 1, is also called the init process.

After the Linux system starts, the first user mode process created is the init process. It has two missions:

  • 1. Execute the system initialization script and create a series of processes (they are all descendants of the init process);

  • 2. Wait for the exit event of its child process in an infinite loop, and call the waitid system call to complete the “corpse collection” work;

The init process will not be suspended, nor will it be killed (this is guaranteed by the kernel). It is in the TASK_INTERRUPTIBLE state while waiting for the child process to exit, and it is in the TASK_RUNNING state during the “corpse collection” process.

(6) Linux process status: X (TASK_DEAD – EXIT_DEAD), exit status, the process is about to be destroyed:

A process may also exit without retaining its task_struct. For example, it may be a detached thread in a multi-threaded program (process? thread? see “Analysis of Linux Threads”), or its parent may have explicitly ignored the SIGCHLD signal by setting its handler to SIG_IGN. (This is specified by POSIX, even though the exit signal of a child process can be set to something other than SIGCHLD.)

In these cases the process is placed in the EXIT_DEAD exit state, which means the kernel code that follows will immediately release the process completely. The EXIT_DEAD state is therefore very short-lived and almost impossible to catch with the ps command.

The initial state of the process

Processes are created through the fork family of system calls (fork, clone, vfork), and the kernel (or a kernel module) can also create kernel threads through the kernel_thread function. These functions essentially all do the same thing: they copy the calling process to obtain a child process. (Option parameters determine whether each kind of resource is shared or private.)

Since the calling process is in the TASK_RUNNING state (otherwise, if it were not running, how could it make the call?), the child process is also in the TASK_RUNNING state by default.

In addition, the clone system call and the kernel_thread kernel function accept the CLONE_STOPPED option, which sets the initial state of the child process to TASK_STOPPED. (This flag exists only in older kernels; it was removed in Linux 2.6.38.)

Process state transition

After a process is created, its state may go through a series of changes until the process exits. Although there are several process states, a state transition can only go in one of two directions: from the TASK_RUNNING state to a non-TASK_RUNNING state, or from a non-TASK_RUNNING state to the TASK_RUNNING state.

That is to say, if a SIGKILL signal is sent to a process in the TASK_INTERRUPTIBLE state, the process will first be awakened (enter the TASK_RUNNING state), and then exit in response to the SIGKILL signal (change to the TASK_DEAD state). It does not exit directly from the TASK_INTERRUPTIBLE state.
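From user space only the end result of this sequence is visible: the parent sees that the child was terminated by SIGKILL, while the intermediate wake-up happens inside the kernel. A minimal sketch, with the hypothetical function name kill_sleeping_child; note that the child may not yet have reached pause() when the signal is sent, which does not change the result.

```c
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork a child that sleeps in pause(), kill it with SIGKILL, and return
 * the number of the signal that terminated it, or -1 on failure. */
int kill_sleeping_child(void) {
    pid_t child = fork();
    if (child < 0)
        return -1;
    if (child == 0) {
        for (;;)
            pause();            /* child: interruptible sleep */
    }

    kill(child, SIGKILL);       /* the kernel wakes the child so it can exit */

    int status;
    if (waitpid(child, &status, 0) != child)
        return -1;
    return WIFSIGNALED(status) ? WTERMSIG(status) : -1;
}
```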

A process changes from a non-TASK_RUNNING state to the TASK_RUNNING state when another process (or an interrupt handler) performs a wake-up operation on it. The waker sets the state of the awakened process to TASK_RUNNING and then adds its task_struct structure to the executable queue of some CPU. The awakened process then has the opportunity to be scheduled for execution.

There are two ways for the process to change from the TASK_RUNNING state to the non-TASK_RUNNING state:

  • 1. Enter the TASK_STOPPED state or TASK_DEAD state in response to a signal;

  • 2. Actively enter the TASK_INTERRUPTIBLE state (such as the nanosleep system call) or TASK_DEAD state (such as the exit system call) when executing a system call; or enter the TASK_INTERRUPTIBLE state or TASK_UNINTERRUPTIBLE state (such as the select system call) because the resources needed by the system call are not available.

Obviously, both of these cases can only happen if the process is executing on the CPU.

Status switching

A process constantly changes state while it runs:

1) Ready state

When the process has been allocated all necessary resources except the CPU, it can be executed immediately as long as the processor is obtained, and the process state at this time is called the ready state.

2) Execution (Running) state

When the process has obtained the processor and its program is being executed on the processor, the state of the process at this time is called the execution state.

3) Blocked state

When an executing process cannot continue because it is waiting for an event to occur, it gives up the processor and enters the blocked state. Many kinds of events can cause a process to block, for example waiting for I/O to complete, requesting a buffer that cannot be provided, or waiting for a message (signal).

Ready –> Execution

A process in the ready state becomes the execution state when the scheduler assigns a processor to it.

Execution –> Ready

The process in the execution state has to give up the processor after the time slice runs out during its execution, so it changes from execution to ready state.

Execution –> Blocking

When an executing process waits for some event and cannot continue executing, it changes from the executing state to the blocked state.

Blocking –> Ready

A process in the blocked state changes to the ready state when the event it is waiting for occurs.
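The four transitions above can be written down as a small state machine. This is a sketch of the textbook model only, not of how the Linux scheduler is implemented; the names proc_state and can_transition are made up for illustration.

```c
/* The three textbook states described above. */
enum proc_state { READY, RUNNING, BLOCKED };

/* Returns 1 if the textbook model allows the transition, 0 otherwise. */
int can_transition(enum proc_state from, enum proc_state to) {
    switch (from) {
    case READY:
        return to == RUNNING;                 /* scheduler assigns a processor */
    case RUNNING:
        return to == READY || to == BLOCKED;  /* time slice ends, or must wait */
    case BLOCKED:
        return to == READY;                   /* the awaited event occurs */
    }
    return 0;
}
```

Note that a blocked process never goes straight back to execution: it must first become ready and be picked by the scheduler again.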
