20 | The famous select: how to sense multiple I/O events at the same time

This lecture is the first in the performance chapter. In this chapter, we focus on how to design high-concurrency, high-performance network server programs. I hope that by studying this module you will master multiplexing, asynchronous I/O, multi-threading, and related techniques, so that you can write a high-performance network server that supports more than 10K concurrent connections.

What is I/O multiplexing

In Lecture 11, we designed an application that receives data input from standard input and sends it out through a socket. At the same time, the program also receives a stream of data sent by the other party through the socket.

We can use fgets to wait for standard input, but once we block there, we cannot read from the socket when data arrives on it. We can instead use read to wait for data from the socket, but then we cannot read standard input when data arrives there and send it to the other party.

I/O multiplexing was designed to solve exactly this kind of scenario. We can regard standard input, sockets, and so on as individual I/O channels. Multiplexing means that when an “event” occurs on any of these channels, the application is notified so that it can handle the corresponding I/O event. Our program thus becomes a “generalist” that can handle multiple I/O events at the same time.

Like the example just now, after using I/O multiplexing, if there is data in the standard input, the data can be read from the standard input immediately and sent out through the socket; if the socket has data to read, the data can be read out immediately.

The select function is one such common I/O multiplexing technique; we will cover the others later. Calling select asks the kernel to suspend the process until one or more I/O events occur. Control then returns to the application, which handles the events.

There are many types of these I/O events, such as:

The standard input file descriptor is ready for reading.

The listening socket is ready and the new connection has been successfully established.

The connected socket is ready for writing.

No I/O event has occurred after waiting for 10 seconds, so a timeout event occurs.

How to use the select function

The use of the select function is a bit complicated. Let’s take a look at its declaration first:

int select(int maxfd, fd_set *readset, fd_set *writeset, fd_set *exceptset, struct timeval *timeout);

Returns: the number of ready descriptors if any are ready, 0 on timeout, and -1 on error.

In this function, maxfd is the descriptor base to be tested; its value is the largest descriptor to be tested plus 1. For example, if the set of descriptors to be tested is {0,1,4}, then maxfd is 5. Why 5 instead of 4? I will explain below.

Next come three descriptor sets: the read set readset, the write set writeset, and the exception set exceptset. They tell the kernel which descriptors to test for readable data, for writability, and for exceptional conditions, respectively.

So how do you set up these descriptor sets? The following macros can help us.

void FD_ZERO(fd_set *fdset);
void FD_SET(int fd, fd_set *fdset);
void FD_CLR(int fd, fd_set *fdset);
int FD_ISSET(int fd, fd_set *fdset);

If you are just getting started, these macros may be a little hard to understand. That is fine: imagine the following vector as a descriptor set, in which each element is a single bit, 0 or 1.

a[maxfd-1], ..., a[1], a[0]

We understand these macros in this way:

FD_ZERO is used to set all elements of this vector to 0;

FD_SET is used to set the element a[fd], which corresponds to descriptor fd, to 1;

FD_CLR is used to set the element a[fd], which corresponds to descriptor fd, to 0;

FD_ISSET checks this vector to determine whether the element a[fd] for descriptor fd is 0 or 1.

Among them, 0 means no processing is required, and 1 means processing is required.
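The four macros can be tried out on a real fd_set. The following is a minimal sketch; the helper name member_after_demo is hypothetical and exists only for illustration. It builds the set {0,1,4}, clears descriptor 1 again, and reports whether a given descriptor is still a member.

```c
#include <sys/select.h>

/* Hypothetical helper: builds the set {0, 1, 4}, removes 1 again,
 * and returns 1 if fd is still in the set, 0 otherwise. */
int member_after_demo(int fd) {
    fd_set set;
    FD_ZERO(&set);      /* every element a[i] becomes 0 */
    FD_SET(0, &set);    /* a[0] = 1 */
    FD_SET(1, &set);    /* a[1] = 1 */
    FD_SET(4, &set);    /* a[4] = 1 */
    FD_CLR(1, &set);    /* a[1] = 0 again */
    return FD_ISSET(fd, &set) ? 1 : 0;
}
```

Calling member_after_demo(4) reports membership, while member_after_demo(1) does not, because descriptor 1 was cleared.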

In fact, many systems use an integer array to represent a descriptor set. A 32-bit integer can represent 32 descriptors: for example, the first integer represents descriptors 0-31, the second represents descriptors 32-63, and so on.

With this picture in mind, it is easier to see why the maxfd for the descriptor set {0,1,4} is 5 instead of 4.
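To make the integer-array idea concrete, here is a toy bitmap that mimics the four macros. It is only a sketch of the representation described above, not the actual layout of fd_set on any particular system; the names toy_fd_set, toy_zero, toy_set, toy_clr, toy_isset and the size TOY_WORDS are all made up for illustration.

```c
#include <stdint.h>

#define TOY_WORDS 4  /* 4 * 32 = 128 descriptors; size chosen arbitrarily */

typedef struct {
    uint32_t bits[TOY_WORDS];  /* bits[0] covers 0-31, bits[1] covers 32-63, ... */
} toy_fd_set;

/* Counterpart of FD_ZERO: clear every bit. */
void toy_zero(toy_fd_set *s) {
    for (int i = 0; i < TOY_WORDS; i++) {
        s->bits[i] = 0;
    }
}

/* Counterpart of FD_SET: fd / 32 picks the integer, fd % 32 picks the bit. */
void toy_set(int fd, toy_fd_set *s) {
    s->bits[fd / 32] |= (UINT32_C(1) << (fd % 32));
}

/* Counterpart of FD_CLR. */
void toy_clr(int fd, toy_fd_set *s) {
    s->bits[fd / 32] &= ~(UINT32_C(1) << (fd % 32));
}

/* Counterpart of FD_ISSET: returns 1 if the bit is set, 0 otherwise. */
int toy_isset(int fd, const toy_fd_set *s) {
    return (int)((s->bits[fd / 32] >> (fd % 32)) & 1u);
}
```

With this layout, descriptor 33 lands in the second integer at bit position 1. The sketch does no bounds checking, so fd must stay below TOY_WORDS * 32.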

Because this vector corresponds to the following:

a[4],a[3],a[2],a[1],a[0]

The number of descriptors to be tested is obviously 5, not 4.
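When a program watches several descriptors, the first argument can be computed rather than hard-coded. The helper below is a small sketch; the name select_nfds is hypothetical and not part of any library.

```c
/* Hypothetical helper: returns the value to pass as the first argument
 * of select for the descriptors fds[0..count-1], that is, the largest
 * descriptor plus 1. Returns 0 for an empty list. */
int select_nfds(const int *fds, int count) {
    int max = -1;
    for (int i = 0; i < count; i++) {
        if (fds[i] > max) {
            max = fds[i];
        }
    }
    return max + 1;
}
```

For the set {0,1,4} used in the text, this helper yields 5, because the kernel tests elements a[0] through a[4].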

Any of the three descriptor sets can be set to NULL (empty), which tells the kernel that no test of that kind is needed.

The last parameter is a timeval structure that specifies the timeout:

struct timeval {
  long tv_sec; /* seconds */
  long tv_usec; /* microseconds */
};

Setting this parameter to different values gives different behavior:

The first possibility is to set it to NULL, which means that select waits forever if no I/O event occurs.

The second possibility is to set a non-zero value, which means waiting for at most a fixed period of time before returning from the blocking select call. This was used in the timeout example in Lecture 12.

The third possibility is to set both tv_sec and tv_usec to 0, which means not waiting at all and returning immediately after the detection is completed. This situation is used less frequently.
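The three timeout settings can be wrapped in one small helper. This is a sketch under stated assumptions: wait_readable and timeout_demo are hypothetical names, and a pipe stands in for the socket so that the example needs no network.

```c
#include <stddef.h>
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

/* Hypothetical helper: waits until fd is readable.
 *   timeout_ms <  0 : timeout is NULL, wait forever
 *   timeout_ms == 0 : tv_sec and tv_usec are both 0, return immediately
 *   timeout_ms >  0 : wait at most that many milliseconds
 * Returns select's result: 1 if readable, 0 on timeout, -1 on error. */
int wait_readable(int fd, long timeout_ms) {
    fd_set readset;
    FD_ZERO(&readset);
    FD_SET(fd, &readset);

    struct timeval tv;
    struct timeval *tvp = NULL;
    if (timeout_ms >= 0) {
        tv.tv_sec = timeout_ms / 1000;
        tv.tv_usec = (timeout_ms % 1000) * 1000;
        tvp = &tv;
    }
    return select(fd + 1, &readset, NULL, NULL, tvp);
}

/* Runs the three cases on a pipe.
 * results[0]: immediate poll on an empty pipe
 * results[1]: 50 ms wait on an empty pipe
 * results[2]: unlimited wait after one byte has been written
 * Returns 0 on success, -1 if the setup fails. */
int timeout_demo(int results[3]) {
    int p[2];
    if (pipe(p) < 0) {
        return -1;
    }
    results[0] = wait_readable(p[0], 0);
    results[1] = wait_readable(p[0], 50);
    int ok = (write(p[1], "x", 1) == 1);
    if (ok) {
        results[2] = wait_readable(p[0], -1);  /* data is pending, so no hang */
    }
    close(p[0]);
    close(p[1]);
    return ok ? 0 : -1;
}
```

Note that some systems, Linux among them, update the timeval structure to reflect the time remaining, which is one reason the sketch fills in a fresh structure on every call.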

Program example

The following is a specific program example through which we can understand the select function.

int main(int argc, char **argv) {
    if (argc != 2) {
        error(1, 0, "usage: select01 <IPaddress>");
    }
    int socket_fd = tcp_client(argv[1], SERV_PORT);

    char recv_line[MAXLINE], send_line[MAXLINE];
    int n;

    fd_set readmask;  /* working copy that select overwrites */
    fd_set allreads;  /* master set of descriptors to watch */
    FD_ZERO(&allreads);
    FD_SET(0, &allreads);  /* 0 is standard input (STDIN_FILENO) */
    FD_SET(socket_fd, &allreads);

    for (;;) {
        readmask = allreads;  /* reset the set before every call */
        int rc = select(socket_fd + 1, &readmask, NULL, NULL, NULL);

        if (rc <= 0) {
            error(1, errno, "select failed");
        }

        if (FD_ISSET(socket_fd, &readmask)) {
            n = read(socket_fd, recv_line, MAXLINE - 1);  /* leave room for the terminator */
            if (n < 0) {
                error(1, errno, "read error");
            } else if (n == 0) {
                error(1, 0, "server terminated \n");
            }
            recv_line[n] = 0;
            fputs(recv_line, stdout);
            fputs("\n", stdout);
        }

        if (FD_ISSET(STDIN_FILENO, &readmask)) {
            if (fgets(send_line, MAXLINE, stdin) != NULL) {
                size_t i = strlen(send_line);
                if (i > 0 && send_line[i - 1] == '\n') {
                    send_line[i - 1] = 0;  /* strip the trailing newline */
                }

                printf("now sending %s\n", send_line);
                ssize_t rt = write(socket_fd, send_line, strlen(send_line));
                if (rt < 0) {
                    error(1, errno, "write failed ");
                }
                printf("send bytes: %zd \n", rt);
            }
        }
    }

}

Line 12 of the program, the FD_ZERO call, initializes the descriptor set, so the read set starts out empty.

Lines 13 and 14 then use FD_SET to mark descriptor 0 (standard input) and the connected socket descriptor (3 in this example) for testing, so the set is now {0,3}.

Lines 16-51 are the detection loop. Instead of blocking in fgets or read, we use select to detect whether the socket descriptor or standard input has data to read. For example, when the user types a line and standard input becomes readable, the readmask returned by select is {0}.

At this point the select call returns, and FD_ISSET tells us which descriptor is ready to be read. In the case just described, standard input is readable, so lines 37-51 of the program read the input and send it to the peer.

If the connected socket is ready to be read, the test on line 24 is true and read is used to read the socket data.

Note that lines 17-18 of this program are very important; beginners can easily fall into a trap here.

Line 17 (readmask = allreads) resets the descriptor set to be tested before each call. In the example above, the set is {0,3} before the select test and becomes {0} after it.

This is because, each time select completes a test, the kernel modifies the descriptor set and uses the modified set to report results to the application. The application then calls FD_ISSET on each descriptor to find out which events occurred.

Line 18 passes socket_fd + 1 as the descriptor base to be tested. Remember the +1.
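The way select overwrites the set it is given can be observed directly. The following sketch uses two pipes instead of standard input and a socket, an assumption made so that it runs without a server. The function name select_overwrites_set is hypothetical.

```c
#include <stddef.h>
#include <sys/select.h>
#include <unistd.h>

/* Creates two pipes, writes one byte to the second only, then calls
 * select on both read ends. out[0] and out[1] report whether each
 * read end is still set in the mask afterwards.
 * Returns select's result, or -1 if the setup fails. */
int select_overwrites_set(int out[2]) {
    int a[2], b[2];
    if (pipe(a) < 0) {
        return -1;
    }
    if (pipe(b) < 0) {
        close(a[0]);
        close(a[1]);
        return -1;
    }

    int rc = -1;
    if (write(b[1], "x", 1) == 1) {
        fd_set allreads, readmask;
        FD_ZERO(&allreads);
        FD_SET(a[0], &allreads);
        FD_SET(b[0], &allreads);
        int maxfd = (a[0] > b[0] ? a[0] : b[0]) + 1;

        readmask = allreads;  /* hand select a copy, keep the master set */
        rc = select(maxfd, &readmask, NULL, NULL, NULL);
        out[0] = FD_ISSET(a[0], &readmask) ? 1 : 0;  /* idle pipe: cleared */
        out[1] = FD_ISSET(b[0], &readmask) ? 1 : 0;  /* pipe with data: kept */
    }

    close(a[0]);
    close(a[1]);
    close(b[0]);
    close(b[1]);
    return rc;
}
```

Because only readmask is passed to select, allreads still contains both descriptors and can be copied again on the next iteration, which is what line 17 of the program does.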

Socket descriptor ready condition

When select returns and reports that a socket is ready for reading, what exactly has happened?

The first situation is that there is data in the socket receiving buffer that can be read. If we use the read function to perform a read operation, it will definitely not be blocked, but will read this part of the data directly.

The second case is that the peer has sent a FIN. A read on the socket will not block and returns 0 directly.

The third case applies to a listening socket: a connection has been fully established. Calling accept at this point will not block and returns the completed connection directly.

The fourth case is that the socket has a pending error. A read on the socket will not block and returns -1.

To sum it up in one sentence: when the kernel reports that the socket is readable, calling read on it will not block.
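The second case can be reproduced locally. The sketch below assumes a connected AF_UNIX socket pair as a stand-in for a TCP connection, with close on one end playing the role of the peer's FIN. The function name read_after_peer_close is hypothetical.

```c
#include <stddef.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <unistd.h>

/* Closes one end of a socket pair, waits for the other end to become
 * readable, and returns what read yields on it.
 * Returns -2 if the setup fails or select does not report readability. */
int read_after_peer_close(void) {
    int sv[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0) {
        return -2;
    }
    close(sv[1]);  /* the peer closes: sv[0] will see end-of-file */

    fd_set readset;
    FD_ZERO(&readset);
    FD_SET(sv[0], &readset);

    struct timeval tv;  /* safety net so the sketch can never hang */
    tv.tv_sec = 1;
    tv.tv_usec = 0;

    int result = -2;
    if (select(sv[0] + 1, &readset, NULL, NULL, &tv) == 1 &&
        FD_ISSET(sv[0], &readset)) {
        char buf[16];
        result = (int)read(sv[0], buf, sizeof buf);  /* returns without blocking */
    }
    close(sv[0]);
    return result;
}
```

The socket is reported as readable even though no data ever arrives, and read returns 0, which is why the program above treats n == 0 as the server having terminated.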

I don’t know whether you were like me: when I first learned that a socket could be writable, I had a misconception. I looked at it from the application's point of view. The application has finished its computation and has data ready to send to the peer, so it can write to the socket, and I took that to mean the socket is writable.

In fact, this understanding is wrong. Whether select reports a socket as writable depends entirely on the state of the socket itself. Specifically, there are the following cases.

The first is that the socket send buffer has enough free space. A write on the socket, even a blocking one, will not block and returns directly.

The second is that the write half of the connection has been closed. If we continue to write, a SIGPIPE signal is generated.

The third is that the socket has a pending error. A write on the socket will not block and returns -1.

To sum it up in one sentence: when the kernel reports that the socket is writable, calling write on it will not block.
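The first case can be observed with a short experiment. This is a sketch under stated assumptions: an AF_UNIX socket pair stands in for a TCP connection, the names writable_now and writable_demo are hypothetical, and the result for the full buffer reflects Linux behavior and may differ on other systems.

```c
#include <fcntl.h>
#include <stddef.h>
#include <string.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <unistd.h>

/* Returns 1 if select reports fd writable right now, 0 if not, -1 on error. */
int writable_now(int fd) {
    fd_set writeset;
    FD_ZERO(&writeset);
    FD_SET(fd, &writeset);

    struct timeval tv;  /* zero timeout: test and return immediately */
    tv.tv_sec = 0;
    tv.tv_usec = 0;
    return select(fd + 1, NULL, &writeset, NULL, &tv);
}

/* out[0]: writability of a fresh socket with nothing to send.
 * out[1]: writability after the send buffer has been filled.
 * Returns 0 on success, -1 if the setup fails. */
int writable_demo(int out[2]) {
    int sv[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0) {
        return -1;
    }
    out[0] = writable_now(sv[0]);

    /* Non-blocking mode, so the fill loop stops instead of hanging. */
    int flags = fcntl(sv[0], F_GETFL, 0);
    if (flags < 0 || fcntl(sv[0], F_SETFL, flags | O_NONBLOCK) < 0) {
        close(sv[0]);
        close(sv[1]);
        return -1;
    }

    char chunk[4096];
    memset(chunk, 'x', sizeof chunk);
    while (write(sv[0], chunk, sizeof chunk) > 0) {
        /* keep writing; nobody reads, so the buffer eventually fills up */
    }
    out[1] = writable_now(sv[0]);

    close(sv[0]);
    close(sv[1]);
    return 0;
}
```

The fresh socket is reported as writable although the application has nothing to send, and it stops being writable once the buffer is full. Writability describes the socket, not the application.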

Summary

Today we covered how to use the select function, which provides the most basic form of I/O multiplexing. When using select, keep two important points in mind:

The descriptor base is the current largest descriptor + 1;

Because each select call modifies the descriptor set, remember to reset the set to be tested before calling select again.