Multiplexing – select, poll and epoll and Reactor mode

In the Linux system, multiplexing (Multiplexing) is an important event processing mechanism, which allows a process or thread to monitor multiple file descriptors (sockets, files, etc.) at the same time, and then extract multiple ready Event file descriptors are processed one by one. Combining multi-process, multi-thread and multi-path transfer is a common method to realize high-concurrency servers.

1. select

select() is a system call used to select a file descriptor among a set of file descriptors that is ready for some kind of I/O operation. The prototype of the select() function is as follows:

#include <sys/select.h>
int select(int nfds, fd_set *readfds, fd_set *writefds, fd_set *exceptfds, struct timeval *timeout);

nfds: The number of file descriptors to be tested, that is, the range of all file descriptors in the fd_set collection, that is, the maximum value of all file descriptors plus 1.
readfds: A collection of readable file descriptors.
writefds: A collection of writable file descriptors.
exceptfds: collection of exception file descriptors.
timeout: timeout time.

return value:

-1: error, the cause of the error is stored in errno.
0: Timeout expired
Positive integer: number of ready file descriptors.

Among them, readfds, writefds, and exceptfds are all pointers to the fd_set structure. They are switches. For example, if the value of fd_set pointed to by readfds is…00000001, it means that selsect monitors the readable event of file descriptor 0. (bit 0 of fd_set is 0), and does not care about the readable events of file descriptors 2, 3, 4… (the remaining bits of fd_set are all 0). fd_set is a structure by default, and inside is an array, as the carrier of the bitmap, it occupies 128 bytes, 1024 bits, and each bit represents a file descriptor, which can manipulate file descriptors from 0 to 1024.

You need to use the specified macro to operate the fd_set structure:

FD_ZERO(fd_set *set): Clear the set to zero.
FD_SET(int fd, fd_set *set): Add fd to set.
FD_CLR(int fd, fd_set *set): Delete fd from set.
FD_ISSET(int fd, fd_set *set): Test whether fd is in set.

The timeout parameter is very important, and the structure it points to is as follows:

struct timeval {
    __time_t tv_sec; /* Seconds. */
    __suseconds_t tv_usec; /* Microseconds. */
};

Among them, tv_sec represents the number of seconds, and tv_usec is the number of microseconds, that is, the fraction after the second.

When the timeout value is greater than 0 (either of the two values in the structure is greater than 0), select will wait for the timeout time, and if any monitored file has a ready event, the function returns immediately, and the return value is the number of ready files, and timeout is at the same time An output parameter, which stores the remaining time inside; readfds, writefds, and exceptfds are also output parameters. At this time, the internal nth bit is 1, which means that the file descriptor n has a ready event. If there is no ready event for the file after timeout, the function returns 0.

If the value of the timeout parameter is NULL, it means that select enters the blocking mode, and does not return until there is a ready event in at least one monitored file, otherwise it will be blocked.

advantage:

You can wait for multiple fds at a time, and to a certain extent, improve the efficiency of I0.
shortcoming:
1. Since the function parameters are input and output parameters, each time the function returns, it must be reset when it is called again; after each completion, it is also necessary to traverse and detect the values of readfds, writefds, and exceptfds.

2. fd_set, which allows select to detect fd at the same time has an upper limit, the default is 0-1024.

3. The underlying principle of select is to know which fds are ready through polling detection, and the efficiency is limited.

2. poll

In the Linux system, in addition to select, there is another multiplexing mechanism called poll. Similar to select, poll also allows a process or thread to monitor multiple file descriptors and process them when there are ready events. Compared with select, poll has some advantages, but also has some limitations. Compared with select, there is no particularly big improvement, and there are fewer sh.

Function prototype:

#include <poll.h>
int poll(struct pollfd *fds, nfds_t nfds, int timeout);

struct pollfd: It is a structure array, which is used to transmit the file descriptor information and ready event to be monitored.
nfds: The number of file descriptors to be tested, that is, the number of elements in the structure array.
timeout: timeout time, similar to select timeout, the unit is milliseconds.
Return value: Same as select, returns the number of prepared file descriptors, 0 means timeout, -1 means error, and the cause of the error is stored in errno.

Structure struct pollfd:

struct pollfd {
    int fd; // file descriptor to be monitored
    short events; // expected event type, composed of the following macros
    short revents; // The type of event that actually occurred, filled by the kernel
};

Event type macro:

POLLIN: Indicates that the file descriptor can be read (including new data in the TCP connection and data in the socket receiving buffer).
POLLOUT: Indicates that the file descriptor can be written.
POLLERR: Indicates that an error occurred on the file descriptor.
POLLHUP: Indicates that the file descriptor is hung up (connection closed).
POLLNVAL: Indicates that the file descriptor is not open.

Instructions:

Unlike select, poll does not need to operate on large bitmaps, but uses a structure array, and each array element corresponds to a file descriptor. Then the user knows from the revents of the pollfd array elements whether the fd stored in the pollfd has the monitored ready event type triggered.

#include <stdio.h>
#include <poll.h>

int main() {
    struct pollfd fds[2];
    int timeout = 5000; // 5 seconds timeout

    // set the first file descriptor
    fds[0].fd = fd1;
    fds[0].events = POLLIN; // only focus on readable events

    // set the second file descriptor
    fds[1].fd = fd2;
    fds[1].events = POLLIN | POLLOUT; // Focus on both readable and writable events

    int ret = poll(fds, 2, timeout);
    if (ret == -1) {
        perror("poll");
        return 1;
    } else if (ret == 0) {
        printf("Timeout\
");
    } else {
        // check for ready event
        for (int i = 0; i < 2; i ++ ) {
            if (fds[i]. revents & POLLIN) {
                printf("fd%d is ready for reading\
", i + 1);
            }
            if (fds[i]. revents & POLLOUT) {
                printf("fd%d is ready for writing\
", i + 1);
            }
            if (fds[i]. revents & amp; (POLLERR | POLLHUP | POLLNVAL)) {
                printf("Error on fd%d\
", i + 1);
            }
        }
    }

    return 0;
}

advantage:

No need to use fd_set, avoiding the limit of 1024 file descriptors in select.
There is no need to reset the monitored file descriptor every time it is called, and it is more flexible to use a structure array.

shortcoming:

Like select, it is also polling detection, and its efficiency is limited. When the number of file descriptors is large, performance may not be as good as other more advanced mechanisms such as epoll.

3. epoll

In the Linux system, epoll is an efficient multiplexing mechanism for monitoring multiple file descriptors and processing them when there are ready events. Compared with select and poll, epoll has better performance when dealing with the concurrency of a large number of file descriptors.

The underlying principle of epoll is much more complicated than select and poll. First, it has three interfaces:

#include <sys/epoll.h>
int epoll_create(int size);
int epoll_ctl(int epfd, int op, int fd, struct epoll_event *event);
int epoll_wait(int epfd, struct epoll_event *events, int maxevents, int timeout);

epoll_create: Create an epoll instance and return the epoll file descriptor (epfd).
epoll_ctl: Control the behavior of epoll, such as adding, modifying, and deleting monitored file descriptors.
epoll_wait: Wait for the ready event and return the ready file descriptor.

(1) epoll_create

The function of the epoll_creat function is to apply for a new file descriptor (that is, the return value), and a red-black tree is created inside it to store the file descriptors that epoll needs to monitor and the ready time types that need to be monitored (readable, writable) , exception, etc.).

Each node of the red-black tree is an epoll_event:

struct epoll_event {
    __uint32_t events; // Store the type of event that occurred, which can be a bitmask of multiple events
    epoll_data_t data; // User data, which can be pointers, file descriptors, etc.
};

typedef union epoll_data {
    void *ptr;
    int fd;
    __uint32_t u32;
    __uint64_t u64;
} epoll_data_t;

Among them, event stores the type of ready event that needs to be monitored. Data is a common body. Users can decide what information to store, such as storing the corresponding file descriptor, or a structure pointer. The internal definition of the structure is richer. Information. Each node in the red-black tree represents the file and ready event type to be monitored by the kernel.

(2) epoll_ctl

The function of the epoll_ctl function is actually to add, delete or modify nodes inside the red-black tree created by epoll_create.

#include <sys/epoll.h>
int epoll_ctl(int epfd, int op, int fd, struct epoll_event *event);

epfd is the epoll file descriptor, created by epoll_create.
op is the operation type and has three values:
- EPOLL_CTL_ADD: Add the file descriptor fd to the epoll instance.
- EPOLL_CTL_MOD: Modify the monitoring events of the file descriptor fd in the epoll instance.
- EPOLL_CTL_DEL: Delete the file descriptor fd from the epoll instance.
fd is the file descriptor to be operated, which can be socket, file, etc.
event is a pointer to the struct epoll_event structure, which is used to specify the event to be added or modified. The events field in the struct epoll_event structure is used to specify the event type of interest, and the data field is used to store user data, which can be pointers, file descriptors etc.
Function return value: success returns 0; failure returns -1, and the cause of failure is stored in errno.

(3) epoll_wait

Of course, epoll_wait is the interface used to obtain from the kernel which monitored files have generated ready events.

#include <sys/epoll.h>
int epoll_wait(int epfd, struct epoll_event *events, int maxevents, int timeout);

epfd is the epoll file descriptor, created by epoll_create.
events is a pointer to the struct epoll_event structure array, which is used to store ready events.
maxevents is the size of the events array, that is, the maximum number of ready events that can be processed.
timeout is the timeout time in milliseconds. When timeout is -1, it means blocking mode, that is, it will not return until there is a ready event; when timeout is 0, it means non-blocking mode, that is, it will return immediately, regardless of whether There is a ready event; when timeout is greater than 0, it means to wait for the specified time, and return if there is no ready event after timeout.
Function return value: successfully returns the number of ready file descriptors, and returns 0 if it times out; fails and returns -1, and the failure reason is stored in errno.

The biggest difference between epoll and select and poll is here. There is not only a red-black tree at the bottom of epoll, but also a callback mechanism and a ready queue.

When there is a ready event on the file descriptor, the kernel will trigger an internal callback mechanism, and the kernel will check whether the file descriptor is in the red-black tree of the epoll object (log2 N time complexity ), if it is, then check whether the ready event is set to be monitored by the events in the corresponding node of the red-black tree. If it is, store the epoll_event of the node in the ready epoll_event queue in the kernel, wait for the user to call the epoll_wait function, and copy the contents of the epoll_event ready queue in the kernel to the struct epoll_event *events parameter provided by the user through the epoll_wait function go.

In this way, users can directly get which fds have generated ready events from struct epoll_event *events, and do not need to traverse bitmaps or arrays like in select and poll. Increased efficiency.

advantage:

In the case of a large number of concurrent file descriptors, epoll has higher performance because it uses a callback mechanism and does not need to poll for detection like select and poll.
Support ET (edge trigger) and LT (level trigger) two working modes. The default is LT mode, and ET mode can be set by the EPOLLET flag of the epoll_ctl function.
It is not limited by the number of file descriptors because it is implemented based on a red-black tree.

shortcoming:

epoll is a Linux-specific system call, and other solutions need to be considered in the case of cross-platform requirements.
For the case of a small number of file descriptors, select and poll may be slightly more efficient than epoll, because epoll needs to maintain the data structure of the red-black tree.

In general, epoll is an efficient multi-channel transfer mechanism under Linux, especially suitable for network servers that handle a large number of concurrent connections. In actual development, an appropriate multiplexing mechanism can be selected according to specific requirements.

(4) Two trigger modes of epoll

Level trigger mode (LT, Level-Triggered):

By default, in the horizontal trigger mode, when there is a ready event on the file descriptor, the kernel will continue to trigger the callback mechanism until the user program finishes processing the event. That is to say, as long as the file descriptor is in the ready state, the file descriptor will be returned every time epoll_wait is called, no matter whether the user program handles the ready event or not. This mode requires the user program to handle all ready events by itself.

Features:

In the horizontal trigger mode, the user program needs to pay attention to ensure that the data is completely read or written when processing the ready event, otherwise the file descriptor will still be returned when epoll_wait is called next time, resulting in repeated triggering.
Horizontal trigger mode is the default working mode, and it is also a mode similar to the behavior of select and poll.

Edge-triggered mode (ET, Edge-Triggered):

In edge-triggered mode, when the state of an event on a file descriptor changes from ready to ready, the kernel triggers a callback mechanism to ensure only one notification. That is to say, epoll_wait will return the file descriptor only when the ready status of the file descriptor changes. Once returned, the user program needs to handle all ready events, otherwise the next call to The file descriptor will not be returned again during epoll_wait. Edge-triggered mode requires the user program to be able to handle the ready event in a timely manner.

Features:

Edge-triggered mode has higher performance when dealing with a large number of concurrent connections, because it avoids the overhead of repeated notifications.
In the edge-triggered mode, the user program needs to pay attention to read or write data as much as possible when processing the ready event, and ensure that all the ready data in the buffer are processed to avoid data loss or blocking.
User programs need to handle ET mode with care, because it requires the user to be able to process data on the file descriptor with greater efficiency.

Working mode selection:

When using epoll, you can select the working mode by setting the EPOLLET flag of the epoll_ctl function. By default, epoll uses the horizontal trigger mode.

Example:

struct epoll_event event;
event.events = EPOLLIN | EPOLLET; // set edge trigger mode
event.data.fd = fd; // The file descriptor fd you want to monitor
epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &event);

It should be noted that the edge trigger mode may cause more notification events, so when using the edge trigger mode, the user program needs to handle the ready event more efficiently to avoid performance problems. In addition, the edge trigger mode may have some pitfalls in some specific cases, which need to be used and handled with caution. If the user program cannot handle the ready event in time, the file descriptor may be in the ready state all the time, causing epoll_wait to always return the file descriptor, causing high CPU usage.

4. Reactor mode

With epoll, you can do some interesting things. You can encapsulate a file descriptor with functions that will read and write it (through function pointers or functors) to form an object; The object is managed by an epoll-based object, which can realize automatic processing of file I/O operations. Such a mode is called Reactor mode. This epoll-based object that manages several fd objects is like a reactor. Each fd object is a raw material. Once added to the reactor, it will be automatically called. Function, which is also called event-driven.

The Reactor mode is an event-driven network programming mode that separates I/O events from business logic and implements the design principles of high cohesion and low coupling.
The Reactor pattern has four main roles: Reactor, Handler, Acceptor, and Dispatcher.
- Reactor is a singleton object, which is responsible for creating and managing an epoll object, and provides an interface for registering and unregistering Handler.
- Handler is an abstract class that defines interfaces for handling I/O events, such as handle_read, handle_write, and so on. Each Handler is associated with an fd and a Reactor object.
- Acceptor is a special Handler, which is responsible for listening to client connection requests and creating new Handlers to handle connections.
- Dispatcher is a thread pool object, which is responsible for dispatching Handler to execute corresponding business logic.
The workflow of Reactor mode is as follows:
- First, the Reactor object creates an epoll object and registers an Acceptor object to listen to listen_fd.
- Then, the Reactor object calls epoll_wait to wait for the I/O event, and passes the ready event collection to the Dispatcher object.
- Next, the Dispatcher object allocates a thread from the thread pool to execute the callback function of the corresponding Handler object according to the event type.
- Finally, the Handler object executes corresponding business logic according to the event type, such as reading or sending data, or closing the connection.

In general, Reactor mode is an event-driven network programming mode, which uses epoll’s event-driven mechanism to separate I/O events from business logic, and realizes high concurrency and high performance servers.