Network Programming (5) IO Multiplexing — epoll

1. Concept

epoll is short for "event poll", an event-driven notification mechanism in the Linux kernel. A read event is reported when a file descriptor's receive buffer contains data, and a write event is reported when its send buffer has room for more data.

2. Principle

epoll is built around two core data structures: a red-black tree and a linked list.

1. The red-black tree stores every registered file descriptor, so a descriptor can be looked up, inserted, or removed in O(log n) time.

2. The linked list (the ready list) stores the file descriptors whose events have fired.

3. A call to epoll_wait() only has to check whether the ready list is non-empty; it does not scan the whole set of registered descriptors.

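4. When epoll_wait() returns, the kernel copies the entries of the ready list into the caller's array. Contrary to a common claim, no mmap is involved; the cost is low because only the ready entries are copied, not the whole descriptor set.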
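
The keyed nature of the interest set can be observed from user space: a descriptor can be registered at most once per epoll instance, and removing an unknown descriptor fails. The following is a minimal sketch, assuming Linux; `probe_interest_set` is a hypothetical helper name used only for illustration, and epoll_create1(0) is used in place of epoll_create.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Hypothetical helper: returns 0 if epoll_ctl reports duplicate and
 * missing registrations as expected, -1 otherwise. */
int probe_interest_set(void) {
    int pipefd[2];
    if (pipe(pipefd) == -1)
        return -1;

    int epfd = epoll_create1(0);
    if (epfd == -1) {
        close(pipefd[0]);
        close(pipefd[1]);
        return -1;
    }

    struct epoll_event ev = {0};
    ev.events = EPOLLIN;
    ev.data.fd = pipefd[0];

    int ok = 1;

    /* The first registration inserts the descriptor into the interest set. */
    if (epoll_ctl(epfd, EPOLL_CTL_ADD, pipefd[0], &ev) != 0)
        ok = 0;

    /* Registering the same descriptor again is rejected with EEXIST. */
    if (epoll_ctl(epfd, EPOLL_CTL_ADD, pipefd[0], &ev) != -1 || errno != EEXIST)
        ok = 0;

    /* Removal finds the entry and deletes it. */
    if (epoll_ctl(epfd, EPOLL_CTL_DEL, pipefd[0], NULL) != 0)
        ok = 0;

    /* A second removal finds nothing and fails with ENOENT. */
    if (epoll_ctl(epfd, EPOLL_CTL_DEL, pipefd[0], NULL) != -1 || errno != ENOENT)
        ok = 0;

    close(epfd);
    close(pipefd[0]);
    close(pipefd[1]);
    return ok ? 0 : -1;
}
```

The EEXIST and ENOENT results are what a lookup structure keyed by descriptor would be expected to produce: the kernel can tell whether an entry already exists without scanning every registration.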

3. Features

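1. The kernel no longer scans every registered file descriptor on each call; a callback puts a descriptor on the ready list when its event fires.

2. epoll_wait() returns only the ready file descriptors, so the application does not have to scan the whole set either.

3. The set of monitored descriptors is kept inside the kernel, so it is not copied from user space on every call (as it is with select and poll). Only the ready events are copied out to user space; no shared memory is used.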
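
To illustrate the second point, the sketch below registers several pipes, makes only one of them readable, and checks that epoll_wait reports exactly that one. It assumes Linux; `only_ready_fd_reported` and `NPIPES` are hypothetical names chosen for this example.

```c
#include <sys/epoll.h>
#include <unistd.h>

#define NPIPES 3

/* Hypothetical demo: registers NPIPES pipe read ends, writes one byte to
 * pipe number ready_index, and returns 1 if epoll_wait reports exactly
 * that descriptor, 0 if it reports anything else, -1 on setup failure. */
int only_ready_fd_reported(int ready_index) {
    int pipes[NPIPES][2];
    int created = 0;
    int result = -1;

    if (ready_index < 0 || ready_index >= NPIPES)
        return -1;

    int epfd = epoll_create1(0);
    if (epfd == -1)
        return -1;

    for (created = 0; created < NPIPES; ++created) {
        if (pipe(pipes[created]) == -1)
            break;
        struct epoll_event ev = {0};
        ev.events = EPOLLIN;
        ev.data.fd = pipes[created][0];
        if (epoll_ctl(epfd, EPOLL_CTL_ADD, pipes[created][0], &ev) == -1) {
            close(pipes[created][0]);
            close(pipes[created][1]);
            break;
        }
    }

    if (created == NPIPES && write(pipes[ready_index][1], "x", 1) == 1) {
        struct epoll_event out[NPIPES];
        /* Room for NPIPES events, but only the ready one is filled in. */
        int n = epoll_wait(epfd, out, NPIPES, 0);
        result = (n == 1 && out[0].data.fd == pipes[ready_index][0]) ? 1 : 0;
    }

    for (int i = 0; i < created; ++i) {
        close(pipes[i][0]);
        close(pipes[i][1]);
    }
    close(epfd);
    return result;
}
```

With select or poll, the caller would have to walk all three descriptors to find the ready one; here the returned array contains nothing else.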

4. epoll API

typedef union epoll_data {
    void *ptr;    // User data as a pointer
    int fd;       // User data as a file descriptor
    uint32_t u32; // User data as a 32-bit unsigned integer
    uint64_t u64; // User data as a 64-bit unsigned integer
} epoll_data_t;   // A union: only one member holds a value at a time

struct epoll_event {
    uint32_t events;   // Bit mask of the events of interest, e.g. EPOLLIN | EPOLLOUT
    epoll_data_t data; // User data, returned unchanged by epoll_wait()
};
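
Because epoll_data is a union, a program stores either the descriptor or a pointer, not both. A common pattern is to keep the descriptor inside a per-connection structure and store a pointer to that structure. The sketch below assumes Linux; `struct conn`, `watch_with_context`, and `next_ready` are hypothetical names, not part of the epoll API.

```c
#include <stdlib.h>
#include <sys/epoll.h>

/* Hypothetical per-connection state. */
struct conn {
    int fd;                   /* kept here because data.fd cannot be used too */
    unsigned long bytes_seen; /* example of extra state carried per descriptor */
};

/* Registers fd for readable events and attaches a heap-allocated conn.
 * Returns the conn, or NULL on failure. The caller frees it later. */
struct conn *watch_with_context(int epfd, int fd) {
    struct conn *c = calloc(1, sizeof(*c));
    if (c == NULL)
        return NULL;
    c->fd = fd;

    struct epoll_event ev = {0};
    ev.events = EPOLLIN;
    ev.data.ptr = c; /* union member: do not also assign ev.data.fd */

    if (epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev) == -1) {
        free(c);
        return NULL;
    }
    return c;
}

/* Waits up to timeout_ms for one event and returns the conn that was
 * attached at registration time, or NULL if nothing became ready. */
struct conn *next_ready(int epfd, int timeout_ms) {
    struct epoll_event ev;
    int n = epoll_wait(epfd, &ev, 1, timeout_ms);
    return n == 1 ? (struct conn *)ev.data.ptr : NULL;
}
```

The kernel never interprets the data field; it hands back whatever was stored when the descriptor was registered.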

Event macros
 EPOLLIN: the file descriptor is readable (this includes an orderly shutdown by the peer socket, in which case the read returns 0);
 EPOLLOUT: the file descriptor is writable;
 EPOLLPRI: urgent data is available to read (for TCP sockets, this means out-of-band data has arrived);
 EPOLLERR: an error occurred on the file descriptor (always reported, even if not requested);
 EPOLLHUP: the file descriptor was hung up (always reported, even if not requested);
 EPOLLET: use edge-triggered (ET) mode for this file descriptor; the default is level-triggered (LT) mode.
 EPOLLONESHOT: report an event only once. After the event is delivered, the descriptor stays registered but disabled; to keep monitoring it, re-arm it with epoll_ctl() and EPOLL_CTL_MOD.
#include <sys/epoll.h>

int epoll_create(int size);
//Function: Create a new epoll instance
//parameter:
    size: originally a hint for the number of file descriptors to be monitored; ignored since Linux 2.6.8, but it must still be greater than 0
//return value:
    Success: a file descriptor referring to the new epoll instance, used in subsequent epoll calls
    Failure: -1, with errno set
------------------------------------------------------------------------------------
int epoll_ctl(int epfd, int op, int fd, struct epoll_event *event);
//Function: Add, modify, or remove the events monitored for a file descriptor on an epoll instance
//parameter:
    epfd: epoll file descriptor, returned by epoll_create
    op: specifies the operation type:
        EPOLL_CTL_ADD //Register fd with the events given in event
        EPOLL_CTL_MOD //Change the events registered for fd
        EPOLL_CTL_DEL //Remove fd; event is ignored and may be NULL
    fd: the file descriptor to monitor
    event: the events of interest and the user data to return with them
//return value:
    Success: 0
    Failure: -1, with errno set
------------------------------------------------------------------------------------
int epoll_wait(int epfd, struct epoll_event *events, int maxevents, int timeout);
//Function: Wait for events on the file descriptors registered with an epoll instance
//parameter:
    epfd: epoll file descriptor, returned by epoll_create
    events: caller-allocated array that receives the ready events
    maxevents: the capacity of the events array, that is, the maximum number of events returned by one call; must be greater than 0
    timeout: maximum time to wait, in milliseconds. -1 blocks indefinitely, 0 returns immediately
//return value:
    Success: the number of ready file descriptors; 0 if the timeout expired with no events
    Failure: -1, with errno set
------------------------------------------------------------------------------------
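
The three calls are typically used together. The sketch below wraps them in one function that waits for a single descriptor to become readable. It assumes Linux; `wait_for_readable` is a hypothetical helper, and it uses epoll_create1(EPOLL_CLOEXEC), the newer variant of epoll_create that takes flags instead of a size.

```c
#include <errno.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Hypothetical helper: waits up to timeout_ms for an event on fd.
 * Returns 1 if an event was reported, 0 on timeout, -1 on error. */
int wait_for_readable(int fd, int timeout_ms) {
    /* Step 1: create the epoll instance. */
    int epfd = epoll_create1(EPOLL_CLOEXEC);
    if (epfd == -1)
        return -1;

    /* Step 2: register the descriptor and the events of interest. */
    struct epoll_event ev = {0};
    ev.events = EPOLLIN;
    ev.data.fd = fd;
    if (epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev) == -1) {
        close(epfd);
        return -1;
    }

    /* Step 3: wait. A signal can interrupt the call with EINTR; this
     * simple retry restarts the full timeout each time. */
    struct epoll_event out;
    int n;
    do {
        n = epoll_wait(epfd, &out, 1, timeout_ms);
    } while (n == -1 && errno == EINTR);

    close(epfd);
    return n;
}
```

A long-running server would create the epoll instance once and reuse it, as the full example in section 6 does; creating one per call is only acceptable for a demonstration.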

5. Working mode of epoll

5.1 ET edge-triggered mode

5.1.1 Working Principle

In ET mode, epoll notifies only once when the state of a monitored file descriptor changes; it does not notify again until the next state change.

In other words, a notification is sent only on a transition, for example from "no data to read" to "data available", or from "send buffer full" to "writable". Data that is left unread does not produce another notification by itself.
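
This can be observed with a small experiment on a Unix socket pair. The sketch assumes Linux; `edge_trigger_demo` is a hypothetical function written for this illustration.

```c
#include <sys/epoll.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical demo. Stores three epoll_wait return values in results:
 *   results[0]: after the first write
 *   results[1]: waiting again without reading anything
 *   results[2]: after a second write
 * Returns 0 on success, -1 on setup failure. */
int edge_trigger_demo(int results[3]) {
    int sv[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) == -1)
        return -1;

    int rc = -1;
    int epfd = epoll_create1(0);
    if (epfd != -1) {
        struct epoll_event ev = {0}, out;
        ev.events = EPOLLIN | EPOLLET;
        ev.data.fd = sv[0];

        if (epoll_ctl(epfd, EPOLL_CTL_ADD, sv[0], &ev) == 0 &&
            write(sv[1], "a", 1) == 1) {
            /* Transition from empty to readable: reported. */
            results[0] = epoll_wait(epfd, &out, 1, 100);
            /* The byte is still unread, but nothing changed: times out. */
            results[1] = epoll_wait(epfd, &out, 1, 100);
            if (write(sv[1], "b", 1) == 1) {
                /* New data arrived: reported again. */
                results[2] = epoll_wait(epfd, &out, 1, 100);
                rc = 0;
            }
        }
        close(epfd);
    }
    close(sv[0]);
    close(sv[1]);
    return rc;
}
```

The second wait times out even though a byte is still sitting in the buffer, which is why ET code has to drain the descriptor after each notification.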

5.1.2 Setting method

struct epoll_event ev;
ev.events = EPOLLIN | EPOLLET; // Monitor the edge trigger of readable events
// or
ev.events = EPOLLOUT | EPOLLET; // Monitor the edge trigger of writable events
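
In practice, a descriptor used in ET mode should also be non-blocking, so that the final read or write in a drain loop fails with EAGAIN instead of blocking. A minimal sketch, assuming Linux; `set_nonblocking` and `add_edge_triggered` are hypothetical helper names:

```c
#include <fcntl.h>
#include <stdint.h>
#include <sys/epoll.h>

/* Adds O_NONBLOCK to the descriptor's file status flags. */
int set_nonblocking(int fd) {
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags == -1)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/* Makes fd non-blocking and registers it in edge-triggered mode.
 * events is EPOLLIN, EPOLLOUT, or both; EPOLLET is added here. */
int add_edge_triggered(int epfd, int fd, uint32_t events) {
    if (set_nonblocking(fd) == -1)
        return -1;

    struct epoll_event ev = {0};
    ev.events = events | EPOLLET;
    ev.data.fd = fd;
    return epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev);
}
```

For example, `add_edge_triggered(epfd, clientfd, EPOLLIN)` would be called once for each accepted connection.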

5.1.3 Applicable scenarios

1. A server monitors connected sockets. When data arrives, epoll_wait returns a readable event. The server must read in a loop until the buffer is empty (a non-blocking read fails with EAGAIN); only then will newly arriving data trigger another readable event.

2. Reading from a descriptor into a memory buffer. Note that regular disk files cannot be registered with epoll (epoll_ctl fails with EPERM), so this applies to pollable descriptors such as FIFOs, terminals, and character devices. When the descriptor becomes readable, epoll_wait returns a readable event, and the program reads until EAGAIN before waiting again.

3. Inter-process communication through pipes. When the pipe becomes readable, epoll_wait returns a readable event; the reader drains the pipe until EAGAIN, after which new data triggers another readable event.

5.2 LT level-triggered mode

5.2.1 Working Principle

In LT mode, epoll keeps notifying for as long as the condition holds: every call to epoll_wait reports the file descriptor until there is no more data to read (or no more room to write).

In other words, as long as unread data remains, or the send buffer has free space, the descriptor is reported on every call.
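
The repeated notification can be observed with a small experiment on a Unix socket pair. The sketch assumes Linux; `level_trigger_demo` is a hypothetical function written for this illustration.

```c
#include <sys/epoll.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical demo. Stores three epoll_wait return values in results:
 *   results[0]: after one byte is written
 *   results[1]: waiting again without reading anything
 *   results[2]: after the byte has been read
 * Returns 0 on success, -1 on setup failure. */
int level_trigger_demo(int results[3]) {
    int sv[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) == -1)
        return -1;

    int rc = -1;
    int epfd = epoll_create1(0);
    if (epfd != -1) {
        struct epoll_event ev = {0}, out;
        ev.events = EPOLLIN; /* no EPOLLET: level-triggered */
        ev.data.fd = sv[0];

        if (epoll_ctl(epfd, EPOLL_CTL_ADD, sv[0], &ev) == 0 &&
            write(sv[1], "a", 1) == 1) {
            char c;
            /* Data is available: reported. */
            results[0] = epoll_wait(epfd, &out, 1, 100);
            /* Still unread: reported again. */
            results[1] = epoll_wait(epfd, &out, 1, 100);
            if (read(sv[0], &c, 1) == 1) {
                /* Buffer is empty now: the wait times out. */
                results[2] = epoll_wait(epfd, &out, 1, 100);
                rc = 0;
            }
        }
        close(epfd);
    }
    close(sv[0]);
    close(sv[1]);
    return rc;
}
```

Because the descriptor keeps being reported, an LT handler may read only part of the data on each pass without losing the rest.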

5.2.2 Setting method

struct epoll_event ev;
ev.events = EPOLLIN; // Readable events, level-triggered (the default when EPOLLET is not set)
// or
ev.events = EPOLLOUT; // Writable events, level-triggered (the default when EPOLLET is not set)

5.2.3 Applicable scenarios

1. A multi-threaded program in which each worker thread handles the data of one file descriptor. The main thread waits for readable events with epoll_wait and dispatches each ready descriptor to an idle worker thread. The worker reads the available data, and the main thread then continues to wait for readable events. Because LT mode keeps reporting a descriptor until it has been drained, data that a worker leaves unread is reported again on the next call.

6. Code examples

#include <stdio.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <arpa/inet.h>
#include <stdlib.h>
#include <string.h>
#include <sys/epoll.h>

#define MAX_EVENTS 10
#define MAX_BUFF_SIZE 1024

int main() {
    int sockfd = socket(AF_INET, SOCK_STREAM, 0);
    if (sockfd < 0) {
        perror("socket");
        return -1;
    }

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof(addr)); // Zero the padding field before filling in the address
    addr.sin_family = AF_INET;
    addr.sin_port = htons(2048);
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    if (bind(sockfd, (struct sockaddr*)&addr, sizeof(addr)) < 0) {
        perror("bind");
        close(sockfd);
        return -1;
    }

    if (listen(sockfd, 10) < 0) {
        perror("listen");
        close(sockfd);
        return -1;
    }

    //Create epoll handle
    int epollfd = epoll_create(MAX_EVENTS);
    if (epollfd == -1) {
        perror("epoll_create");
        close(sockfd);
        return -1;
    }

    //Add listening socket sockfd to epoll
    struct epoll_event event;
    event.events = EPOLLIN;
    event.data.fd = sockfd;
    if (epoll_ctl(epollfd, EPOLL_CTL_ADD, sockfd, &event) == -1) {
        perror("epoll_ctl");
        close(epollfd);
        close(sockfd);
        return -1;
    }

    struct epoll_event events[MAX_EVENTS];

    char buff[MAX_BUFF_SIZE];

    while (1) {
        // Listen for events
        int nfds = epoll_wait(epollfd, events, MAX_EVENTS, -1);
        if (nfds == -1) {
            perror("epoll_wait");
            break;
        }

        for (int i = 0; i < nfds; ++i) {
            if (events[i].data.fd == sockfd) {
                //There is a new connection
                int clientfd = accept(sockfd, NULL, NULL);
                if (clientfd == -1) {
                    perror("accept");
                    continue;
                }

                //Add new connection to epoll
                event.events = EPOLLIN;
                event.data.fd = clientfd;
                if (epoll_ctl(epollfd, EPOLL_CTL_ADD, clientfd, &event) == -1) {
                    perror("epoll_ctl");
                    close(clientfd);
                    continue;
                }
            } else {
                //The connected socket is readable
                int n = recv(events[i].data.fd, buff, sizeof(buff), 0);
                if (n <= 0) {
                    if (n == 0) {
                        // connection closed
                        printf("exit fd[%d]\n", events[i].data.fd);
                    } else {
                        // Error
                        perror("recv");
                    }

                    //Remove socket from epoll
                    epoll_ctl(epollfd, EPOLL_CTL_DEL, events[i].data.fd, NULL);
                    close(events[i].data.fd);
                } else {
                    //Data received normally
                    // buff is not NUL-terminated, so print exactly n bytes
                    printf("fd[%d] recv: %.*s\n", events[i].data.fd, n, buff);
                    send(events[i].data.fd, buff, n, 0);
                }
            }
        }
    }

    close(epollfd);
    close(sockfd);
    return 0;
}