Discussion on concurrency between TCPServer and select/poll/epoll in C language

TCPServer

Start a server

First, let's look at the simplest possible implementation of a TCP server on Linux:

#include <errno.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

int main()
{
    int sockfd = socket(AF_INET, SOCK_STREAM, 0);

    struct sockaddr_in servaddr;
    memset(&servaddr, 0, sizeof(struct sockaddr_in));
    servaddr.sin_family = AF_INET;
    servaddr.sin_addr.s_addr = htonl(INADDR_ANY);
    servaddr.sin_port = htons(9999);

    if(-1 == bind(sockfd, (struct sockaddr*)&servaddr, sizeof(servaddr)))
    {
        printf("bind failed: %s\n", strerror(errno));
        return -1;
    }

    listen(sockfd, 10);
    //Keep the process alive for 10 seconds so that the port can be inspected
    sleep(10);
}

When we run the above code, the process pauses for 10 seconds in sleep after calling listen. During that time we can use the command netstat -anop | grep 9999 to check the port status and see that it is LISTEN, which means the socket is listening on the port. We continue to improve the code:

#include <errno.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>

int main()
{
    int sockfd = socket(AF_INET, SOCK_STREAM, 0);

    struct sockaddr_in servaddr;
    memset(&servaddr, 0, sizeof(struct sockaddr_in));
    servaddr.sin_family = AF_INET;
    servaddr.sin_addr.s_addr = htonl(INADDR_ANY);
    servaddr.sin_port = htons(9999);

    if(-1 == bind(sockfd, (struct sockaddr*)&servaddr, sizeof(servaddr)))
    {
        printf("bind failed: %s\n", strerror(errno));
        return -1;
    }

    listen(sockfd, 10);

    struct sockaddr_in clientaddr;
    socklen_t len = sizeof(clientaddr);

    while(1)
    {
        //Blocks here until a client connects
        int clientfd = accept(sockfd, (struct sockaddr*)&clientaddr, &len);
        printf("accept\n");
    }
}


Run the above program, and you will find that the process blocks in accept while it waits for a client to connect. We can try to connect:

As you can see, the client connects without any trouble. If you don't know the IP address of the Linux machine, you can look it up with ifconfig. This shows that accept is a blocking function: it waits until a client connects, and only then does the program print accept. So how do we make it non-blocking? We need to set sockfd to non-blocking mode:

#include <errno.h>
#include <fcntl.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>

int main()
{
    int sockfd = socket(AF_INET, SOCK_STREAM, 0);

    struct sockaddr_in servaddr;
    memset(&servaddr, 0, sizeof(struct sockaddr_in));
    servaddr.sin_family = AF_INET;
    servaddr.sin_addr.s_addr = htonl(INADDR_ANY);
    servaddr.sin_port = htons(9999);

    if(-1 == bind(sockfd, (struct sockaddr*)&servaddr, sizeof(servaddr)))
    {
        printf("bind failed: %s\n", strerror(errno));
        return -1;
    }

    listen(sockfd, 10);
    // sleep(10);

    printf("sleep\n");
    //Read the current flags, add O_NONBLOCK, and write them back
    int flags = fcntl(sockfd, F_GETFL, 0);
    flags |= O_NONBLOCK;
    fcntl(sockfd, F_SETFL, flags);

    struct sockaddr_in clientaddr;
    socklen_t len = sizeof(clientaddr);

    while(1)
    {
        //Returns -1 immediately when no client is waiting
        int clientfd = accept(sockfd, (struct sockaddr*)&clientaddr, &len);
        printf("accept\n");
    }
}

In this way, accept no longer blocks. When no client is waiting, it returns -1 immediately with errno set to EAGAIN or EWOULDBLOCK, and the code after it still runs.
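
The non-blocking behavior described above can be checked with a small sketch. The helper names make_listener, set_nonblocking and accept_would_block are hypothetical and only serve this illustration; port 0 is used so that the kernel picks any free port.

```c
#include <errno.h>
#include <fcntl.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

// Hypothetical helper: create a TCP socket listening on the given port.
// Port 0 asks the kernel to pick any free port. Returns the fd, or -1 on error.
int make_listener(unsigned short port)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(port);

    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        listen(fd, 10) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

// Hypothetical helper: add O_NONBLOCK to the fd's status flags.
// Returns 0 on success, -1 on error.
int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0 ? -1 : 0;
}

// Returns 1 if accept() reported "no pending connection" instead of blocking,
// 0 if it returned anything else.
int accept_would_block(int listenfd)
{
    int clientfd = accept(listenfd, NULL, NULL);
    if (clientfd >= 0) {
        close(clientfd);
        return 0;
    }
    return errno == EAGAIN || errno == EWOULDBLOCK;
}
```

Calling set_nonblocking on the fd returned by make_listener and then accept_would_block should report 1 as long as no client has connected. A real server would normally wait with select, poll or epoll instead of calling accept in a tight loop.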

Data sending and receiving

After starting a server, we will start sending and receiving data. The code is implemented like this:

#include <errno.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

//Receive buffer size
#define BUFFER_LENGTH 1024

int main()
{
    int sockfd = socket(AF_INET, SOCK_STREAM, 0);

    struct sockaddr_in servaddr;
    memset(&servaddr, 0, sizeof(struct sockaddr_in));
    servaddr.sin_family = AF_INET;
    servaddr.sin_addr.s_addr = htonl(INADDR_ANY);
    servaddr.sin_port = htons(9999);

    if(-1 == bind(sockfd, (struct sockaddr*)&servaddr, sizeof(servaddr)))
    {
        printf("bind failed: %s\n", strerror(errno));
        return -1;
    }

    listen(sockfd, 10);

    struct sockaddr_in clientaddr;
    socklen_t len = sizeof(clientaddr);
    int clientfd = accept(sockfd, (struct sockaddr*)&clientaddr, &len);
    printf("accept\n");

    while(1)
    {
        char buffer[BUFFER_LENGTH] = {0};
        int ret = recv(clientfd, buffer, BUFFER_LENGTH, 0);
        //0 means the client closed the connection, -1 means an error
        if(ret <= 0)
        {
            close(clientfd);
            break;
        }
        printf("ret: %d, buffer: %s\n", ret, buffer);
        send(clientfd, buffer, ret, 0);
    }
}

From the above behavior, we can see that the return value of the receiving function recv is the number of bytes received, and that recv is a blocking function that waits for the client to send data. As soon as the data is received, we use send to send it back.
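
One detail the echo loop glosses over is that send may write fewer bytes than requested. Below is a minimal sketch of one way to handle that; send_all and echo_once are hypothetical helper names introduced only for this example, not part of the code above.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>

// Hypothetical helper: keep calling send() until all len bytes are written.
// Returns 0 on success, -1 on error.
int send_all(int fd, const char *buf, size_t len)
{
    size_t sent = 0;
    while (sent < len) {
        ssize_t n = send(fd, buf + sent, len - sent, 0);
        if (n < 0) {
            if (errno == EINTR)
                continue;   // interrupted by a signal, try again
            return -1;
        }
        sent += (size_t)n;
    }
    return 0;
}

// Hypothetical helper: receive once and echo the bytes back.
// Returns the number of bytes echoed, 0 if the peer closed, -1 on error.
ssize_t echo_once(int fd, char *buf, size_t cap)
{
    ssize_t n = recv(fd, buf, cap, 0);
    if (n <= 0)
        return n;
    return send_all(fd, buf, (size_t)n) == 0 ? n : -1;
}
```

In the loop above, echo_once(clientfd, buffer, BUFFER_LENGTH) could stand in for the recv and send pair, with a return value of 0 or less meaning that the connection should be closed.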

Summary

The above is a basic implementation of a TCP server. Today's main topic is concurrency, so the server itself is kept deliberately simple.

Concurrency

We mainly discuss the differences between multi-threading, select, poll and epoll, their operating principles and some issues.

Multi-threading

If you want to serve many clients at once, multi-threading is probably the first thing you will think of. Let's first implement a multi-threaded TCP server and then discuss its limitations:

#include <errno.h>
#include <netinet/in.h>
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

#define BUFFER_LENGTH 1024

//thread function
void *client_thread(void *arg)
{
    //Take our own copy of the fd and release the heap slot
    int clientfd = *(int*)arg;
    free(arg);

    while(1)
    {
        char buffer[BUFFER_LENGTH] = {0};
        int ret = recv(clientfd, buffer, BUFFER_LENGTH, 0);

        //0 means the client closed the connection, -1 means an error
        if(ret <= 0)
        {
            close(clientfd);
            break;
        }
        printf("ret: %d, buffer: %s\n", ret, buffer);

        send(clientfd, buffer, ret, 0);
    }
    return NULL;
}

int main()
{
    int sockfd = socket(AF_INET, SOCK_STREAM, 0);

    struct sockaddr_in servaddr;
    memset(&servaddr, 0, sizeof(struct sockaddr_in));
    servaddr.sin_family = AF_INET;
    servaddr.sin_addr.s_addr = htonl(INADDR_ANY);
    servaddr.sin_port = htons(9999);

    if(-1 == bind(sockfd, (struct sockaddr*)&servaddr, sizeof(servaddr)))
    {
        printf("bind failed: %s\n", strerror(errno));
        return -1;
    }

    listen(sockfd, 10);

    struct sockaddr_in clientaddr;
    socklen_t len = sizeof(clientaddr);
    while(1)
    {
        int clientfd = accept(sockfd, (struct sockaddr*)&clientaddr, &len);
        if(clientfd < 0)
            continue;
        //Pass clientfd into the thread through a heap copy, so that the
        //next accept cannot overwrite it before the thread has read it
        int *arg = malloc(sizeof(int));
        *arg = clientfd;
        pthread_t threadid;
        pthread_create(&threadid, NULL, client_thread, arg);
        //The thread is never joined, so let it clean up after itself
        pthread_detach(threadid);
    }
}

As you can see, we achieve server-side concurrency by writing a thread function and handing each clientfd to its own thread. But with a large number of users we would need tens of thousands, hundreds of thousands, or even millions of threads, and the server's resources would not be enough. This is why the select mechanism is introduced: it saves server resources while still meeting the concurrency requirements.

Handle number

I'm not sure what the best name for this is. On Linux, fd stands for file descriptor; the closest concept in the Windows API is a handle, so let's call it the handle number for now. What is it? You can see that our sockfd and clientfd are both of type int, so we might as well print them out to see how they relate. Here we slightly modify the code we have already written:

#include <errno.h>
#include <netinet/in.h>
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

#define BUFFER_LENGTH 1024

void *client_thread(void *arg)
{
    //Take our own copy of the fd and release the heap slot
    int clientfd = *(int*)arg;
    free(arg);

    while(1)
    {
        char buffer[BUFFER_LENGTH] = {0};
        int ret = recv(clientfd, buffer, BUFFER_LENGTH, 0);

        if(ret <= 0)
        {
            close(clientfd);
            break;
        }
        printf("ret: %d, buffer: %s\n", ret, buffer);

        send(clientfd, buffer, ret, 0);
    }
    return NULL;
}

int main()
{
    int sockfd = socket(AF_INET, SOCK_STREAM, 0);
    printf("sockfd: %d\n", sockfd);
    struct sockaddr_in servaddr;
    memset(&servaddr, 0, sizeof(struct sockaddr_in));
    servaddr.sin_family = AF_INET;
    servaddr.sin_addr.s_addr = htonl(INADDR_ANY);
    servaddr.sin_port = htons(9999);

    if(-1 == bind(sockfd, (struct sockaddr*)&servaddr, sizeof(servaddr)))
    {
        printf("bind failed: %s\n", strerror(errno));
        return -1;
    }

    listen(sockfd, 10);

    struct sockaddr_in clientaddr;
    socklen_t len = sizeof(clientaddr);
    while(1)
    {
        int clientfd = accept(sockfd, (struct sockaddr*)&clientaddr, &len);
        printf("clientfd: %d\n", clientfd);
        if(clientfd < 0)
            continue;
        //Heap copy, so the next accept cannot overwrite the value
        int *arg = malloc(sizeof(int));
        *arg = clientfd;
        pthread_t threadid;
        pthread_create(&threadid, NULL, client_thread, arg);
        pthread_detach(threadid);
    }
}

We add two printf calls to the code to print sockfd and clientfd respectively. The results are as follows:

As you can see, the clientfd numbers increase one by one, starting right after the sockfd number. This is because the kernel always hands out the lowest-numbered descriptor that is not currently in use. That is the idea behind the handle number. With this concept in mind, you can read on.
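
To make the numbering rule concrete, here is a small sketch, assuming a single-threaded process so that no other descriptor is opened in between. The function name fd_number_is_reused is hypothetical.

```c
#include <sys/socket.h>
#include <unistd.h>

// Opens two sockets, closes the first one, then opens a third.
// Returns 1 if the third socket got the number of the closed one,
// 0 if it did not, and -1 if a socket could not be created.
int fd_number_is_reused(void)
{
    int a = socket(AF_INET, SOCK_STREAM, 0);
    int b = socket(AF_INET, SOCK_STREAM, 0);
    if (a < 0 || b < 0) {
        if (a >= 0) close(a);
        if (b >= 0) close(b);
        return -1;
    }

    close(a);                                  // frees the lower number
    int c = socket(AF_INET, SOCK_STREAM, 0);   // lowest free number is handed out
    int reused = (c == a);

    close(b);
    if (c >= 0)
        close(c);
    return reused;
}
```

This is the same effect that shows up in a server when a client disconnects: the next accepted connection fills the freed number instead of continuing to count upward.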

select mechanism

First of all, we must understand what the select mechanism is. Simply put, it stores all the fds in a set, has that set checked to see which clientfds are readable and which are writable, and then performs the corresponding read and write operations on them. Let's look at the code first:

#include <errno.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <unistd.h>

#define BUFFER_LENGTH 1024

int main()
{
    int sockfd = socket(AF_INET, SOCK_STREAM, 0);

    struct sockaddr_in servaddr;
    memset(&servaddr, 0, sizeof(struct sockaddr_in));
    servaddr.sin_family = AF_INET;
    servaddr.sin_addr.s_addr = htonl(INADDR_ANY);
    servaddr.sin_port = htons(9999);

    printf("begin bind...\n");

    if(-1 == bind(sockfd, (struct sockaddr*)&servaddr, sizeof(servaddr)))
    {
        printf("bind error: %s\n", strerror(errno));
        return -1;
    }

    listen(sockfd, 10);

    struct sockaddr_in clientaddr;
    socklen_t len = sizeof(clientaddr);

    fd_set rfds, rset;
    FD_ZERO(&rfds);
    FD_SET(sockfd, &rfds);

    int maxfd = sockfd;
    int clientfd = 0;

    while(1)
    {
        rset = rfds;

        int nready = select(maxfd + 1, &rset, NULL, NULL, NULL);
        printf("nready: %d\n", nready);
        if(FD_ISSET(sockfd, &rset))
        {
            clientfd = accept(sockfd, (struct sockaddr*)&clientaddr, &len);
            printf("accept: %d\n", clientfd);

            FD_SET(clientfd, &rfds);
            if(clientfd > maxfd)
                maxfd = clientfd;
            if(--nready == 0)
                continue;
        }

        int i = 0;
        for(i = sockfd + 1; i <= maxfd; i++)
        {
            if(FD_ISSET(i, &rset))
            {
                char buffer[BUFFER_LENGTH] = {0};
                int ret = recv(i, buffer, BUFFER_LENGTH, 0);
                if(ret <= 0)
                {
                    //Stop watching the fd before closing it
                    FD_CLR(i, &rfds);
                    close(i);
                    continue;
                }
                printf("ret: %d, buffer: %s\n", ret, buffer);
                send(i, buffer, ret, 0);
            }
        }
    }
}

As the experiment shows, we have implemented a concurrent server using the select mechanism. Let's analyze the implementation step by step.

    //Define rfds (the master set of fds to watch) and rset (the copy passed to the kernel)
    fd_set rfds, rset;
    //Clear every bit in the set
    FD_ZERO(&rfds);
    //Add the listening socket to the set
    FD_SET(sockfd, &rfds);
    //At the start, the largest handle number is that of sockfd
    int maxfd = sockfd;
    int clientfd = 0;

    while(1)
    {
        //Make a copy of rfds to pass to the kernel, because select overwrites its argument
        rset = rfds;

        //On return, rset contains only the fds that are readable
        int nready = select(maxfd + 1, &rset, NULL, NULL, NULL);
        printf("nready: %d\n", nready);
        //Determine whether sockfd is readable, i.e. a new connection is waiting
        if(FD_ISSET(sockfd, &rset))
        {
            clientfd = accept(sockfd, (struct sockaddr*)&clientaddr, &len);
            printf("accept: %d\n", clientfd);
            //Add the new fd to the master set
            FD_SET(clientfd, &rfds);
            if(clientfd > maxfd)
                //Update maxfd
                maxfd = clientfd;
            if(--nready == 0)
                continue;
        }

        int i = 0;
        //Start traversing from the first clientfd
        for(i = sockfd + 1; i <= maxfd; i++)
        {
            if(FD_ISSET(i, &rset))
            {
                char buffer[BUFFER_LENGTH] = {0};
                int ret = recv(i, buffer, BUFFER_LENGTH, 0);
                if(ret <= 0)
                {
                    //Remove the fd from the master set before closing it
                    FD_CLR(i, &rfds);
                    close(i);
                    continue;
                }
                printf("ret: %d, buffer: %s\n", ret, buffer);
                send(i, buffer, ret, 0);
            }
        }
    }
}

Notes:

  • Regarding maxfd = clientfd: this does not mean that clientfd only ever grows. If a client disconnects and its clientfd number is freed, a later clientfd will reuse that number.
  • Regarding the parameters of select: normally three sets, rfds, wfds, and efds, are passed to represent read, write, and error conditions respectively. In this demonstration we only pass the read set. In real use, all three sets have to be copied on every call, which consumes a lot of server resources.
  • Regarding the line int nready = select(maxfd + 1, &rset, NULL, NULL, NULL);: the first argument is the highest fd number plus one, because the underlying traversal is a for loop that uses <. This is why maxfd needs + 1.
  • Disadvantage: select can watch at most FD_SETSIZE fds, which is usually 1024.
  • Advantage: select is cross-platform.

poll mechanism

Generally speaking, poll and select are similar. They both achieve concurrency by traversing arrays. Let’s look at the code first:

#include <errno.h>
#include <netinet/in.h>
#include <poll.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

#define BUFFER_LENGTH 1024

#define POLL_SIZE 1024

int main()
{
    int sockfd = socket(AF_INET, SOCK_STREAM, 0);
    struct sockaddr_in servaddr;
    memset(&servaddr, 0, sizeof(struct sockaddr_in));
    servaddr.sin_family = AF_INET;
    servaddr.sin_addr.s_addr = htonl(INADDR_ANY);
    servaddr.sin_port = htons(9999);

    if(-1 == bind(sockfd, (struct sockaddr*)&servaddr, sizeof(servaddr)))
    {
        printf("bind failed: %s\n", strerror(errno));
        return -1;
    }

    listen(sockfd, 10);

    struct sockaddr_in clientaddr;
    socklen_t len = sizeof(clientaddr);

    //poll: the array is indexed by fd, so mark every slot as unused (-1)
    //first; poll ignores entries whose fd is negative
    struct pollfd fds[POLL_SIZE];
    for(int i = 0; i < POLL_SIZE; i++)
    {
        fds[i].fd = -1;
        fds[i].events = 0;
        fds[i].revents = 0;
    }

    fds[sockfd].fd = sockfd;
    fds[sockfd].events = POLLIN;

    int maxfd = sockfd;
    int clientfd = 0;

    while(1)
    {
        int nready = poll(fds, maxfd + 1, -1);
        if(fds[sockfd].revents & POLLIN)
        {
            clientfd = accept(sockfd, (struct sockaddr*)&clientaddr, &len);
            printf("accept: %d\n", clientfd);

            fds[clientfd].fd = clientfd;
            fds[clientfd].events = POLLIN;

            if(clientfd > maxfd)
                maxfd = clientfd;

            if(--nready == 0)
                continue;
        }

        //Start after sockfd so that recv is never called on the listening socket
        for(int i = sockfd + 1; i < maxfd + 1; i++)
        {
            if(fds[i].revents & POLLIN)
            {
                char buffer[BUFFER_LENGTH] = {0};
                int ret = recv(i, buffer, BUFFER_LENGTH, 0);
                if(ret <= 0)
                {
                    //Mark the slot as unused, then close the fd
                    fds[i].fd = -1;
                    fds[i].events = 0;

                    close(i);
                    continue;
                }

                printf("ret: %d, buffer: %s\n", ret, buffer);

                send(i, buffer, ret, 0);
            }
        }
    }
}

As you can see from the code, poll and select work on similar principles, but there are still a few differences.

  1. poll has a smaller interface. select comes with FD_SET, FD_ZERO and other macros for manipulating its sets, while poll only needs an array of struct pollfd.
  2. poll has only one collection and does not need to copy multiple collections the way select does.
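
The server above indexes its pollfd array by fd number, which wastes the slots below sockfd. A common alternative is a packed array. The sketch below is one way to write it; struct fd_table, MAX_FDS and the table_* functions are hypothetical names introduced for this example.

```c
#include <poll.h>

#define MAX_FDS 1024   // assumed capacity for this sketch

// Hypothetical container: a packed array of pollfd entries.
struct fd_table {
    struct pollfd fds[MAX_FDS];
    int count;
};

// Append fd to the table and watch it for readability.
// Returns 0 on success, -1 if the table is full.
int table_add(struct fd_table *t, int fd)
{
    if (t->count >= MAX_FDS)
        return -1;
    t->fds[t->count].fd = fd;
    t->fds[t->count].events = POLLIN;
    t->fds[t->count].revents = 0;
    t->count++;
    return 0;
}

// Remove the entry at index idx by moving the last entry into its slot.
void table_remove(struct fd_table *t, int idx)
{
    t->fds[idx] = t->fds[t->count - 1];
    t->count--;
}

// Poll every entry once. Returns what poll() returns: the number of
// entries with events, 0 on timeout, -1 on error.
int table_poll(struct fd_table *t, int timeout_ms)
{
    return poll(t->fds, (nfds_t)t->count, timeout_ms);
}
```

With this layout poll only ever looks at descriptors that are actually in use, and the loop after table_poll runs over count entries instead of maxfd + 1.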

epoll mechanism

This is a very important mechanism, so important that, as I see it, Linux could hardly serve as a high-concurrency server platform without it. In the past, Linux was mainly used for embedded and industrial development, and it was only after epoll appeared that Linux entered the server development market. The importance of epoll is that it solves the problem of handling a large number of IO connections: the work done on each wakeup depends on the number of fds that are ready, not on the total number of clientfds being watched. Let's look at the code implementation first:

#include <errno.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/epoll.h>
#include <sys/socket.h>
#include <unistd.h>

#define BUFFER_LENGTH 1024

int main()
{
    int sockfd = socket(AF_INET, SOCK_STREAM, 0);

    struct sockaddr_in servaddr;
    memset(&servaddr, 0, sizeof(struct sockaddr_in));
    servaddr.sin_family = AF_INET;
    servaddr.sin_addr.s_addr = htonl(INADDR_ANY);
    servaddr.sin_port = htons(9999);

    if(-1 == bind(sockfd, (struct sockaddr*)&servaddr, sizeof(servaddr)))
    {
        printf("bind failed: %s\n", strerror(errno));
        return -1;
    }

    listen(sockfd, 10);

    struct sockaddr_in clientaddr;
    socklen_t len = sizeof(clientaddr);

    //The size argument is ignored by modern kernels but must be greater than 0
    int epfd = epoll_create(1);

    struct epoll_event ev;
    ev.events = EPOLLIN;
    ev.data.fd = sockfd;

    epoll_ctl(epfd, EPOLL_CTL_ADD, sockfd, &ev);

    struct epoll_event events[1024] = {0};

    while(1)
    {
        //The number of ready IOs that need to be traversed
        int nready = epoll_wait(epfd, events, 1024, -1);
        printf("nready: %d\n", nready);
        if(nready < 0)
            continue;

        //Traverse only the ready IOs
        for(int i = 0; i < nready; i++)
        {
            int connfd = events[i].data.fd;

            if(sockfd == connfd)
            {
                int clientfd = accept(sockfd, (struct sockaddr*)&clientaddr, &len);
                if(clientfd < 0)
                    continue;

                printf("clientfd: %d\n", clientfd);

                //Edge-triggered: an event is reported only when new data arrives,
                //so data left unread by one recv is not reported again by itself
                ev.events = EPOLLIN | EPOLLET;
                ev.data.fd = clientfd;
                epoll_ctl(epfd, EPOLL_CTL_ADD, clientfd, &ev);
            }
            else if(events[i].events & EPOLLIN)
            {
                //The original also read a 2-byte length prefix here, which
                //swallowed the first two bytes of every message; it is dropped
                char buffer[BUFFER_LENGTH] = {0};

                int n = recv(connfd, buffer, BUFFER_LENGTH, 0);
                if(n > 0)
                {
                    printf("recv: %s\n", buffer);
                    send(connfd, buffer, n, 0);
                }
                else
                {
                    //0 means the client closed the connection, -1 means an error
                    printf("close\n");
                    epoll_ctl(epfd, EPOLL_CTL_DEL, connfd, NULL);

                    close(connfd);
                }
            }
        }
    }
}

The mechanism of epoll is very different from the previous two. It works like a parcel locker (a "Hive Box"). Suppose a residential building has one parcel locker. Each time a clientfd is added, the number of residents goes up by one, but there is still only one locker. When residents want to send a parcel they leave it at the locker, and when a parcel arrives for them they pick it up at the locker. The traversal of IO is therefore limited to the locker and does not cover every resident. In the past, the courier had to knock on every door to ask whether anyone wanted to send or receive a parcel. Now the courier only needs to visit the locker, which greatly improves efficiency and saves resources.

Note:

This piece of code does not actually need to be so complicated. I was just experimenting with level triggering and edge triggering. You can rewrite it yourself.
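
The difference between level triggering and edge triggering can be observed without a network connection. The sketch below uses a pipe in place of a client socket, which is an assumption made to keep the example short; second_wait_result is a hypothetical function name.

```c
#include <string.h>
#include <sys/epoll.h>
#include <unistd.h>

// Registers the read end of a pipe with EPOLLIN plus the given trigger
// flags (0 for level-triggered, EPOLLET for edge-triggered), writes one
// byte, then calls epoll_wait twice WITHOUT reading the byte in between.
// Returns the result of the second epoll_wait, or -1 if setup failed.
int second_wait_result(unsigned int trigger_flags)
{
    int p[2];
    if (pipe(p) < 0)
        return -1;

    int epfd = epoll_create1(0);
    if (epfd < 0) {
        close(p[0]);
        close(p[1]);
        return -1;
    }

    struct epoll_event ev;
    memset(&ev, 0, sizeof(ev));
    ev.events = EPOLLIN | trigger_flags;
    ev.data.fd = p[0];

    struct epoll_event out;
    int result = -1;
    if (epoll_ctl(epfd, EPOLL_CTL_ADD, p[0], &ev) == 0 &&
        write(p[1], "x", 1) == 1 &&
        epoll_wait(epfd, &out, 1, 0) == 1) {
        // The byte is still unread at this point
        result = epoll_wait(epfd, &out, 1, 0);
    }

    close(epfd);
    close(p[0]);
    close(p[1]);
    return result;
}
```

With level triggering, second_wait_result(0) reports the fd again because unread data is still there. With edge triggering, second_wait_result(EPOLLET) reports nothing the second time because no new data has arrived. This is why edge-triggered code usually makes the socket non-blocking and keeps calling recv until it returns EAGAIN.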