18 | Beware of others: check the validity of the data

In the previous lecture, we carefully analyzed the causes of failures and saw that a program must be written defensively in order to cope with the various failures that may occur.

In this lecture, we continue the previous discussion and take a look at what else we need to prepare in order to enhance the robustness of the program.

Abnormal status of the peer

In Lectures 11 and 17, we were introduced to some ways of guarding against peer exceptions. For example, when making calls such as read, you can check for EOF to detect that the peer program has exited or crashed.

int nBytes = recv(connfd, buffer, sizeof(buffer), 0);
if (nBytes == -1) {
    error(1, errno, "error read message");
} else if (nBytes == 0) {
    error(1, 0, "client closed\n");
}

Look at line 4 of this program. When recv returns 0 bytes, the operating system kernel is in effect reporting EOF. If the server handles multiple client connections at the same time, it generally calls shutdown here to close this end of the connection.

As mentioned in the previous lecture, not every anomaly can be detected through a read operation. For example, if the server crashes outright or the network is interrupted, a blocking socket stays blocked in calls such as read indefinitely, with no way to notice that the socket has failed.

There are actually several ways to solve this problem.

The first method is to set a timeout for the read operation of the socket. If it exceeds a period of time, the connection is considered to no longer exist. The specific code snippets are as follows:

struct timeval tv;
tv.tv_sec = 5;
tv.tv_usec = 0;
setsockopt(connfd, SOL_SOCKET, SO_RCVTIMEO, (const char *) &tv, sizeof tv);

while (1) {
    int nBytes = recv(connfd, buffer, sizeof(buffer), 0);
    if (nBytes == -1) {
        if (errno == EAGAIN || errno == EWOULDBLOCK) {
            printf("read timeout\n");
            onClientTimeout(connfd);
        } else {
            error(1, errno, "error read message");
        }
    } else if (nBytes == 0) {
        error(1, 0, "client closed\n");
    }
    ...
}

This code snippet calls setsockopt on line 4 to set the socket's read timeout. The timeout, set on lines 1-3, is 5 seconds. Of course, this value was picked off the top of my head; a more scientific approach is to derive a reasonable value from measured statistics. The key point is lines 9-11: when the read operation returns an error, the code checks whether errno is EAGAIN or EWOULDBLOCK to decide that a timeout occurred, and then calls onClientTimeout to handle it.

Although this approach is simple, it is very practical, and many FTP servers are designed this way. When connected to such a server, an FTP client that cannot resume transfers is simply disconnected when a network failure or a server crash occurs.

The second method, mentioned in Lecture 12, is to add a check of whether the connection is still healthy. If the connection turns out to be broken, the program must return from the blocked read and handle the failure.

There is another way, also mentioned in Lecture 12: use the timeout capability of I/O multiplexing to check socket I/O. If the preset time is exceeded, the program enters its exception handling.

struct timeval tv;
fd_set allreads, readmask;

FD_ZERO(&allreads);
FD_SET(socket_fd, &allreads);
for (;;) {
    readmask = allreads;
    /* select may modify tv (Linux does), so reset it before every call */
    tv.tv_sec = 5;
    tv.tv_usec = 0;
    int rc = select(socket_fd + 1, &readmask, NULL, NULL, &tv);
    if (rc < 0) {
        error(1, errno, "select failed");
    }
    if (rc == 0) {
        printf("read timeout\n");
        onClientTimeout(socket_fd);
    }
    ...
}

This code uses select multiplexing to wait for I/O events on the socket. Because select may overwrite its timeout argument (Linux does), tv is reset before every call. The rc == 0 branch on line 15 is the processing logic that runs once the timeout expires: it calls onClientTimeout to handle the timeout.

Buffer processing

A well-designed network program should perform stably with random inputs. Not only that, with the development of the Internet, network security has become more and more important. Whether the network programs we write can perform stably under deliberate attacks by hackers is also an important consideration.

Many attack programs deliberately construct network protocol packets in a particular format so that the target program suffers buffer overflows, pointer errors, and similar faults that degrade its ability to serve. In severe cases, an attacker can even seize control of the server and carry out arbitrary sabotage. The well-known SQL injection attack, for example, steals sensitive database information through deliberately crafted SQL statements.

Therefore, when writing network programs, we must keep reminding ourselves that we face all kinds of complex and abnormal scenarios, even attackers with ulterior motives, and we must stay on guard against them.

So what kinds of vulnerabilities may appear in the program?

First example

char Response[] = "COMMAND OK";
char buffer[128];

while (1) {
    int nBytes = recv(connfd, buffer, sizeof(buffer), 0);
    if (nBytes == -1) {
        error(1, errno, "error read message");
    } else if (nBytes == 0) {
        error(1, 0, "client closed\n");
    }

    buffer[nBytes] = '\0';
    if (strcmp(buffer, "quit") == 0) {
        printf("client quit\n");
        send(connfd, Response, sizeof(Response), 0);
    }

    printf("received %d bytes: %s\n", nBytes, buffer);
}

This code reads a byte stream from the connected socket and checks for the error and EOF conditions. If the peer sends the string “quit”, it responds with the string “COMMAND OK”. At first glance, everything seems normal.

But if you look closely, this code may well end up doing the following.

char buffer[128];
buffer[128] = '\0';

This is what happens when recv reads exactly 128 characters. Because the buffer is only 128 bytes long, the final assignment writes one byte past its end, which is a buffer overflow.

A buffer overflow is an illegal memory operation: the program writes more data into a buffer than the buffer was sized to hold, so the excess overwrites other legitimate data, here in the stack space. That overwriting destroys the integrity of the original program. Anyone who has used a game memory editor knows that modifying the wrong region of a game's data is likely to produce an error such as “Access violation” and crash the application.

We can slightly modify this program. The main idea is to leave a byte in the buffer to accommodate the following ‘\0’.

int nBytes = recv(connfd, buffer, sizeof(buffer)-1, 0);

This example also reveals an interesting inconsistency. When we send the string, we use sizeof, which means the ‘\0’ at the end of the Response string is sent as well; yet when we receive characters, we assume that no ‘\0’ is present.

For consistency, we can change the send call as follows, using strlen so that the trailing ‘\0’ is not sent.

send(connfd, Response, strlen(Response), 0);
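Both fixes from this example can be combined into a pair of small helpers. This is only an illustrative sketch: recv_cstring and send_cstring are hypothetical names, not functions from the lecture's code.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Receive at most cap - 1 bytes and always '\0'-terminate buf.
 * Returns what recv returned: > 0 bytes, 0 on EOF, -1 on error. */
ssize_t recv_cstring(int fd, char *buf, size_t cap) {
    assert(buf != NULL && cap > 1);
    ssize_t n = recv(fd, buf, cap - 1, 0);   /* reserve one byte for '\0' */
    buf[n > 0 ? n : 0] = '\0';               /* index is at most cap - 1 */
    return n;
}

/* Send a C string without its trailing '\0', matching recv_cstring. */
ssize_t send_cstring(int fd, const char *s) {
    return send(fd, s, strlen(s), 0);
}
```

With a 128-byte buffer, recv_cstring never asks for more than 127 bytes, so the terminator always lands inside the buffer even when the peer sends far more.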

Second example

Lecture 16 mentioned two methods for parsing variable-length messages. One is to use special boundary symbols, such as the carriage return and line feed characters used by HTTP; the other is to encode the length of the message information into the message.

In practice, we also need to be wary of the length field carried in the message.

ssize_t read_message(int fd, char *buffer, size_t length) {
    u_int32_t msg_length;
    u_int32_t msg_type;
    int rc;

    rc = readn(fd, (char *) &msg_length, sizeof(u_int32_t));
    if (rc != sizeof(u_int32_t))
        return rc < 0 ? -1 : 0;
    msg_length = ntohl(msg_length);

    rc = readn(fd, (char *) &msg_type, sizeof(msg_type));
    if (rc != sizeof(u_int32_t))
        return rc < 0 ? -1 : 0;

    if (msg_length > length) {    /* reject lengths the buffer cannot hold */
        return -1;
    }

    /* Retrieve the record itself */
    rc = readn(fd, buffer, msg_length);
    if (rc != msg_length)
        return rc < 0 ? -1 : 0;
    return rc;
}

When parsing the message, line 15 compares the declared message length msg_length with the size of the buffer allocated by the application. If the message is too long for the buffer to hold, the function returns -1 immediately to indicate an error. Don't underestimate this check. Without it, the peer program could declare a very large msg_length while sending a much shorter message body, so that the subsequent read never completes. And if the application's buffer is smaller than msg_length, a buffer overflow occurs as well.

struct {
    u_int32_t message_length;
    u_int32_t message_type;
    char data[128];
} message;

memset(&message, 0, sizeof(message));   /* keeps data '\0'-terminated */
int n = 65535;                          /* deliberately wrong: the body is far shorter */
message.message_length = htonl(n);
message.message_type = htonl(1);
char buf[128] = "just for fun";
strncpy(message.data, buf, sizeof(message.data) - 1);
if (send(socket_fd, (char *) &message,
         sizeof(message.message_length) + sizeof(message.message_type) + strlen(message.data), 0) < 0)
    error(1, errno, "send failure");

This is a sender program that “accidentally” goes wrong: the message length is “accidentally” set to 65535, while the message data actually sent is only “just for fun”. If the server drops the check that compares msg_length with the buffer size allocated by the application, it stays blocked in the read call, because it mistakenly believes it still has to receive 65535 bytes.
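The length check can also be isolated from the socket code so that it is easy to exercise with hostile input. The sketch below is an assumption-laden illustration: parse_header and HEADER_SIZE are hypothetical names, and it assumes both header fields travel in network byte order.

```c
#include <assert.h>
#include <arpa/inet.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define HEADER_SIZE 8   /* 4-byte length + 4-byte type (assumed layout) */

/* Decode an 8-byte header and reject lengths the caller cannot hold.
 * Fills *length and *type, then returns 0 if length <= capacity, else -1. */
int parse_header(const unsigned char header[HEADER_SIZE], size_t capacity,
                 uint32_t *length, uint32_t *type) {
    uint32_t net_length, net_type;
    assert(header != NULL && length != NULL && type != NULL);

    /* memcpy avoids unaligned access when header points into a byte stream */
    memcpy(&net_length, header, sizeof(net_length));
    memcpy(&net_type, header + sizeof(net_length), sizeof(net_type));

    *length = ntohl(net_length);
    *type = ntohl(net_type);
    return *length > capacity ? -1 : 0;
}
```

Note that this check only protects the buffer. A declared length that fits the buffer but exceeds what the sender actually transmits still leaves the reader waiting, which is where the read timeout discussed earlier in this lecture comes in.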

Third example

Suppose we need to develop a function that assumes the message delimiter is a newline character (\n). A simple idea is to read one character at a time and check whether it is a newline.

Here is such a function. Its biggest problem is that it is very inefficient: every call to recv is a system call that must switch from user space to kernel space, and context switches are expensive for a high-performance program, so they are worth avoiding whenever possible.

ssize_t readline(int fd, char *buffer, size_t length) {
    char *buf_first = buffer;

    char c;
    /* length > 1 keeps one byte free for the terminating '\0' */
    while (length > 1 && recv(fd, &c, 1, 0) == 1) {
        *buffer++ = c;
        length--;
        if (c == '\n') {
            *buffer = '\0';
            return buffer - buf_first;
        }
    }

    return -1;
}

So there is a second version. This function reads up to 512 bytes at a time into a temporary buffer, and then copies the characters from the temporary buffer one by one into the application's final buffer. This approach is clearly much more efficient.

ssize_t readline(int fd, char *buffer, size_t length) {
    char *buf_first = buffer;
    static char *buffer_pointer;
    static int nleft = 0;        /* static: unread bytes must survive across calls */
    static char read_buffer[512];
    char c;

    while (length-- > 0) {
        if (nleft <= 0) {
            int nread = recv(fd, read_buffer, sizeof(read_buffer), 0);
            if (nread < 0) {
                if (errno == EINTR) {
                    length++;
                    continue;
                }
                return -1;
            }
            if (nread == 0)
                return 0;
            buffer_pointer = read_buffer;
            nleft = nread;
        }
        c = *buffer_pointer++;
        *buffer++ = c;
        nleft--;
        if (c == '\n') {
            *buffer = '\0';
            return buffer - buf_first;
        }
    }
    return -1;
}

The main loop starts on line 8; it tries to prevent buffer overflow by checking the length variable. Line 9 checks whether every character in the temporary buffer has been consumed; if so, the function tries to read up to 512 more bytes. After a successful read, lines 20-21 reset the temporary buffer's read pointer and its count of unread characters. Lines 23-25 copy one character from the temporary buffer, advance the read pointer, and decrement the unread count. Lines 26-28 check whether the character is a newline; if it is, the function terminates the application's buffer with '\0' and returns the number of characters read.

This program may run for a long time without problems, but it still has a subtle flaw that could well cause a failure in production.

To make the fault concrete, assume the call looks like this, with the input 012345678\n.

// The input characters are: 012345678\n
char buf[10];
readline(fd, buf, 10);

When the final \n is read, length is 1, so the loop body still runs and the newline is stored in buf[9], the last byte of the buffer. The problem lies on lines 26 and 27: because a newline was read, a terminating '\0' is appended at buf[10], which is clearly beyond the end of the application's buffer.

Here is a corrected version. The key change is that length is decremented first and only then tested to see whether the buffer can still hold another character, so one byte is always left for the terminating '\0'.

ssize_t readline(int fd, char *buffer, size_t length) {
    char *buf_first = buffer;
    static char *buffer_pointer;
    static int nleft = 0;        /* static: unread bytes must survive across calls */
    static char read_buffer[512];
    char c;

    while (--length > 0) {
        if (nleft <= 0) {
            int nread = recv(fd, read_buffer, sizeof(read_buffer), 0);
            if (nread < 0) {
                if (errno == EINTR) {
                    length++;
                    continue;
                }
                return -1;
            }
            if (nread == 0)
                return 0;
            buffer_pointer = read_buffer;
            nleft = nread;
        }
        c = *buffer_pointer++;
        *buffer++ = c;
        nleft--;
        if (c == '\n') {
            *buffer = '\0';
            return buffer - buf_first;
        }
    }
    return -1;
}
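The boundary condition is easiest to verify without a socket at all. The following sketch applies the same "leave room for '\0'" rule to an in-memory source; copy_line is a hypothetical helper written for illustration, not part of the lecture's code.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>

/* Copy one '\n'-terminated line from src into dst, which holds cap bytes.
 * Returns the line length including '\n', or -1 if the line plus its
 * terminating '\0' does not fit or src holds no complete line. */
ssize_t copy_line(const char *src, size_t src_len, char *dst, size_t cap) {
    size_t used = 0;
    assert(src != NULL && dst != NULL);

    for (size_t i = 0; i < src_len; i++) {
        if (used + 1 >= cap)          /* need room for this byte and '\0' */
            return -1;
        dst[used++] = src[i];
        if (src[i] == '\n') {
            dst[used] = '\0';
            return (ssize_t)used;
        }
    }
    return -1;
}
```

With the input 012345678\n, a 10-byte destination is rejected because the newline and the terminator cannot both fit, while an 11-byte destination succeeds and returns 10.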

Summary

To summarize: in network programming, how well we detect abnormal boundary conditions determines how stable our programs are under harsh conditions. We must therefore always be prepared to deal with complex situations; the exceptions covered here include buffer overflow, pointer errors, and connection timeout detection.