10 | TIME_WAIT: The devil hidden in the details

In the basics chapters, we covered the fundamentals of network programming: the C/S programming model, the TCP protocol, the UDP protocol, and local sockets. In the improvement chapters, I will draw on my own experience to guide you to a deeper understanding of TCP and UDP.

After working through the improvement chapters, I hope you will have a clear and thorough understanding of how to make TCP and UDP programs more robust, which lays a solid foundation for the performance chapters.

In the basics chapters, we learned about TCP’s four-way connection close (the “four waves”). During the four waves, the side that initiates the close stays in the TIME_WAIT state for a period of time. Do you know what TIME_WAIT is for? Both in interviews and in practice, TIME_WAIT is a hard problem that cannot be avoided. Follow me and let us find the devil hidden in the details.

TIME_WAIT scenario

Let’s start with a production failure. After we upgraded an online application service, its availability became intermittent: it would serve requests for a while, then suddenly become unavailable for a while. Everyone was puzzled. The operations engineers logged in to the host running the service and checked it with the netstat command. They found thousands of connections in the TIME_WAIT state on the host.

After analyzing the problem layer by layer, we found that the culprit was TIME_WAIT. Why? Our application service does its work by initiating outbound TCP connections, and each connection occupies a local port. Under high concurrency, so many connections were in the TIME_WAIT state that the available local ports were exhausted, and from the outside the service appeared to be broken. After a while, the system reclaimed the TIME_WAIT connections and released their local ports, and the service appeared to work again. The cycle then repeated: the service would stop working for a while and recover a minute or two later.
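The netstat check described above can also be done from code. The following is a minimal sketch, assuming a Linux host where `/proc/net/tcp` lists one socket per row and encodes TIME_WAIT as state `06`. The names `count_tcp_state` and `STATE_TIME_WAIT` are made up for this illustration.

```c
#include <stdio.h>

/* Linux numbers TCP states in /proc/net/tcp; 0x06 is TIME_WAIT. */
#define STATE_TIME_WAIT 0x06u

/* Count the rows of a /proc/net/tcp-style table whose "st" column
 * equals `state`. Returns -1 if `fp` is NULL. */
int count_tcp_state(FILE *fp, unsigned state)
{
    char line[512];
    int count = 0;

    if (fp == NULL)
        return -1;

    while (fgets(line, sizeof line, fp) != NULL) {
        unsigned st;

        /* Row layout: "sl: local_addr:port remote_addr:port st ...".
         * The header line does not match this pattern and is skipped. */
        if (sscanf(line, " %*d: %*[0-9A-Fa-f]:%*x %*[0-9A-Fa-f]:%*x %x",
                   &st) == 1 && st == state)
            count++;
    }
    return count;
}
```

A caller would typically pass the result of `fopen("/proc/net/tcp", "r")` and compare the count against the size of the local port range.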

So why are so many TIME_WAIT connections generated?

This starts with the four waves of TCP.

When the TCP connection is terminated, host 1 first sends a FIN message. Host 2 replies with an ACK and enters the CLOSE_WAIT state. At the same time, the application on host 2 gets EOF from its read call; in response, it closes its own side of the connection, which sends a FIN message. Host 1 sends an ACK after receiving this FIN, and at that point host 1 enters the TIME_WAIT state.
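The exchange above can be observed from user space. The sketch below assumes a POSIX system with a working loopback interface; the function name `passive_side_sees_eof` is invented for this example. The client socket plays host 1 and closes first, and the accepted socket plays host 2 and sees EOF from read.

```c
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Opens a loopback TCP connection, lets the client side (host 1) close
 * first, and returns what read() reports on the server side (host 2).
 * Returns 0 when host 2 sees EOF, or -1 if any step fails. */
int passive_side_sees_eof(void)
{
    struct sockaddr_in addr;
    socklen_t len = sizeof addr;
    char buf[16];
    int listener, client = -1, server = -1, result = -1;

    listener = socket(AF_INET, SOCK_STREAM, 0);
    if (listener < 0)
        return -1;

    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0; /* let the kernel pick a free port */

    if (bind(listener, (struct sockaddr *)&addr, sizeof addr) < 0 ||
        listen(listener, 1) < 0 ||
        getsockname(listener, (struct sockaddr *)&addr, &len) < 0)
        goto out;

    client = socket(AF_INET, SOCK_STREAM, 0);
    if (client < 0 ||
        connect(client, (struct sockaddr *)&addr, sizeof addr) < 0)
        goto out;

    server = accept(listener, NULL, NULL);
    if (server < 0)
        goto out;

    close(client); /* host 1: active close, its FIN is sent */
    client = -1;

    /* host 2: read() returning 0 is the EOF caused by host 1's FIN */
    result = (int)read(server, buf, sizeof buf);

out:
    if (server >= 0)
        close(server); /* host 2: passive close, sends its own FIN */
    if (client >= 0)
        close(client);
    close(listener);
    return result;
}
```

After this function returns, the client's side of the connection is the one left in TIME_WAIT, because it closed first.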

The time host 1 stays in TIME_WAIT is fixed at twice the maximum segment lifetime (MSL), usually written 2MSL. As in most BSD-derived systems, the value is hard-coded: Linux defines a constant named TCP_TIMEWAIT_LEN with a value of 60 seconds. In other words, Linux stays in TIME_WAIT for a fixed 60 seconds.

#define TCP_TIMEWAIT_LEN (60*HZ) /* how long to wait to destroy TIME-WAIT state, about 60 seconds */

After this time, host 1 enters the CLOSED state. Why this time? You can think about it first and the answer will come later.

Remember that only the side that initiates the connection termination enters the TIME_WAIT state. This is a frequent interview question.

The role of TIME_WAIT

You may ask: why stay in the TIME_WAIT state instead of going straight to CLOSED?

There are two reasons.

First, this is done to ensure that the final ACK is received by the passive closing party, thus helping it to close properly.

TCP was designed with plenty of fault tolerance. For example, TCP assumes that messages can be lost or corrupted and need to be retransmitted. Here, if the ACK message from host 1 in the figure does not get through, host 2 will resend its FIN message.

If host 1 did not stay in TIME_WAIT and went straight to CLOSED, it would have lost the context of the connection and could only reply with an RST, which causes an error on the passive closing side.

Because host 1 remains in TIME_WAIT, it can send the ACK again when it receives the retransmitted FIN, so that host 2 can reach the CLOSED state normally.

The second reason has to do with connection “incarnations” and stray segments: TIME_WAIT lets duplicate segments of the old connection die out naturally in the network.

In a network, packets often take a while to reach their destination, for all sorts of reasons such as a router restart or a sudden link failure. If a stray packet arrives and the connection identified by its TCP quadruple (source IP, source port, destination IP, destination port) no longer exists, things are simple: the packet is just discarded.

Now consider a scenario where, after the original connection is closed, an “incarnation” of it is created. We call it an incarnation because its quadruple is exactly the same as that of the original connection. If a stray segment of the old connection arrives some time later, it will be mistaken for a segment of the new incarnation and will disrupt the TCP communication.

TCP therefore waits for 2MSL, which is long enough for packets in both directions to be discarded. The segments of the original connection die out in the network, and any segment that appears afterwards must have been produced by the new incarnation.

To stress one point: the 2MSL timer starts when host 1 sends the ACK in response to the FIN. If, during TIME_WAIT, host 1’s ACK does not reach host 2 and host 1 receives a FIN retransmitted by host 2, the 2MSL timer restarts. The reason is simple: the purpose of the 2MSL wait is to let every segment of the old connection die out naturally. Since host 1 has now sent another ACK, the timer must restart so that this ACK cannot interfere with a possible new incarnation of the connection.
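The two behaviors described in this section (re-ACK a retransmitted FIN and restart the timer, or answer with RST once the state is gone) can be captured in a toy model. This is only an illustration with invented names (`tw_timer`, `tw_enter`, `tw_tick`, `tw_on_fin`), not kernel code, and it assumes the 60-second value of TCP_TIMEWAIT_LEN quoted earlier.

```c
#include <stdbool.h>

#define TIMEWAIT_SECS 60 /* mirrors the 60-second TCP_TIMEWAIT_LEN */

enum tw_reply { TW_REPLY_ACK, TW_REPLY_RST };

struct tw_timer {
    bool active;     /* true while the connection is in TIME_WAIT */
    long expires_at; /* absolute time, in seconds, when TIME_WAIT ends */
};

/* Host 1 has just ACKed the peer's FIN: enter TIME_WAIT. */
void tw_enter(struct tw_timer *t, long now)
{
    t->active = true;
    t->expires_at = now + TIMEWAIT_SECS;
}

/* Leave TIME_WAIT (go to CLOSED) once the deadline has passed. */
void tw_tick(struct tw_timer *t, long now)
{
    if (t->active && now >= t->expires_at)
        t->active = false;
}

/* A retransmitted FIN arrives from host 2 at time `now`. */
enum tw_reply tw_on_fin(struct tw_timer *t, long now)
{
    tw_tick(t, now);
    if (!t->active)
        return TW_REPLY_RST; /* context lost: only an RST is possible */

    /* Still in TIME_WAIT: the ACK is sent again, so the wait restarts. */
    t->expires_at = now + TIMEWAIT_SECS;
    return TW_REPLY_ACK;
}
```

For example, entering TIME_WAIT at t=0 and receiving a retransmitted FIN at t=10 moves the deadline from 60 to 70.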

Dangers of TIME_WAIT

Too many TIME_WAIT connections cause two main problems.

The first is memory usage. This is not very serious today and can mostly be ignored.

The second is the consumption of port resources. Each outbound TCP connection occupies a local port, and ports are a limited resource. The usable range is typically 32768~61000 (32768~60999 on newer kernels) and can be changed through net.ipv4.ip_local_port_range. If too many connections are in TIME_WAIT, new connections cannot be created. This is exactly what happened in the example at the beginning.
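To get a feel for how quickly the ports run out, here is a back-of-the-envelope sketch. It assumes that every connection goes to the same remote IP and port, that the local side always closes first, and that each closed connection holds its local port for the full TIME_WAIT period. The function names are invented for this example.

```c
/* Number of ports in the inclusive range [lo, hi]. */
int ephemeral_port_count(int lo, int hi)
{
    return hi >= lo ? hi - lo + 1 : 0;
}

/* Rough ceiling on new outbound connections per second to ONE
 * destination when every closed connection keeps its local port
 * busy for tw_secs seconds. */
int max_new_conns_per_sec(int lo, int hi, int tw_secs)
{
    if (tw_secs <= 0)
        return 0;
    return ephemeral_port_count(lo, hi) / tw_secs;
}
```

With the range 32768~61000 there are 28233 ports, so with a 60-second TIME_WAIT the sustained rate under these assumptions is about 470 new connections per second.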

How to optimize TIME_WAIT?

Under high concurrency, how can we optimize TIME_WAIT to solve the problem from the example at the beginning?

net.ipv4.tcp_max_tw_buckets

A brute-force method is to lower this value with the sysctl command. The default is given here as 18000 (the actual default depends on the kernel version and available memory). Once the number of TIME_WAIT connections in the system exceeds this value, the system destroys the excess TIME_WAIT connections immediately and merely prints a warning. This method is too heavy-handed: it treats the symptom rather than the root cause, and it creates far more problems than it solves. It is not recommended.

Lower TCP_TIMEWAIT_LEN and recompile the system

This approach works, but it requires “a little” kernel knowledge and the ability to recompile the kernel. I do not think it is practical for most people.

Settings for SO_LINGER

The English word “linger” means to stay. By setting this socket option, we can control what happens when close or shutdown is called to close the connection.

#include <sys/socket.h>

int setsockopt(int sockfd, int level, int optname, const void *optval,
               socklen_t optlen);

struct linger {
    int l_onoff;  /* 0=off, nonzero=on */
    int l_linger; /* linger time, POSIX specifies units as seconds */
};

There are several possibilities for setting the linger parameter:

If l_onoff is 0, the option is turned off and the value of l_linger is ignored. This is the default behavior: close or shutdown returns immediately, and if data is left in the socket send buffer, the system will try to send it.

If l_onoff is non-zero and l_linger is 0, then calling close immediately sends an RST to the peer. The TCP connection skips the four waves, and therefore skips the TIME_WAIT state, and is closed directly. This way of closing is called a “forced close”. Any data still queued for sending is discarded, and the passive closing side is not told that the peer has closed in the normal way. Only when the passive closing side is blocked in a recv() call does it immediately get a “connection reset by peer” error on receiving the RST.

struct linger so_linger;
so_linger.l_onoff = 1;  /* enable SO_LINGER */
so_linger.l_linger = 0; /* zero timeout: close() sends RST */
setsockopt(s, SOL_SOCKET, SO_LINGER, &so_linger, sizeof(so_linger));

If l_onoff is non-0 and the value of l_linger is also non-0, then after calling close, the thread calling close will block until the data is sent or the set l_linger timer expires.
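The three cases differ only in the two fields of struct linger, so a small wrapper makes them easy to compare. This is a minimal sketch: `set_linger` and `get_linger` are hypothetical helper names, while setsockopt, getsockopt and SO_LINGER are the standard socket API.

```c
#include <sys/socket.h>

/* Apply an SO_LINGER setting to `fd`.
 *   onoff == 0               : default behavior, close() returns at once
 *   onoff != 0, seconds == 0 : forced close with RST (dangerous)
 *   onoff != 0, seconds  > 0 : close() blocks up to `seconds`
 * Returns 0 on success, -1 on error (errno is set by setsockopt). */
int set_linger(int fd, int onoff, int seconds)
{
    struct linger lg;

    lg.l_onoff = onoff;
    lg.l_linger = seconds;
    return setsockopt(fd, SOL_SOCKET, SO_LINGER, &lg, sizeof lg);
}

/* Read the current SO_LINGER setting of `fd` into `out`.
 * Returns 0 on success, -1 on error. */
int get_linger(int fd, struct linger *out)
{
    socklen_t len = sizeof *out;

    return getsockopt(fd, SOL_SOCKET, SO_LINGER, out, &len);
}
```

For instance, `set_linger(fd, 1, 5)` selects the third case with a 5-second limit, which keeps the normal four waves and only changes how long close may block.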

The second setting offers a way to bypass the TIME_WAIT state, but it is very dangerous and not recommended.

net.ipv4.tcp_tw_reuse: a safer setting

So does Linux offer a safer alternative?

Of course there is. This is the net.ipv4.tcp_tw_reuse option.

The Linux system’s explanation of net.ipv4.tcp_tw_reuse is as follows:

Allow to reuse TIME-WAIT sockets for new connections when it is safe from protocol viewpoint. Default value is 0. It should not be changed without advice/request of technical experts.

The general idea of this paragraph is that if it is safe and controllable from a protocol perspective, the socket in TIME_WAIT can be reused for new connections.

So what does “safe and controllable from a protocol perspective” mean? There are two main points:

Applies only to the connection initiator (client in C/S model);

A connection in the TIME_WAIT state can be reused only after it has existed for more than 1 second.

There is also a prerequisite for using this option, which is to enable support for TCP timestamps, that is, net.ipv4.tcp_timestamps=1 (the default is 1).
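Both settings can be checked before relying on this option. The sketch below assumes a Linux system that exposes them as `/proc/sys/net/ipv4/tcp_tw_reuse` and `/proc/sys/net/ipv4/tcp_timestamps`. The helper names and the two path macros are invented for this example, and the paths are passed in as parameters so the logic can be tried on ordinary files.

```c
#include <stdio.h>

#define TW_REUSE_PATH   "/proc/sys/net/ipv4/tcp_tw_reuse"
#define TIMESTAMPS_PATH "/proc/sys/net/ipv4/tcp_timestamps"

/* Read one integer from a sysctl-style file.
 * Returns 0 on success, -1 if the file cannot be read or parsed. */
int read_int_setting(const char *path, int *value)
{
    FILE *fp = fopen(path, "r");
    int ok;

    if (fp == NULL)
        return -1;
    ok = (fscanf(fp, "%d", value) == 1);
    fclose(fp);
    return ok ? 0 : -1;
}

/* Returns 1 if tcp_tw_reuse is set to 1 and timestamps are enabled,
 * 0 if either condition fails, -1 if a setting cannot be read. */
int tw_reuse_usable(const char *reuse_path, const char *timestamps_path)
{
    int reuse, timestamps;

    if (read_int_setting(reuse_path, &reuse) != 0 ||
        read_int_setting(timestamps_path, &timestamps) != 0)
        return -1;
    return (reuse == 1 && timestamps != 0) ? 1 : 0;
}
```

On a real host the call would be `tw_reuse_usable(TW_REUSE_PATH, TIMESTAMPS_PATH)`.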

TCP itself keeps evolving. RFC 1323 defines TCP extensions for high performance and introduces a new TCP option with two 4-byte timestamp fields, which record the sender’s current timestamp and the most recent timestamp received from the peer. With timestamps in place, the 2MSL problem described earlier no longer arises, because duplicate packets from an old connection are discarded once their timestamps are out of date.
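The idea of discarding a segment because its timestamp is stale can be sketched as follows. This is a simplified illustration in the spirit of the PAWS check from RFC 1323, not the kernel’s implementation; the names `ts_before` and `paws_reject` are made up here, and the comparison uses 32-bit wraparound arithmetic because the timestamp field is 4 bytes wide.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if timestamp `a` is earlier than `b`, treating the 32-bit
 * counter as circular so that a wrapped value still compares correctly. */
bool ts_before(uint32_t a, uint32_t b)
{
    return (int32_t)(a - b) < 0;
}

/* A segment whose timestamp is older than the most recent timestamp
 * accepted from that peer is treated as a stale duplicate and rejected. */
bool paws_reject(uint32_t seg_tsval, uint32_t ts_recent)
{
    return ts_before(seg_tsval, ts_recent);
}
```

A stray segment from an old connection carries an old timestamp, so it fails this check even when its quadruple matches a new incarnation.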

Summary

Today I covered the four waves of TCP, focusing on how TIME_WAIT arises, what it is for, and how to optimize it. Remember the following three points:

TIME_WAIT exists to let the segments of an old TCP connection die out naturally and to let the passive closing side close normally;

Do not try to skip TIME_WAIT by setting the SO_LINGER socket option;

Modern Linux systems provide a safer, more controllable option, net.ipv4.tcp_tw_reuse, that lets us reuse connections in the TIME_WAIT state where possible.