Debugging a strange out-of-order problem in the TCP four-way close

This article touches on, among other things: the TCP four-way close (including simultaneous close), the seq/ack numbering rules of TCP packets, the TCP state machine, kernel TCP code, and the TCP send window.

What is the problem?

Kernel version: Linux 5.10.112

In one sentence: during the four-way close, the fin packet and the ack packet arrived out of order, and the connection could only be closed after a retransmission timeout.

Process details:

  • In a simultaneous-close scenario, the server and the client send fin packets to each other at almost the same time.
  • The client receives the server’s fin first and sends back an ack.
  • On the server side, however, the two packets arrive out of order: the client’s ack arrives first, followed by the client’s fin.
  • As a result, the server fails to process the client’s fin correctly and does not return the correct ack.
  • The client never receives an ack for its fin, so it waits for the retransmission timeout, resends the fin, and only then does the close complete normally.

Detailed analysis of the problem packet capture

In the packet capture (screenshot omitted), the upper part is the client and the lower part is the server.

Focus on the four packets numbered 14913, 14914, 20622, and 20623. To simplify the analysis below, only the last four digits of the seq and ack numbers are shown:

  • 20622 (seq=4416, ack=753): fin sent by the client, which is actively closing the connection;
  • 14913 (seq=753, ack=4416): fin sent by the server, which is also actively closing the connection;
  • 20623 (seq=4417, ack=754): ack sent by the client after receiving the server’s fin;
  • 14914 (seq=754, ack=4416): ack sent by the server.
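
These numbers already reflect a basic seq/ack rule: a fin consumes one sequence number, so the correct ack for a fin with seq=S is S+1. A quick sanity check in Python (illustrative only, using the numbers above):

# A fin consumes one sequence number: the ack for a fin with seq=S must be S+1.
fins = {"fin-20622 (client)": 4416, "fin-14913 (server)": 753}
for name, seq in fins.items():
    print(f"{name}: seq={seq} -> correct ack is {seq + 1}")
# ack-20623 carries ack=754 = 753+1, properly acknowledging the server's fin.
# A proper ack for the client's fin would be 4417, yet 14914 carries ack=4416:
# the server has not actually accepted fin-20622.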

The problem occurs on the server side (the red box in the capture), after it sends 14913:

  • It first receives 20623 (seq=4417), but the seq it expects at this point is 4416, so the packet is marked [previous segment not captured].
  • It then receives 20622 and replies with ack packet 14914. Here lies the problem: 14914 carries ack=4416, meaning the server is still waiting for the segment with seq=4416. In other words, fin-20622 was never actually accepted by the server.
  • The client sees that 20622 was not received correctly, so after the retransmission timeout it resends the fin (packet 20624), after which the connection closes normally.

(Note ack-20623 and fin-20622; these two packets will come up frequently below.)

Intuitively, this behavior makes no sense. TCP is supposed to recover from reordering. Both 20622 and 20623 did arrive at the server; even though they arrived out of order, that should not prevent the server from accepting both. This is the central question.

Our preliminary guess was that 20622 was ignored by the server’s kernel, for reasons then unknown. Since this is kernel behavior, the first step was to try reproducing the problem in a local environment. That attempt failed.

A new problem: the reproduction attempt failed

To simulate the out-of-order scenario, we used two ECS instances: the client forges raw TCP packets, and the server listens with a normal socket.

The packet capture at the server (screenshot omitted) is summarized below. Pay attention to packets No. 5, 6, 7, and 8:

  • 5: the server sends its fin to the client (there is a retransmission here for reasons we did not investigate; it does not affect what follows);
  • 6: the client first sends back the ack packet with seq=1002;
  • 7: the client then sends the fin packet with seq=1001;
  • 8: the server replies with an ack carrying ack=1002. ack=1002 means the client’s fin was received normally! (In the problem scenario, the ack returned here would have ack=1001.)

To rule out kernel version differences, we later ran the same programs in a local virtual machine with a matching kernel and got the same result. In other words, the reproduction failed.

Attachment: Simulation program code

Tools: Python + scapy

Here scapy forges the client side, sending the ack and fin out of order so we can observe the ack the server returns. Because the client does not actually run a TCP stack, no timeout retransmission can be observed regardless of whether the reproduction succeeds.

(1) Normal socket listening on the server:

import socket

server_ip = "0.0.0.0"
server_port = 12346

server_socket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server_socket.bind((server_ip, server_port))
server_socket.listen(5)

connection, client_address = server_socket.accept()

connection.close()  # close() sends the fin (replacing this with shutdown() matters later)
server_socket.close()

(2) Client simulating the reordering:

from scapy.all import *
import time
import sys

target_ip = "omitted"
target_port = 12346
src_port = 1234
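
# Note (not in the original script): the local kernel has no record of this
# forged connection, so it may answer the server's SYN-ACK with a RST. A common
# workaround is an iptables rule dropping outbound RSTs from src_port, e.g.:
#   iptables -A OUTPUT -p tcp --sport 1234 --tcp-flags RST RST -j DROP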

# Forge packets to establish the TCP connection
ip = IP(dst=target_ip)
syn = TCP(sport=src_port, dport=target_port, flags="S", seq=1000)
syn_ack = sr1(ip/syn)
if syn_ack and TCP in syn_ack and syn_ack[TCP].flags == "SA":
    print("Received SYN-ACK")
    ack = TCP(sport=src_port, dport=target_port,
              flags="A", seq=syn_ack.ack, ack=syn_ack.seq + 1)
    send(ip/ack)
    print("Sent ACK")
else:
    print("Failed to establish TCP connection")
  
def handle_packet(packet):
    if TCP in packet and packet[TCP].flags & 0x01:  # FIN flag set
        print("Received FIN packet")
        # On receiving the server's fin, deliberately send the ack first and the
        # fin second: the ack already uses the post-fin sequence number
        # (packet.ack + 1), and the fin that follows uses packet.ack.
        ack = TCP(sport=src_port, dport=target_port,
                  flags="A", seq=packet.ack + 1, ack=packet.seq + 1)
        send(ip/ack)

        time.sleep(0.1)
        fin = TCP(sport=src_port, dport=target_port,
                  flags="FA", seq=packet.ack, ack=packet.seq)
        send(ip/fin)
        sys.exit(0)
sniff(prn=handle_packet)

Where does the problem occur?

Packet reordering at the server prevented the connection from closing normally; it closed only after the client’s retransmission timeout fired and the fin was resent.

What impact does the problem have?

Closing the connection on the server side takes longer (an extra ~200 ms, matching the Linux minimum retransmission timeout), which is significant for latency-sensitive workloads.

What problem is this article trying to solve?

  • Is this legitimate kernel behavior? (Spoiler: it is.)
  • Why did the local reproduction fail?

Troubleshooting

After roughly six weekends of intermittent read-the-code-then-test cycles, we finally found the cause. Below is a brief account of the troubleshooting process, including some of our failed attempts.

Preliminary analysis

Returning to the questions above: not only was the cause unclear, but the local reproduction matched the ideal behavior perfectly. Simply put:

  • Local reproduction – reordering does not affect the close;
  • Problem scenario – reordering leads to timeout and retransmission.

The problem almost certainly lies in how the server processes ack-20623 and fin-20622.

(In the following, ack-20623 and fin-20622 refer to the out-of-order ack and fin packets.)

The key question: after the server sends its fin (entering FIN_WAIT_1), how does it handle the out-of-order ack-20623 and fin-20622 that arrive later? This is governed by the TCP state machine, so the first task is to pin down the state transitions; we can then locate the corresponding code paths and analyze them in detail.
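
For reference, here is the slice of the standard TCP state machine relevant to an active closer, written as a toy Python table (illustration only; the CLOSING row is the simultaneous-close path):

# Transitions of an active closer after it has sent its fin (TCP state machine):
transitions = {
    ("FIN_WAIT1", "recv ack of our fin"): "FIN_WAIT2",
    ("FIN_WAIT1", "recv fin"):            "CLOSING",   # simultaneous close
    ("FIN_WAIT2", "recv fin"):            "TIME_WAIT",
    ("CLOSING",   "recv ack of our fin"): "TIME_WAIT",
}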

Confirming the state transitions

Since the problem occurs during the close sequence, it is natural to infer how packets are received and processed by observing state transitions.

During the reproduction we used ss and eBPF to monitor TCP state changes. This established that after receiving ack-20623, the server moved from FIN_WAIT_1 to FIN_WAIT_2, i.e. ack-20623 was processed correctly. The problem therefore most likely lies in the handling of fin-20622, which matches our initial guess.

One oddity: per the normal close sequence, a server in FIN_WAIT_2 should enter TIMEWAIT after receiving the fin. We saw this transition in ss, but the eBPF monitor never captured it. We ignored this at the time, but later learned the reason: our eBPF tooling only records transitions that go through tcp_set_state(). The socket does enter TIMEWAIT, but not via tcp_set_state(), so eBPF cannot see it. How TIMEWAIT is entered here is covered in the “Extra” section at the end.
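
For illustration, a minimal sketch of this style of monitor using BCC (assuming BCC is installed; our actual tooling differed in detail). Because it hooks tcp_set_state(), transitions that bypass that function, such as the TIMEWAIT entry discussed here, are invisible to it:

from bcc import BPF

prog = r"""
#include <net/sock.h>
// kprobe on tcp_set_state(): log old state, new state and destination port
int kprobe__tcp_set_state(struct pt_regs *ctx, struct sock *sk, int state)
{
    u16 dport = sk->__sk_common.skc_dport;  // network byte order
    u8 old = sk->__sk_common.skc_state;
    bpf_trace_printk("state %d -> %d dport=%d\n", old, state, dport);
    return 0;
}
"""
BPF(text=prog).trace_print()  # states print as numbers, e.g. 4=FIN_WAIT1, 5=FIN_WAIT2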

Attached: ebpf monitoring results

(On the FIN_WAIT1 → FIN_WAIT2 transition, snd_una is updated, confirming that ack-20623 was processed correctly.)

<idle>-0 [000] d.s. 42261.233642: PASSIVE_ESTABLISHED: start monitor tcp state change
<idle>-0 [000] d.s. 42261.233651: port:12346,snd_nxt:154527568,snd_una:154527568
<idle>-0 [000] d.s. 42261.233652: rcv_nxt:1001,recved:0,acked:0

<...>-9451 [007] d... 42261.233808: changing from ESTABLISHED to FIN_WAIT1
<...>-9451 [007] d... 42261.233815: port:12346,snd_nxt:154527568,snd_una:154527568
<...>-9451 [007] d... 42261.233816: rcv_nxt:1001,recved:0,acked:0

<idle>-0 [000] dNs. 42261.464578: changing from FIN_WAIT1 to FIN_WAIT2
<idle>-0 [000] dNs. 42261.464588: port:12346,snd_nxt:154527569,snd_una:154527569
<idle>-0 [000] dNs. 42261.464589: rcv_nxt:1001,recved:0,acked:1

Kernel source code analysis

At this point we had to dig into the kernel source. Per the analysis above, the problem most likely lies in the tcp_rcv_state_process() function, so we extracted the TCP_FIN_WAIT2 fragment. Unfortunately, nothing suspicious turned up there:

(tcp_rcv_state_process() handles state transitions on packet receipt; it lives in net/ipv4/tcp_input.c.)

case TCP_FIN_WAIT1:
  case TCP_FIN_WAIT2:
    /* RFC 793 says to queue data in these states,
     * RFC 1122 says we MUST send a reset.
     * BSD 4.4 also does reset.
     */
    if (sk->sk_shutdown & RCV_SHUTDOWN) {
      if (TCP_SKB_CB(skb)->end_seq != TCP_SKB_CB(skb)->seq &&
          after(TCP_SKB_CB(skb)->end_seq - th->fin, tp->rcv_nxt)) { // analysis shows this condition is not met
        NET_INC_STATS(sock_net(sk), LINUX_MIB_TCPABORTONDATA);
        tcp_reset(sk);
        return 1;
      }
    }
    fallthrough;
  case TCP_ESTABLISHED:
    tcp_data_queue(sk, skb); // reaching this function repairs any reordering; fin handling also lives here
    queued = 1;
    break;

If execution reaches this point, the fin is essentially guaranteed to be processed normally, so we treated this as the end point of the search: the out-of-order fin-20622 must never have made it here. Working backwards from this location, we found a very suspicious spot, also in tcp_rcv_state_process():

// check whether the ack value is acceptable
  acceptable = tcp_ack(sk, skb, FLAG_SLOWPATH |
              FLAG_UPDATE_TS_RECENT |
              FLAG_NO_CHALLENGE_ACK) > 0;

  if (!acceptable) { // if not acceptable
    if (sk->sk_state == TCP_SYN_RECV) // not taken during the close sequence
      return 1; /* send one RST */
    tcp_send_challenge_ack(sk, skb); // send a challenge ack back, then drop the packet
    goto discard;
  }

If fin-20622 fails the ack check here, an ack is sent (packet 14914, the challenge ack in this code) and the packet is then discarded without ever reaching the fin-processing logic. This matches the problem scenario closely. Digging into the tcp_ack() function, we found where the ack could be judged unacceptable:

/* This checks the received ack value against the local send window.
 * snd_una means "send unacknowledged": the oldest byte sent but not yet acked.
 */
  if (before(ack, prior_snd_una)) { // the incoming ack is older than what was already acked
    /* RFC 5961 5.2 [Blind Data Injection Attack].[Mitigation] */
···
    goto old_ack;
  }
···
old_ack:
  /* If data was SACKed, tag it and see if we should send more data.
   * If data was DSACKed, see if we can undo a cwnd reduction.
   */
···

  return 0;

To summarize: there is a plausible processing path for fin-20622 that matches the observed behavior. From the server’s perspective:

  • First, ack-20623 arrives, and snd_una is updated to that packet’s ack value, 754.
  • Then fin-20622 arrives. In the ack-checking stage, its ack value of 753 is below the current snd_una (754), so it is judged old_ack and unacceptable; acceptable ends up 0.
  • Because the ack value is deemed unacceptable, the kernel sends a challenge ack and then discards fin-20622 outright.
  • fin-20622 is thus dropped by tcp_rcv_state_process() and never enters the fin-processing path.

In effect, the server never “sees” the fin signal, which is exactly the problem scenario.
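
To make the path concrete, here is a toy Python replay of the check (illustration only; before() mimics the kernel’s 32-bit sequence-space comparison):

def before(a, b):
    """Sequence-space 'a < b' with 32-bit wrap-around, like the kernel's before()."""
    return ((a - b) & 0xffffffff) > 0x7fffffff

snd_una = 753  # the server has sent its fin (seq=753), not yet acked

for name, ack in [("ack-20623", 754), ("fin-20622", 753)]:
    if before(ack, snd_una):
        print(f"{name}: old_ack -> challenge ack sent, packet dropped")
    else:
        snd_una = ack
        print(f"{name}: accepted, snd_una advances to {snd_una}")
# ack-20623 advances snd_una to 754; fin-20622 then hits old_ack and is
# discarded, so its fin flag is never processed.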

Having found this suspicious path, we needed a way to verify it.

Since the question now came down to specific code fragments, and the real code is quite complex, code reading alone could not determine the actual execution path.

So we brought out the big hammer: modify the kernel directly and dump TCP state at the points above, mainly the state transitions and the send window.

Modifying the kernel to assist testing

Skipping the details of the process, we made two new discoveries (note: still using the original, close()-based reproduction script):

  1. When ack-20623 is received, snd_una is indeed updated, consistent with the hypothesis above; this is what sets up the fin packet to be discarded.
  2. The out-of-order fin packet never enters tcp_rcv_state_process() at all; it is handled directly by the outer tcp_v4_rcv() function via the TIMEWAIT path, and the connection eventually closes.

The second point is clearly the likely key to the failed reproduction.

  • This further supports the earlier hypothesis: if the fin could enter tcp_rcv_state_process(), the problem should reproduce. Some configuration difference between the production scenario and the reproduction must be steering the code down different paths.
  • The discovery also upended our understanding: per the TCP close sequence, before fin-20622 arrives the server has sent its fin and received the ack for it, so it should be sitting in FIN_WAIT_2, and the monitoring tools agreed. Where does TIMEWAIT come from?

With these questions we went back to the code. Between the ack check and the fin processing, the most suspicious location is this:

case TCP_FIN_WAIT1: {
    int tmo;
···

    if (tp->snd_una != tp->write_seq) // unusual: some sent data is still unacked
      break; // suspicious

    tcp_set_state(sk, TCP_FIN_WAIT2); // move to FIN_WAIT2 and shut down the send direction
    sk->sk_shutdown |= SEND_SHUTDOWN;

    sk_dst_confirm(sk);

    if (!sock_flag(sk, SOCK_DEAD)) { // not orphaned: wake up a lingering close() and stop here
      /* Wake up lingering close() */
      sk->sk_state_change(sk);
      break; // suspicious
    }
···
        // below: may enter timewait-related logic
    tmo = tcp_fin_time(sk); // compute the fin timeout
    if (tmo > sock_net(sk)->ipv4.sysctl_tcp_tw_timeout) {
            // if the timeout is large, arm the keepalive timer to probe for liveness
      inet_csk_reset_keepalive_timer(sk,
                   tmo - sock_net(sk)->ipv4.sysctl_tcp_tw_timeout);
    } else if (th->fin || sock_owned_by_user(sk)) {
      /* Bad case. We could lose such FIN otherwise.
       * It is not a big problem, but it looks confusing
       * and not so rare event. We still can lose it now,
       * if it spins in bh_lock_sock(), but it is really
       * marginal case.
       */
      inet_csk_reset_keepalive_timer(sk, tmo);
    } else { // otherwise enter timewait directly; testing showed the ack took this branch when the reproduction failed
      tcp_time_wait(sk, TCP_FIN_WAIT2, tmo);
      goto discard;
    }
    break;
  }

This fragment handles ack-20623 and is indeed where TIMEWAIT comes in, so the two early breaks became suspects: if either break fires first, the socket never enters TIMEWAIT; would the problem then reproduce?

We tried it: by patching the code, we found that forcing either of the two breaks reproduces the problem scenario, and the connection fails to close normally!

Comparing the two break conditions, SOCK_DEAD became the prime suspect.
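
Restating the branch structure of that fragment as a toy Python function may help (illustration only, not kernel code); forcing a return at either of the first two breaks reproduces the problem:

def fin_wait1_ack(snd_una, write_seq, sock_dead, tmo, tw_timeout,
                  fin_in_skb, owned_by_user):
    if snd_una != write_seq:
        return "break: our fin is not fully acked yet"          # suspicious break 1
    # state becomes FIN_WAIT2; the send direction is shut down
    if not sock_dead:
        return "break: socket not orphaned, stay in FIN_WAIT2"  # suspicious break 2
    if tmo > tw_timeout:
        return "arm keepalive timer for tmo - tw_timeout"
    if fin_in_skb or owned_by_user:
        return "arm keepalive timer for tmo (don't lose the fin)"
    return "tcp_time_wait(FIN_WAIT2): sk destroyed, timewait sock created"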

About SOCK_DEAD

As the name suggests, this flag relates to the TCP close process. Searching the kernel code turned up two related functions:

/*
 * Shutdown the sending side of a connection. Much like close except
 * that we don't receive shut down or sock_set_flag(sk, SOCK_DEAD).
 */

void tcp_shutdown(struct sock *sk, int how)
{
  /* We need to grab some memory, and put together a FIN,
   * and then put it into the queue to be sent.
   * Tim MacKenzie([email protected]) 4 Dec '92.
   */
  if (!(how & SEND_SHUTDOWN))
    return;

  /* If we've already sent a FIN, or it's a closed state, skip this. */
  if ((1 << sk->sk_state) &
      (TCPF_ESTABLISHED | TCPF_SYN_SENT |
       TCPF_SYN_RECV | TCPF_CLOSE_WAIT)) {
    /* Clear out any half completed packets. FIN if needed. */
    if (tcp_close_state(sk))
      tcp_send_fin(sk);
  }
}
EXPORT_SYMBOL(tcp_shutdown);

As the comment says, this function does part of what close does, but it does not sock_set_flag(sk, SOCK_DEAD). Now look at tcp_close():

void tcp_close(struct sock *sk, long timeout)
{
  struct sk_buff *skb;
  int data_was_unread = 0;
  int state;

···
  if (unlikely(tcp_sk(sk)->repair)) {
    sk->sk_prot->disconnect(sk, 0);
  } else if (data_was_unread) {
    /* Unread data was tossed, zap the connection. */
    NET_INC_STATS(sock_net(sk), LINUX_MIB_TCPABORTONCLOSE);
    tcp_set_state(sk, TCP_CLOSE);
    tcp_send_active_reset(sk, sk->sk_allocation);
  } else if (sock_flag(sk, SOCK_LINGER) && !sk->sk_lingertime) {
    /* Check zero linger _after_ checking for unread data. */
    sk->sk_prot->disconnect(sk, 0);
    NET_INC_STATS(sock_net(sk), LINUX_MIB_TCPABORTONDATA);
  } else if (tcp_close_state(sk)) {
    /* We FIN if the application ate all the data before
     * zapping the connection.
     */
    tcp_send_fin(sk); //Send fin packet
  }

  sk_stream_wait_close(sk, timeout);

adjudge_to_death:
  state = sk->sk_state;
  sock_hold(sk);
  sock_orphan(sk); // the SOCK_DEAD flag is set here
···
}
EXPORT_SYMBOL(tcp_close);

Both tcp_shutdown and tcp_close are standard entry points of the TCP protocol, and either can be used to close a connection:

struct proto tcp_prot = {
  .name = "TCP",
  .owner = THIS_MODULE,
  .close = tcp_close, //close here
  .pre_connect = tcp_v4_pre_connect,
  .connect = tcp_v4_connect,
  .disconnect = tcp_disconnect,
  .accept = inet_csk_accept,
  .ioctl = tcp_ioctl,
  .init = tcp_v4_init_sock,
  .destroy = tcp_v4_destroy_sock,
  .shutdown = tcp_shutdown, //shutdown here
  .setsockopt = tcp_setsockopt,
  .getsockopt = tcp_getsockopt,
  .keepalive = tcp_set_keepalive,
  .recvmsg = tcp_recvmsg,
  .sendmsg = tcp_sendmsg,
···
};
EXPORT_SYMBOL(tcp_prot);

In summary, an important difference between shutdown and close is that shutdown does not set SOCK_DEAD.

We replaced close() with shutdown() in the reproduction script, tested again, and finally reproduced the fin being discarded!
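
Concretely, the change to the server script amounts to one line (a sketch; shutdown(SHUT_WR) goes through tcp_shutdown(), which sends the fin but never orphans the socket):

# before: connection.close()         # tcp_close(): sends fin and sets SOCK_DEAD
connection.shutdown(socket.SHUT_WR)  # tcp_shutdown(): sends fin, SOCK_DEAD not set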

(Log prints confirmed that the drop reason was the old_ack path described earlier, finally verifying our hypothesis.)

All that remained was to go back to the production scenario and confirm whether shutdown() was actually used to close the connection. The colleagues running the service confirmed that the server indeed closed connections via shutdown() (through nginx’s lingering_close).

At this point, the truth is finally revealed!

Summary

To wrap up, let’s answer the two questions posed earlier:

  • Is this legitimate kernel behavior?
    • Yes; it follows from the kernel’s ack-checking logic.
    • The kernel updates the send-window variable snd_una from each received ack value, and uses snd_una to decide whether an incoming ack still needs processing.
    • Because fin-20622 carries a smaller ack value than ack-20623, and ack-20623 arrives first and advances snd_una, the ack check judges the late-arriving fin to be already acked and not worth processing: it is dropped and a challenge ack is returned, producing the problem scenario.
  • Why did the local reproduction fail?
    • The reproduction closed the connection with close(), while the production environment used shutdown().
    • shutdown() does not set SOCK_DEAD, whereas close() does, so the reproduction exercised a different code path from the problem scenario.

Extra: TCP state transitions under close()

In fact, one question remains:

Why, when the connection is closed with close(), is the FIN_WAIT_2 -> TIMEWAIT transition on the fin never observed (and tcp_rcv_state_process() never entered)?

The story starts when the ack arrives in FIN_WAIT_1. As the code analysis above showed, if neither suspicious break fires, ack processing ends here:

 case TCP_FIN_WAIT1: {
    int tmo;
···
        else {
      tcp_time_wait(sk, TCP_FIN_WAIT2, tmo);
      goto discard;
    }
    break;
  }

The main logic of tcp_time_wait() is as follows:

/*
 * Move a socket to time-wait or dead fin-wait-2 state.
 */
void tcp_time_wait(struct sock *sk, int state, int timeo)
{
  const struct inet_connection_sock *icsk = inet_csk(sk);
  const struct tcp_sock *tp = tcp_sk(sk);
  struct inet_timewait_sock *tw;
  struct inet_timewait_death_row *tcp_death_row = &sock_net(sk)->ipv4.tcp_death_row;

    //Create tw, where the tcp status is set to TCP_TIME_WAIT
  tw = inet_twsk_alloc(sk, tcp_death_row, state);

  if (tw) { //If created successfully, it will be initialized.
    struct tcp_timewait_sock *tcptw = tcp_twsk((struct sock *)tw);
    const int rto = (icsk->icsk_rto << 2) - (icsk->icsk_rto >> 1); //Calculate timeout time
    struct inet_sock *inet = inet_sk(sk);

    tw->tw_transparent = inet->transparent;
    tw->tw_mark = sk->sk_mark;
    tw->tw_priority = sk->sk_priority;
    tw->tw_rcv_wscale = tp->rx_opt.rcv_wscale;
    tcptw->tw_rcv_nxt = tp->rcv_nxt;
    tcptw->tw_snd_nxt = tp->snd_nxt;
    tcptw->tw_rcv_wnd = tcp_receive_window(tp);
    tcptw->tw_ts_recent = tp->rx_opt.ts_recent;
    tcptw->tw_ts_recent_stamp = tp->rx_opt.ts_recent_stamp;
    tcptw->tw_ts_offset = tp->tsoffset;
    tcptw->tw_last_oow_ack_time = 0;
    tcptw->tw_tx_delay = tp->tcp_tx_delay;

    /* Get the TIME_WAIT timeout firing. */
        //Determine the timeout
    if (timeo < rto)
      timeo = rto;

    if (state == TCP_TIME_WAIT)
      timeo = sock_net(sk)->ipv4.sysctl_tcp_tw_timeout;

    /* tw_timer is pinned, so we need to make sure BH are disabled
     * in following section, otherwise timer handler could run before
     * we complete the initialization.
     */
        //Update and maintain the structure of timewait sock
    local_bh_disable();
    inet_twsk_schedule(tw, timeo);
    /* Linkage updates.
     * Note that access to tw after this point is illegal.
     */
    inet_twsk_hashdance(tw, sk, &tcp_hashinfo); // insert into the global hash table (tcp_hashinfo)
    local_bh_enable();
  } else {
    /* Sorry, if we're out of memory, just CLOSE this
     * socket up. We've got bigger problems than
     * non-graceful socket closings.
     */
    NET_INC_STATS(sock_net(sk), LINUX_MIB_TCPTIMEWAITOVERFLOW);
  }

  tcp_update_metrics(sk); // update tcp metrics; does not affect the behavior here
  tcp_done(sk); //Destroy sk
}
EXPORT_SYMBOL(tcp_time_wait);

During this process the original sk is destroyed, and a corresponding inet_timewait_sock is created and scheduled. In other words, when a server that used close() receives the ack, it does enter FIN_WAIT_2 but immediately switches to TIMEWAIT, without going through the standard tcp_set_state() path, which is why eBPF never saw the transition.

When the fin arrives later, it never enters tcp_rcv_state_process(); instead the outer tcp_v4_rcv() handles it via the timewait path. Specifically, tcp_v4_rcv() looks up the kernel sk for the incoming skb; it finds the timewait_sock created above, whose state is TIMEWAIT, and so goes straight into timewait processing. The core code:

int tcp_v4_rcv(struct sk_buff *skb)
{
  struct net *net = dev_net(skb->dev);
  struct sk_buff *skb_to_free;
  int sdif = inet_sdif(skb);
  int dif = inet_iif(skb);
  const struct iphdr *iph;
  const struct tcphdr *th;
  bool refcounted;
  struct sock *sk;
  int ret;
···
    th = (const struct tcphdr *)skb->data;
···
lookup:
  sk = __inet_lookup_skb(&tcp_hashinfo, skb, __tcp_hdrlen(th), th->source,
             th->dest, sdif, &refcounted); // look up sk in the global hash table tcp_hashinfo
···
process:
  if (sk->sk_state == TCP_TIME_WAIT)
    goto do_time_wait;
···
do_time_wait: //Normal timewait processing flow
···
  goto discard_it;
}

In summary: when the server closes the connection with close(), then upon receiving the ack it passes through FIN_WAIT_2 and immediately moves to TIMEWAIT, without waiting for the client’s fin.

A simple qualitative way to see it: close() shuts down both receiving and sending, so there is little point lingering in FIN_WAIT_2 waiting for the peer’s fin (a main purpose of that wait is to confirm the peer has finished sending). Once the socket’s own fin is confirmed received (the client’s ack for the fin arrives), it can move straight to TIMEWAIT.

Authors:

Alibaba Cloud Virtual Switch Team: responsible for the development and maintenance of Alibaba Cloud’s network virtualization.

Alibaba Cloud Kernel Network Team: responsible for the development and maintenance of the kernel network protocol stack in the Alibaba Cloud server operating system.


This article is original content of Alibaba Cloud and may not be reproduced without permission.
