This article focuses on the states of a TCP connection once one of the endpoints decides to terminate the connection. This so-called teardown phase involves the exchange of several messages (for reasons we will explore), and the TCP connection itself transitions through several states. Web developers often have only a simplistic understanding of these states, which may suffice when the network behaves reliably. However, a deeper understanding of TCP states is essential to design web applications that robustly manage TCP connections even in the presence of network faults during the teardown phase, and to debug poorly designed applications that exhibit poor resource utilization and poor performance in such situations. As always, we will explore these issues through a series of experiential learning exercises.
In the last article [8], we discussed the three phases of TCP [1] communication: (1) connection setup, (2) data transfer, and (3) teardown. For each TCP connection, the two machines communicating via this connection independently maintain information about the present state of this connection from their own perspective (one of the 11 possible states listed in Table 2). The last column of this table lists the states of a TCP connection after one of the machines decides to close the connection by initiating connection termination. In case of any abnormal network behaviour, an understanding of these states of the TCP connection is critical to accurately diagnose problems and resolve them.
For experiential understanding of these TCP states and the Transport layer [2], we need a basic setup consisting of two machines $H_{1}$ and $H_{2}$ connected via a network as shown in Figure 1. We will make use of i) a network utility called netcat (nc) [5][6] that can be used to create both a TCP client and a TCP server (for the latter, run with the option -l), ii) a utility called iptables [7] that implements firewall functionality, and iii) some custom applications (simple Python programs) to depict the typical application behaviour.
As extensively discussed in [8], the two machines exchange messages during connection setup (Phase 1) and data is transferred during Phase 2 when both machines “agree” that the TCP connection state is ESTABLISHED.
| Initial (no connection) | Phase 1 Connection Setup | Phase 2 Data Transfer | Phase 3 Connection Tear Down |
| --- | --- | --- | --- |
| CLOSED | LISTEN | ESTABLISHED | FIN_WAIT_1 |
| | SYN_SENT | | FIN_WAIT_2 |
| | SYN_RECV | | CLOSING |
| | | | TIME_WAIT |
| | | | CLOSE_WAIT |
| | | | LAST_ACK |
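For scripts that post-process observed connection states, the grouping in the table can be written down as a small lookup. This is only an illustrative sketch: the names `PHASE_STATES` and `phase_of` are hypothetical, and the normalisation step is an assumption added so that spellings such as FIN_WAIT1 and FIN_WAIT_1 are treated alike.

```python
from typing import Dict, List

# The 11 TCP states, grouped by phase exactly as in the table above.
PHASE_STATES: Dict[str, List[str]] = {
    "Initial (no connection)": ["CLOSED"],
    "Phase 1 Connection Setup": ["LISTEN", "SYN_SENT", "SYN_RECV"],
    "Phase 2 Data Transfer": ["ESTABLISHED"],
    "Phase 3 Connection Tear Down": [
        "FIN_WAIT_1", "FIN_WAIT_2", "CLOSING",
        "TIME_WAIT", "CLOSE_WAIT", "LAST_ACK",
    ],
}


def _normalise(state: str) -> str:
    """Drop underscores/spaces so FIN_WAIT1, FIN_WAIT_1 and 'LAST ACK' style names compare equal."""
    return state.upper().replace("_", "").replace(" ", "")


def phase_of(state: str) -> str:
    """Return the phase in which the given TCP state occurs."""
    wanted = _normalise(state)
    for phase, states in PHASE_STATES.items():
        if any(_normalise(s) == wanted for s in states):
            return phase
    raise ValueError(f"unknown TCP state: {state!r}")
```

For example, `phase_of("FIN_WAIT1")` and `phase_of("FIN_WAIT_1")` both return the Phase 3 label.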
At some point during Phase 2, one of the applications will decide that it does not need to transmit data any more and will initiate connection closure (i.e., the TCP connection will enter the teardown phase, Phase 3). This application (which can be either the client or the server application) will follow the Active Close path in Figure 2 (shown on the left in purple colour), and the other application will follow the Passive Close path (shown on the right in orange colour). Each sent message is identified by a number in parentheses (n), and when this is received it is identified as (n′). It should be noted that even though connection initiation (connection setup in Phase 1) is always by a client, connection termination can be initiated by either the client or the server (or by both at the same time – a very rare case that is generally not seen in real-life applications).
To understand why so many states are necessary in Phase 3, note that data transfer may continue for a long time after connection closure is initiated. As a concrete example, consider a typical case of web communication where a web client is downloading a large file (e.g., an image or a document). After the client sends the URL details (which may correspond to only a few hundred bytes, including HTTP headers), it need not send any further data and it may decide to initiate TCP connection closure (to better manage client-side operating system resources). Despite this, the client application should continue to receive data until the server has sent the full file. The connection can only be closed after the server has sent the last byte (which could be several minutes or even hours after the client’s initial request for closure) and has received the corresponding TCP-level acknowledgement (in accordance with TCP reliable delivery). Keeping in mind that some messages can be lost due to network faults, TCP states must maintain enough information to properly manage the connection. For ease of understanding, let us first consider the simpler (and more common) scenario when the network functions correctly.
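The download scenario above can be tried on a single machine. The following is a minimal sketch, not one of the programs used in this article: it assumes Python 3.8 or later and the loopback interface, and the names `PAYLOAD`, `serve_once`, `download` and the request text are hypothetical. The client sends its request, closes only its sending direction with `shutdown(SHUT_WR)` (which transmits a FIN), and then keeps receiving until the server has sent everything.

```python
import socket
import threading

PAYLOAD = b"x" * 200_000  # stand-in for the "large file"


def serve_once(listener: socket.socket) -> None:
    """Accept one connection, read the request until the client's FIN, then send the file."""
    conn, _ = listener.accept()
    with conn:
        request = b""
        while True:
            chunk = conn.recv(4096)
            if not chunk:          # b"" means the client's FIN has arrived
                break
            request += chunk
        # The client has closed its sending direction, but this side may still send.
        conn.sendall(PAYLOAD)
    # Leaving the with-block closes the socket, which sends the server's FIN.


def download() -> bytes:
    """Request the payload, half-close, and read until the server finishes."""
    listener = socket.create_server(("127.0.0.1", 0))  # port 0: OS picks a free port
    port = listener.getsockname()[1]
    worker = threading.Thread(target=serve_once, args=(listener,))
    worker.start()

    received = b""
    with socket.create_connection(("127.0.0.1", port)) as client:
        client.sendall(b"GET /big-file\n")
        client.shutdown(socket.SHUT_WR)   # send FIN: no more data from the client
        while True:
            chunk = client.recv(65536)    # receiving still works after the half-close
            if not chunk:                 # server's FIN: the whole file has been sent
                break
            received += chunk

    worker.join()
    listener.close()
    return received
```

Calling `download()` should return the full payload even though the client announced the end of its own data before the server had sent a single byte.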
Phase 3 starts when either the client or the server application invokes the connection close API. For clarity, we will assume that the application on $H_{1}$ initiates the connection close and $H_{2}$ responds to this connection close request. $H_{1}$ transmits a TCP FIN(ish) message (1) to $H_{2}$ and must wait for a response (typically an acknowledgement). Thus, the TCP connection on $H_{1}$ enters the state FIN_WAIT1, which indicates that it has initiated connection closure by transmitting the FIN message and is waiting for a response. When $H_{2}$ receives this FIN message (1′), it responds with an ACK message (2) and the TCP connection on $H_{2}$ enters the CLOSE_WAIT state. This state indicates that the other side has initiated connection closure and that this has been acknowledged by $H_{2}$, but the connection should not be closed yet because the application running on $H_{2}$ may wish to transmit more data. When $H_{1}$ receives this ACK message (2′), the TCP connection on $H_{1}$ enters the state FIN_WAIT2.
As long as the application running on $H_{2}$ has more data to send, the data will be sent over the connection (5), received (5′) by $H_{1}$, and acknowledged by $H_{1}$ (6). This acknowledgement will be received by $H_{2}$ (6′). As soon as the application running on $H_{2}$ determines that no more data is to be sent, it sends a FIN message (3) and the TCP connection on $H_{2}$ moves to the LAST_ACK state. When this FIN is received by $H_{1}$ (3′), the TCP connection state on $H_{1}$ moves from FIN_WAIT2 to TIME_WAIT and an ACK (4) is transmitted. Note that the TCP connection state on $H_{1}$ does not move directly to the CLOSED state because, as discussed in detail in [8], it is necessary to ensure that no new connection with the same TCP tuple (i.e., source IP address, source port number, destination IP address, destination port number) is used until all packets from an earlier such connection that could still be lingering in the network have expired. This waiting time is known as 2MSL (twice the maximum segment lifetime) and is typically 120s. Another reason is that if there were no TIME_WAIT state (i.e., if $H_{1}$ directly entered the CLOSED state) and the ACK (4) message got lost, then $H_{2}$ would remain in the LAST_ACK state until its retransmission timeout occurs. When $H_{2}$ then retransmits the FIN message, there may be no connection state on $H_{1}$, and thus $H_{1}$ will respond with a TCP Reset (indicating an error condition), which leads to ungraceful connection termination [8].
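The teardown transitions described above can be summarised as a small lookup table. This is a simplified sketch for reasoning about the states, not how an operating system implements TCP: the event names (`app_close`, `recv_FIN`, `recv_ACK`, `timeout_2MSL`) and the function `run` are hypothetical, retransmissions and data segments are ignored, and the CLOSING entries cover the rare simultaneous close.

```python
from typing import Dict, List, Optional, Tuple

# (current state, event) -> (next state, segment sent in response, if any)
TRANSITIONS: Dict[Tuple[str, str], Tuple[str, Optional[str]]] = {
    # Active close (the side that calls close() first)
    ("ESTABLISHED", "app_close"): ("FIN_WAIT1", "FIN"),    # message (1)
    ("FIN_WAIT1", "recv_ACK"): ("FIN_WAIT2", None),        # message (2') received
    ("FIN_WAIT2", "recv_FIN"): ("TIME_WAIT", "ACK"),       # (3') received, (4) sent
    ("TIME_WAIT", "timeout_2MSL"): ("CLOSED", None),
    # Passive close (the side that receives the first FIN)
    ("ESTABLISHED", "recv_FIN"): ("CLOSE_WAIT", "ACK"),    # (1') received, (2) sent
    ("CLOSE_WAIT", "app_close"): ("LAST_ACK", "FIN"),      # message (3)
    ("LAST_ACK", "recv_ACK"): ("CLOSED", None),            # message (4') received
    # Simultaneous close (both sides send FIN before seeing the other's)
    ("FIN_WAIT1", "recv_FIN"): ("CLOSING", "ACK"),
    ("CLOSING", "recv_ACK"): ("TIME_WAIT", None),
}


def run(events: List[str], state: str = "ESTABLISHED") -> List[str]:
    """Apply events in order and return the list of states visited."""
    visited = []
    for event in events:
        if (state, event) not in TRANSITIONS:
            raise ValueError(f"event {event!r} is not handled in state {state}")
        state, _segment = TRANSITIONS[(state, event)]
        visited.append(state)
    return visited


# Active close as followed by H1 in the walkthrough above
print(run(["app_close", "recv_ACK", "recv_FIN", "timeout_2MSL"]))
# Passive close as followed by H2
print(run(["recv_FIN", "app_close", "recv_ACK"]))
```

The first call visits FIN_WAIT1, FIN_WAIT2, TIME_WAIT and CLOSED; the second visits CLOSE_WAIT, LAST_ACK and CLOSED.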
Let us consider the typical case of TCP connection teardown, which occurs when there is no network abnormality (as is usually the case). Suppose the application running on $H_{2}$ invokes the close() call. This application will follow the TCP state transitions corresponding to active close (left side of Figure 2) whereas the application on $H_{1}$ will follow the TCP state transitions corresponding to passive close (right side of Figure 2). Thus, $H_{2}$ will send the FIN (1) message and move its TCP connection state to FIN_WAIT1. When $H_{1}$ receives the FIN (1′) message, it sends the ACK (2) message and moves its TCP connection state to CLOSE_WAIT. When this ACK (2′) message is received by $H_{2}$, it moves to the FIN_WAIT2 state. At this point, let us suppose that the application on $H_{1}$ has no further need for the connection. Thus, immediately after sending the ACK (2) message, it sends its own FIN (3) message and moves to the LAST_ACK state. When $H_{2}$ receives the FIN (3′) message, it responds with ACK (4) and moves the state from FIN_WAIT2 to TIME_WAIT. When $H_{1}$ receives the ACK (4′) message, it releases all the resources associated with this connection and completely closes this connection. This is depicted by the CLOSED state, which essentially implies that no connection exists. However, the connection on $H_{2}$ remains in the TIME_WAIT state for 2MSL (two times the maximum segment lifetime), as discussed in detail in [8], with the corresponding program details in [9].
In the screenshots below, the names of the machines $H_{1}$ and $H_{2}$ are part of the command prompts and are preceded by the date/time to show the order in which commands were issued. Command windows are also split into an upper and a lower panel to improve clarity. The upper panel in each figure shows the commands that invoke the application, initiate data transfer, and terminate the connection, whereas the lower panel is used to display the TCP connection state and any firewall rule(s) that need to be configured. We assume that all experiments begin with no firewall rules set. To ensure this, we recommend issuing the command sudo iptables -F on the machines before each experiment.
Experiment 1:
TCP State transition for normal TCP application closure.
The following experiment demonstrates this typical teardown scenario in Figure 3 and Figure 4. In Figure 3, the application on $H_{2}$ moves the TCP connection state from LISTEN to ESTABLISHED and then, via intermediate transient states (as discussed above), to TIME_WAIT as shown by the last line of the lower panel of this figure.
The states followed by the application on $H_{1}$ are similarly shown in Figure 4. Thus, from the ESTABLISHED state, the TCP connection passes through the intermediate transient states CLOSE_WAIT and LAST_ACK before completely releasing all the associated resources.
Under normal circumstances, the TCP connection closure follows the expected path and there is little reason for developers to have this detailed understanding of state changes. However, for applications that provide critical services, the brand image and possibly the revenue of the service provider could be affected if users perceive unsuccessful or delayed connections. For such applications, it is imperative that developers understand the details of the various TCP termination states and the general conditions that lead to these states. We will now consider situations where network failures during the teardown phase cause problems that can be diagnosed and rectified using this detailed understanding.
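The connection states shown in the figures can also be inspected programmatically. The sketch below is an assumption about one way to do this on a Linux machine, not the command used in the screenshots: it decodes the hexadecimal state column of /proc/net/tcp (IPv4 only; IPv6 sockets appear in /proc/net/tcp6). The names `TCP_STATES`, `states_for_port`, `read_states` and the `SAMPLE` text are hypothetical, and `SAMPLE` is hand-written illustrative data rather than captured output.

```python
from pathlib import Path
from typing import List

# State codes used in the "st" column of /proc/net/tcp (Linux kernel numbering).
TCP_STATES = {
    "01": "ESTABLISHED", "02": "SYN_SENT", "03": "SYN_RECV",
    "04": "FIN_WAIT1", "05": "FIN_WAIT2", "06": "TIME_WAIT",
    "07": "CLOSED",  # the kernel calls this state TCP_CLOSE
    "08": "CLOSE_WAIT", "09": "LAST_ACK", "0A": "LISTEN", "0B": "CLOSING",
}

# Hand-written sample in the /proc/net/tcp layout (0x1E61 is port 7777).
SAMPLE = """\
  sl  local_address rem_address   st tx_queue rx_queue tr tm->when retrnsmt   uid  timeout inode
   0: 00000000:1E61 00000000:0000 0A 00000000:00000000 00:00000000 00000000  1000        0 11111
   1: 0100007F:1E61 0100007F:C350 06 00000000:00000000 03:00000A00 00000000     0        0 0
   2: 0100007F:0016 0100007F:D431 01 00000000:00000000 00:00000000 00000000     0        0 22222
"""


def states_for_port(proc_net_tcp_text: str, port: int) -> List[str]:
    """Return the state of every socket whose local or remote port matches."""
    found = []
    for line in proc_net_tcp_text.splitlines()[1:]:   # skip the header line
        fields = line.split()
        if len(fields) < 4:
            continue
        local_port = int(fields[1].split(":")[1], 16)   # ports are hexadecimal
        remote_port = int(fields[2].split(":")[1], 16)
        if port in (local_port, remote_port):
            code = fields[3].upper()
            found.append(TCP_STATES.get(code, code))
    return found


def read_states(port: int) -> List[str]:
    """Linux only: decode the live table instead of the sample text."""
    return states_for_port(Path("/proc/net/tcp").read_text(), port)


print(states_for_port(SAMPLE, 7777))
```

Running `read_states(7777)` repeatedly on each machine while a connection is being closed gives a rough view of the transitions, although the transient states usually pass too quickly to be caught without the firewall rules used in the later experiments.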
Experiment 2:
TCP State FIN_WAIT1 ($H_{1}$) and ESTABLISHED ($H_{2}$)
In the absence of network faults, when $H_{1}$ sends a FIN message and moves to state FIN_WAIT1, it can expect to get an ACK almost immediately (i.e., within a few milliseconds) and thus move to state FIN_WAIT2. Thus, to observe $H_{1}$ in the transient state FIN_WAIT1, we need to simulate a network fault. As in our previous article [8], we will make use of the iptables utility to create abnormal network conditions.
In Figure 2, observe that $H_{1}$’s TCP connection state will remain in FIN_WAIT1 if it sends a FIN message but does not receive an ACK. This can occur either because the FIN message was lost in the network or because the FIN was received but the ACK did not reach the machine initiating connection close. The upper panel in Figure 5 shows an application on machine $H_{2}$ starting to run on TCP port 7777 at time 16:28:53, and the first line in the lower panel shows that the TCP connection state is LISTEN at time 16:28:57 (as described in detail in [8]). The application on machine $H_{1}$ connects to the application on machine $H_{2}$ at 16:30:22, as shown in the upper panel of Figure 6. The TCP connection state becomes ESTABLISHED, as shown by the output of the first command (issued at 16:30:41). Next, the text message “Hello” is exchanged, as shown in the upper panels of both these figures. To mimic the scenario where the FIN message is lost in the network, an iptables command is issued on $H_{1}$ to drop all FIN packets sent from $H_{1}$ with the destination port 7777 (shown by the third command in the lower panel of Figure 6). Now the application on $H_{1}$ is terminated by pressing Ctrl-C, as shown by the fourth line in the upper panel of Figure 6. When the TCP state is subsequently checked (fourth command in the lower panel of Figure 6), it is FIN_WAIT1 at time 16:31:48. Since $H_{2}$ does not receive the FIN message (because of the iptables command on $H_{1}$), the TCP connection on $H_{2}$ remains in the ESTABLISHED state, as shown by the last command in the lower panel of Figure 5 at time 16:31:55.
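For readers who want to script this experiment, the sketch below builds an iptables rule of the kind described above. The exact command used in Figure 6 is not reproduced here, so this is an assumed equivalent: it appends a rule to the OUTPUT chain that drops outgoing TCP segments to a given destination port whenever the FIN flag is set. The function names `drop_fin_rule` and `apply_rule` are hypothetical, and actually applying the rule requires root privileges.

```python
import subprocess
from typing import List


def drop_fin_rule(dport: int, action: str = "-A") -> List[str]:
    """Build the iptables arguments; use action="-D" to delete the same rule later."""
    return [
        "iptables", action, "OUTPUT",    # locally generated, outgoing packets
        "-p", "tcp",
        "--dport", str(dport),           # destination port of the server on H2
        "--tcp-flags", "FIN", "FIN",     # examine only the FIN flag; match when it is set
        "-j", "DROP",
    ]


def apply_rule(args: List[str]) -> None:
    """Run the rule with sudo (assumes sudo and iptables are installed)."""
    subprocess.run(["sudo"] + args, check=True)


# Print the command instead of running it, so the sketch is safe to execute anywhere.
print(" ".join(drop_fin_rule(7777)))
```

Calling `apply_rule(drop_fin_rule(7777))` on $H_{1}$ before terminating the client, and `apply_rule(drop_fin_rule(7777, "-D"))` afterwards, would install and then remove the rule; sudo iptables -F also clears it along with every other rule.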