Exploring TCP throughput with DTrace (2)
- by user12820842
Last time, I described how we can use the overlap between the distributions of unacknowledged byte counts and send window sizes to determine whether the peer's receive window may be too small, limiting throughput. Let's combine that comparison with a comparison of congestion window and slow start threshold, all on a per-port/per-client basis. This will help us:
- Identify whether the congestion window or the receive window is a limiting factor on throughput, by comparing the distributions of congestion window and send window values to the distribution of outstanding (unacked) bytes. This gives us a visual sense of how often our attempts to fill the pipe are thwarted by congestion control versus the peer not being able to receive any more data.
- Identify whether slow start or congestion avoidance predominates, by comparing the overlap in the congestion window and slow start threshold distributions. If the slow start threshold distribution overlaps with the congestion window distribution, we know that we have switched between slow start and congestion avoidance, possibly multiple times.
- Identify whether the peer's receive window is too small, by comparing the distribution of outstanding unacked bytes with the send window distribution (i.e. the peer's receive window). I discussed this last time.
# dtrace -s tcp_window.d
dtrace: script 'tcp_window.d' matched 10 probes
^C

  cwnd 80 10.175.96.92
           value  ------------- Distribution ------------- count
            1024 |                                         0
            2048 |                                         4
            4096 |                                         6
            8192 |                                         18
           16384 |                                         36
           32768 |@                                        79
           65536 |@                                        155
          131072 |@                                        199
          262144 |@@@                                      400
          524288 |@@@@@@                                   798
         1048576 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@             3848
         2097152 |                                         0

  ssthresh 80 10.175.96.92
           value  ------------- Distribution ------------- count
       268435456 |                                         0
       536870912 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 5543
      1073741824 |                                         0

  unacked 80 10.175.96.92
           value  ------------- Distribution ------------- count
              -1 |                                         0
               0 |                                         1
               1 |                                         0
               2 |                                         0
               4 |                                         0
               8 |                                         0
              16 |                                         0
              32 |                                         0
              64 |                                         0
             128 |                                         0
             256 |                                         3
             512 |                                         0
            1024 |                                         0
            2048 |                                         4
            4096 |                                         9
            8192 |                                         21
           16384 |                                         36
           32768 |@                                        78
           65536 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@  5391
          131072 |                                         0

  swnd 80 10.175.96.92
           value  ------------- Distribution ------------- count
           32768 |                                         0
           65536 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 5543
          131072 |                                         0

Here we are observing a large file transfer over HTTP on the web server. Comparing these distributions, we can observe:
- Slow start is in operation. The distribution of congestion window values lies entirely below the slow start threshold values (all of which fall in the 536870912-1073741823 bucket), so the connection is in slow start mode.
- Both the unacked byte count and the send window values peak in the 65536-131071 range, but the send window distribution is narrower. This tells us that the peer TCP's receive window is not closing.
- The congestion window distribution peaks in the 1048576-2097151 range, while the send window (the peer's receive window) distribution is confined to the 65536-131071 range. Since the cwnd distribution ranges as low as 2048-4095, we can see that for some of the time we have been observing the connection, congestion control has been a limiting factor on transfer, but for the majority of the time the peer's receive window would more likely have been the limiting factor. However, we know the window has never closed, as the distribution of swnd values stays within the 65536-131071 range.
So all in all we have a connection that has been mildly constrained by congestion control, but for the bulk of the time we have been observing it, neither congestion control nor the peer's receive window has limited throughput.
Here's the script:
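#!/usr/sbin/dtrace -s

/* Data segments only: skip anything with SYN, RST or FIN set. */
tcp:::send
/ (args[4]->tcp_flags & (TH_SYN|TH_RST|TH_FIN)) == 0 /
{
        /* Each aggregation is keyed by label, local port and client address. */
        @cwnd["cwnd", args[4]->tcp_sport, args[2]->ip_daddr] =
            quantize(args[3]->tcps_cwnd);
        @ssthresh["ssthresh", args[4]->tcp_sport, args[2]->ip_daddr] =
            quantize(args[3]->tcps_cwnd_ssthresh);
        /* Bytes sent but not yet acknowledged. */
        @unacked["unacked", args[4]->tcp_sport, args[2]->ip_daddr] =
            quantize(args[3]->tcps_snxt - args[3]->tcps_suna);
        /* Header window field scaled by the send window scale factor. */
        @swnd["swnd", args[4]->tcp_sport, args[2]->ip_daddr] =
            quantize((args[4]->tcp_window)*(1 << args[3]->tcps_snd_ws));
}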
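The comparison of congestion window against send window can also be reduced to a simple tally. The following is a minimal sketch, not part of the original script: for each data segment sent, it records which of the two windows is smaller and therefore the tighter bound on bytes in flight. It reads the send window from the tcpsinfo_t member tcps_swnd, which is a choice made for this sketch rather than the expression used in the script, and the aggregation name @limit and the labels are made up for illustration.

```d
#!/usr/sbin/dtrace -s

#pragma D option quiet

/*
 * Sketch: per port and client, count data segments sent while the
 * congestion window was smaller than the send window, and vice versa.
 * The smaller window is the tighter bound on unacknowledged bytes.
 */
tcp:::send
/ (args[4]->tcp_flags & (TH_SYN|TH_RST|TH_FIN)) == 0 /
{
        @limit[args[4]->tcp_sport, args[2]->ip_daddr,
            args[3]->tcps_cwnd < args[3]->tcps_swnd ?
            "cwnd smaller" : "swnd smaller or equal"] = count();
}

dtrace:::END
{
        printf("%-6s %-40s %-24s %s\n",
            "PORT", "CLIENT", "TIGHTER WINDOW", "SENDS");
        printa("%-6d %-40s %-24s %@d\n", @limit);
}
```

A tally like this loses the shape of the distributions, so it is best read alongside the quantize() output rather than instead of it.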
One surprise here is that slow start is still in operation. One would assume that for a large file transfer, acknowledgements would push the congestion window up past the slow start threshold over time. The slow start threshold is in fact still close to its initial (very high) value, which suggests we have not experienced any congestion (the slow start threshold is adjusted when congestion occurs). Also, the above measurements were taken early in the connection lifetime, so the congestion window did not get a chance to be bumped up to the level of the slow start threshold.
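The slow start versus congestion avoidance question can be checked directly instead of being read off two histograms. This is a hedged sketch that assumes the usual rule that a connection is in slow start while cwnd is below ssthresh; the aggregation name @phase and the labels are made up for illustration and are not from the original script.

```d
#!/usr/sbin/dtrace -s

#pragma D option quiet

/*
 * Sketch: classify each data segment sent by congestion control phase.
 * Assumption: cwnd < ssthresh means slow start; anything else is counted
 * as congestion avoidance (the cwnd == ssthresh case could go either way).
 */
tcp:::send
/ (args[4]->tcp_flags & (TH_SYN|TH_RST|TH_FIN)) == 0 /
{
        @phase[args[4]->tcp_sport, args[2]->ip_daddr,
            args[3]->tcps_cwnd < args[3]->tcps_cwnd_ssthresh ?
            "slow start" : "congestion avoidance"] = count();
}

dtrace:::END
{
        printf("%-6s %-40s %-22s %s\n", "PORT", "CLIENT", "PHASE", "SENDS");
        printa("%-6d %-40s %-22s %@d\n", @phase);
}
```

For the transfer shown above, one would expect every send to land in the slow start row, since the cwnd distribution never reaches the ssthresh bucket.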
A good strategy when examining these sorts of measurements for a given service (such as a web server) would be to start by examining the distributions above aggregated by port number only, to get an overall feel for service performance: is congestion control or peer receive window size an issue, or are we unconstrained to fill the pipe? From there, the overlap of distributions will tell us whether to drill down into specific clients. For example, if the send window distribution has multiple peaks, we may want to examine whether particular clients show issues with their receive window.
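A port-only view can be had by dropping the client address from the aggregation keys. The variant below is a sketch along those lines, keeping the same probe, predicate and expressions as the script above and changing only the keys.

```d
#!/usr/sbin/dtrace -s

/*
 * Sketch: the same four distributions, keyed by local port alone so that
 * all clients of a service are folded into one set of histograms.
 */
tcp:::send
/ (args[4]->tcp_flags & (TH_SYN|TH_RST|TH_FIN)) == 0 /
{
        @cwnd["cwnd", args[4]->tcp_sport] =
            quantize(args[3]->tcps_cwnd);
        @ssthresh["ssthresh", args[4]->tcp_sport] =
            quantize(args[3]->tcps_cwnd_ssthresh);
        @unacked["unacked", args[4]->tcp_sport] =
            quantize(args[3]->tcps_snxt - args[3]->tcps_suna);
        @swnd["swnd", args[4]->tcp_sport] =
            quantize((args[4]->tcp_window) * (1 << args[3]->tcps_snd_ws));
}
```

To drill down into a single client afterwards, one option is to go back to the per-client keys and extend the predicate with a test such as args[2]->ip_daddr == "10.175.96.92", using whichever address stood out in the port-level view.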