Debugging "clogged" TCP connections
Posted by Nikratio on Super User
Published on 2012-12-14T03:42:26Z
Tags: linux, networking
I'm having trouble with an internet connection that seems to randomly "freeze" arbitrary TCP connections: the connections stay established, but no data comes through.
When this happens, netstat still shows the connection state as ESTABLISHED on both the local computer:
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name   Timer
tcp        0     53 192.168.0.10:41129      173.255.235.238:143     ESTABLISHED 8219/gnutls-cli    on (79.31/13/0)
...and on the remote server:
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name   Timer
tcp        0      0 173.255.235.238:143     68.5.174.98:41129       ESTABLISHED 5303/imapd         off (0.00/0/0)
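The Timer column is where the two ends differ. Below is a minimal Python sketch that parses rows like the two above, assuming they come from net-tools `netstat` with timers enabled (something like `netstat -tnpo`; the post does not say which flags were used). It also assumes our reading of that format, in which the parenthesised triple is (seconds until the timer fires / retransmission count / probe count). Under that reading, the client has 53 unacknowledged bytes queued and its retransmission timer running after 13 retries, while the server has nothing queued and no timer active. `TcpEntry`, `parse_netstat_line` and `looks_stalled` are hypothetical names, and the stall test is only a heuristic.

```python
import re
from dataclasses import dataclass

# One "tcp" row of net-tools netstat output with the Timer column (format assumed).
_ROW = re.compile(
    r"^tcp\s+(?P<recv_q>\d+)\s+(?P<send_q>\d+)\s+"
    r"(?P<local>\S+)\s+(?P<foreign>\S+)\s+(?P<state>\S+)\s+(?P<program>\S+)\s+"
    r"(?P<timer>\w+)\s+\((?P<secs>[\d.]+)/(?P<retries>\d+)/(?P<probes>\d+)\)$"
)


@dataclass
class TcpEntry:
    recv_q: int        # bytes received but not yet read by the application
    send_q: int        # bytes sent but not yet acknowledged by the peer
    local: str
    foreign: str
    state: str
    program: str
    timer: str         # "on" (retransmission timer), "off", "keepalive", ...
    timer_secs: float  # time left until the timer fires
    retries: int       # retransmissions so far (our reading of the format)
    probes: int        # probes sent (our reading of the format)


def parse_netstat_line(line: str) -> TcpEntry:
    m = _ROW.match(line.strip())
    if m is None:
        raise ValueError(f"unrecognised netstat row: {line!r}")
    return TcpEntry(
        recv_q=int(m["recv_q"]), send_q=int(m["send_q"]),
        local=m["local"], foreign=m["foreign"], state=m["state"],
        program=m["program"], timer=m["timer"], timer_secs=float(m["secs"]),
        retries=int(m["retries"]), probes=int(m["probes"]),
    )


def looks_stalled(e: TcpEntry) -> bool:
    # Heuristic (assumption): still ESTABLISHED, unacknowledged data queued,
    # and the retransmission timer has already fired at least once.
    return e.state == "ESTABLISHED" and e.send_q > 0 and e.timer == "on" and e.retries > 0


local = parse_netstat_line(
    "tcp 0 53 192.168.0.10:41129 173.255.235.238:143 ESTABLISHED 8219/gnutls-cli on (79.31/13/0)"
)
remote = parse_netstat_line(
    "tcp 0 0 173.255.235.238:143 68.5.174.98:41129 ESTABLISHED 5303/imapd off (0.00/0/0)"
)
print(local.send_q, local.retries, looks_stalled(local))     # 53 13 True
print(remote.send_q, remote.retries, looks_stalled(remote))  # 0 0 False
```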
However, no data at all seems to be transferred. If I run strace on the local and remote processes, both just show a repeating sequence of select calls (with different fds, of course), e.g.:
select(6, [0 5], NULL, NULL, {0, 50000}) = 0 (Timeout)
select(6, [0 5], NULL, NULL, {0, 50000}) = 0 (Timeout)
select(6, [0 5], NULL, NULL, {0, 50000}) = 0 (Timeout)
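The strace output fits a simple polling loop. The process asks select() whether fd 0 (stdin) or fd 5 (presumably the TCP socket) is readable, waits at most 50 ms, and asks again. The minimal Python sketch below imitates that pattern, using a local socketpair as a hypothetical stand-in for the real connection. It shows why such a loop cannot tell an idle peer from one whose packets are being lost: in both cases select() simply times out. Only when data (or an error or EOF) arrives does it report the socket as readable. `wait_readable` is a hypothetical helper, not code from gnutls-cli or imapd.

```python
import select
import socket


def wait_readable(socks, timeout=0.05):
    """One iteration of a select() loop like the one in the strace output.

    Returns the sockets that became readable within `timeout` seconds;
    an empty list corresponds to strace's "= 0 (Timeout)".
    """
    readable, _, _ = select.select(socks, [], [], timeout)
    return readable


client, server = socket.socketpair()

# Nothing has been sent: every call times out, exactly as with a stalled peer.
print([wait_readable([client]) for _ in range(3)])  # [[], [], []]

# Once the peer sends something, select() reports the socket as readable.
server.sendall(b"* OK\r\n")
ready = wait_readable([client])
print(ready == [client], client.recv(64))  # True b'* OK\r\n'

client.close()
server.close()
```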
The internet connection as a whole does not seem affected: I can still establish new connections to the same service on the same server without any problems. However, the affected local applications seem unaware of the problem and just hang.
When I look at a packet capture of this connection on the client side, the last thing that happens is that the client transmits some data. Then nothing happens for about 1100 seconds, after which several TCP retransmissions go out, with the intervals between them increasing from 4 seconds to 130 seconds. No activity is captured after that.
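The growing gaps between retransmissions match TCP's exponential backoff. After each unanswered retransmission the retransmission timeout (RTO) is doubled, up to a ceiling (TCP_RTO_MAX, 120 s on Linux). The minimal Python sketch below applies that rule, assuming an initial RTO of 4 s taken from the first gap in the capture. The real starting value depends on the kernel's round-trip-time estimate. `backoff_schedule` is a hypothetical helper.

```python
def backoff_schedule(initial_rto: float, count: int, rto_max: float = 120.0) -> list[float]:
    """Gaps between successive retransmissions under doubling-with-cap backoff."""
    gaps = []
    rto = initial_rto
    for _ in range(count):
        gaps.append(rto)
        rto = min(rto * 2, rto_max)  # double after each timeout, never exceed the cap
    return gaps


print(backoff_schedule(4.0, 7))  # [4.0, 8.0, 16.0, 32.0, 64.0, 120.0, 120.0]
```

The gaps double until the cap is reached and then stay constant. This is roughly the progression described in the capture, although the exact gaps there are approximate.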
After about 10 minutes, the connection disappears from the netstat output on the remote end (I wasn't able to catch any intermediate state), but it stays ESTABLISHED on the local end.
Finally, after some more minutes, the local application aborts with a timeout and disappears from the local netstat output as well.
Does anyone have a suggestion of how I could debug this further to find out where the problem lies and how to fix it?
Additionally, or as a temporary workaround: is there some way to globally reduce the timeout on the client and/or server, so that the local application aborts sooner?
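On Linux, the point at which the kernel gives up on an established connection with unacknowledged data is governed by the host-wide `net.ipv4.tcp_retries2` sysctl (`/proc/sys/net/ipv4/tcp_retries2`, default 15). The minimal Python sketch below shows how that retry count translates into time. It assumes the formula in `retransmits_timed_out()` in the kernel's `net/ipv4/tcp_timer.c`, as we understand it for kernels of roughly this era: the RTO starts at TCP_RTO_MIN (200 ms), doubles, and is capped at TCP_RTO_MAX (120 s). Other kernel versions may compute this differently, and `retries2_timeout_ms` is a hypothetical name.

```python
TCP_RTO_MIN_MS = 200      # Linux TCP_RTO_MIN (assumed default)
TCP_RTO_MAX_MS = 120_000  # Linux TCP_RTO_MAX


def retries2_timeout_ms(retries2: int) -> int:
    """Approximate time (ms) before a connection with unacknowledged data is aborted.

    Mirrors, as an assumption, the boundary computation in Linux's
    retransmits_timed_out(): the RTO doubles from TCP_RTO_MIN, and once it
    would pass TCP_RTO_MAX each further retry adds TCP_RTO_MAX.
    """
    # ilog2(TCP_RTO_MAX / TCP_RTO_MIN) == ilog2(600) == 9
    thresh = (TCP_RTO_MAX_MS // TCP_RTO_MIN_MS).bit_length() - 1
    if retries2 <= thresh:
        return ((2 << retries2) - 1) * TCP_RTO_MIN_MS
    return ((2 << thresh) - 1) * TCP_RTO_MIN_MS + (retries2 - thresh) * TCP_RTO_MAX_MS


for r in (15, 8, 5):
    print(r, retries2_timeout_ms(r) / 1000, "s")
# 15 924.6 s
# 8 102.2 s
# 5 12.6 s
```

Under these assumptions, the default of 15 corresponds to about 924.6 s (roughly 15 minutes). Lowering it on the client, for example with `sysctl -w net.ipv4.tcp_retries2=8`, would cut that to about 100 s, at the cost of giving up sooner on connections that are merely slow. RFC 1122 recommends that this limit correspond to at least 100 seconds.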
© Super User or respective owner