I am running Linux kernel 3.13 (Ubuntu 14.04) on two Virtual Machines each of which operates inside two different servers running ESXi 5.1. There is a zeromq client-server application running between the two VMs. After running for about 10-30 minutes, this application consistently hangs due to inability to retransmit a lost packet.
When I run the same setup over Ubuntu 12.04 (Linux 3.11), the application never fails
If you notice below, "ss" (socket statistics) shows 1 packet lost, sk_wmem_queued of 14110 (i.e. w14110) and a high rto (120000).
State Recv-Q Send-Q Local Address:Port Peer Address:Port
ESTAB 0 12350 192.168.2.122:41808 192.168.2.172:55550
timer:(on,16sec,10) uid:1000 ino:35042
sk:ffff880035bcb100 <-
skmem:(r0,rb648720,t0,tb1164800,f2274,w14110,o0,bl0) ts sack cubic wscale:7,7 rto:120000 rtt:7.5/3 ato:40 mss:8948 cwnd:1 ssthresh:21 send 9.5Mbps unacked:1 retrans:1/10 lost:1 rcv_rtt:1476 rcv_space:37621
Since this has happened so consistently, I was able to capture the TCP log in wireshark. I found that the packet which is lost does get retransmitted and even acknowledged by the TCP in the other OS (the sequence number is seen in the ACK), but the sender doesn't seem to understand this ACK and continues retransmitting.
MTU is 9000 on both virtual machines and througout the route. The packets being sent are large in size.
As I said earlier, this does not happen on Ubuntu 12.04 (kernel 3.11). So I did a diff on the TCP config options (seen via "sysctl -a |grep tcp ") between 14.04 and 12.04 and found the following differences.
I also noticed that net.ipv4.tcp_mtu_probing=0 in both configurations.
Left side is 3.11, right side is 3.13
<<net.ipv4.tcp_abc = 0
<<net.ipv4.tcp_cookie_size = 0
<<net.ipv4.tcp_dma_copybreak = 4096
14c11
<< net.ipv4.tcp_early_retrans = 2
---
>> net.ipv4.tcp_early_retrans = 3
17c14
<< net.ipv4.tcp_fastopen = 0
>> net.ipv4.tcp_fastopen = 1
20d16
<< net.ipv4.tcp_frto_response = 0
26,27c22
<< net.ipv4.tcp_max_orphans = 16384
<< net.ipv4.tcp_max_ssthresh = 0
>> net.ipv4.tcp_max_orphans = 4096
29,30c24,25
<< net.ipv4.tcp_max_tw_buckets = 16384
<< net.ipv4.tcp_mem = 94377 125837 188754
>> net.ipv4.tcp_max_tw_buckets = 4096
>> net.ipv4.tcp_mem = 23352 31138 46704
34a30
>> net.ipv4.tcp_notsent_lowat = -1
My question to the networking experts on this forum : Are there any other debugging tools or options I can install/enable to dig further into why this TCP retransmit failure is occurring so consistently ? Are there any configuration changes which might account for this weird behaviour.