HAProxy + NodeJS gets stuck on TCP Retransmission
- by sled
I have a HAProxy + NodeJS + Rails Setup, I use the NodeJS Server for file upload purposes.
The problem I'm facing is that if I'm uploading through haproxy to nodejs and a "TCP (Fast) Retransmission" occurs because of a lost packet the TX rate on the client drops to zero for about 5-10 secs and gets flooded with TCP Retransmissions.
This does not occur if I upload to NodeJS directly (TCP Retransmission happens too but it doesn't get stuck with dozens of retransmission attempts).
My test setup is a simple HTML4 FORM (method POST) with a single file input field.
The NodeJS Server only reads the incoming data and does nothing else.
I've tested this on multiple machines, networks, browsers, always the same issue.
Here's a TCP Traffic Dump from the client while uploading a file:
.....
TCP 1506 [TCP segment of a reassembled PDU]
>> everything is uploading fine until:
TCP 1506 [TCP Fast Retransmission] [TCP segment of a reassembled PDU]
TCP 66 [TCP Dup ACK 7392#1] 63265 > http [ACK] Seq=4844161 Ack=1 Win=524280 Len=0 TSval=657047088 TSecr=79373730
TCP 1506 [TCP Retransmission] [TCP segment of a reassembled PDU]
>> the last message is repeated about 50 times for >>5-10 secs<< (TX drops to 0 on client, RX drops to 0 on server)
TCP 1506 [TCP segment of a reassembled PDU]
>> upload continues until the next TCP Fast Retransmission and the same thing happens again
The haproxy.conf (haproxy v1.4.18 stable) is the following:
global
log 127.0.0.1 local1 debug
maxconn 4096 # Total Max Connections. This is dependent on ulimit
nbproc 2
defaults
log global
mode http
option httplog
option tcplog
frontend http-in
bind *:80
timeout client 6000
acl is_websocket path_beg /node/
use_backend node_backend if is_websocket
default_backend app_backend
# Rails Server (via nginx+passenger)
backend app_backend
option httpclose
option forwardfor
timeout server 30000
timeout connect 4000
server app1 127.0.0.1:3000
# node.js
backend node_backend
reqrep ^([^\ ]*)\ /node/(.*) \1\ /\2
option httpclose
option forwardfor
timeout queue 5000
timeout server 6000
timeout connect 5000
server node1 127.0.0.1:3200 weight 1 maxconn 4096
Thanks for reading! :)
Simon