Why the server is not responding?
- by par
Hello!
Our server occasionally refuses to serve a simple HTML page.
This is happening during a relatively high number of requests.
However, the processor is not heavy loaded and there are a lot of free memory.
The error seems to occure 1 out of 50 requests in average, depending on the server load.
I need to find the source of the problem and take the appropriate actions to eliminate it.
I have a suspicion that the problem source is a huge number of incoming network packets.
There are 5000 packets per second on average.
Traffic - 2 MBits/sec
Can this be the cause of the error?
There is an interesting thing, in case the server fails to respond, the request string is not logged to access.log by Apache.
The error is repeatable from several client computers.
DNS is not involved, since I have accessed the server by the IP.
I have profiled the problem case with tcpdump utility.
These are the good and bad sessions traced by tcpdump.
The request is the same in both experiments.
Good - server returns response. Bad - no response, time-out error.
---- Bad ----
12:23:36.366292 IP 123.45.67.890.61749 > myserver.superbservers.com.www: S 2125316338:2125316338(0) win 8192 <mss 1460,nop,wscale 2,nop,nop,sackOK>
12:23:39.362394 IP 123.45.67.890.61749 > myserver.superbservers.com.www: S 2125316338:2125316338(0) win 8192 <mss 1460,nop,wscale 2,nop,nop,sackOK>
12:23:45.365567 IP 123.45.67.890.61749 > myserver.superbservers.com.www: S 2125316338:2125316338(0) win 8192 <mss 1460,nop,nop,sackOK>
--------
---- Good ----
12:27:07.632229 IP 123.45.67.890.63914 > myserver.superbservers.com.www: S 3581365570:3581365570(0) win 8192 <mss 1460,nop,wscale 2,nop,nop,sackOK>
12:27:10.620946 IP 123.45.67.890.63914 > myserver.superbservers.com.www: S 3581365570:3581365570(0) win 8192 <mss 1460,nop,wscale 2,nop,nop,sackOK>
12:27:10.620969 IP myserver.superbservers.com.www > 123.45.67.890.63914: S 2654770980:2654770980(0) ack 3581365571 win 5840 <mss 1460,nop,nop,sackOK,nop,wscale 6>
12:27:10.838747 IP 123.45.67.890.63914 > myserver.superbservers.com.www: . ack 1 win 4380
12:27:10.957143 IP 123.45.67.890.63914 > myserver.superbservers.com.www: P 1:213(212) ack 1 win 4380
12:27:10.957152 IP myserver.superbservers.com.www > 123.45.67.890.63914: . ack 213 win 108
12:27:10.965543 IP myserver.superbservers.com.www > 123.45.67.890.63914: P 1:630(629) ack 213 win 108
12:27:10.965621 IP myserver.superbservers.com.www > 123.45.67.890.63914: F 630:630(0) ack 213 win 108
12:27:11.183540 IP 123.45.67.890.63914 > myserver.superbservers.com.www: . ack 631 win 4222
12:27:11.185657 IP 123.45.67.890.63914 > myserver.superbservers.com.www: F 213:213(0) ack 631 win 4222
12:27:11.185663 IP myserver.superbservers.com.www > 123.45.67.890.63914: . ack 214 win 108
--------
Hoster: SuperbHosting
OS: Ubuntu
Server parameters: E6300 CONROE 1.86GHZ 2 X 1MB CACHE 1066 1GB DDR2 667MHZ
This is a link to apache configuration file we use http://repkin5.snow.prohosting.com/apache.txt
This is server-status report taken right after time-out error. http://repkin5.snow.prohosting.com/server-status.htm There are only 10 Child Servers running out of 120, so enough space for new requests.
VMSTAT
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
r b swpd free buff cache si so bi bo in cs us sy id wa
0 0 8900 725900 8468 65684 0 0 5 18 11 33 4 3 92 1