apache webserver unresponsible with server-status showing all child processes waiting for connection

Posted by Jeff on Server Fault See other posts from Server Fault or by Jeff
Published on 2012-01-31T12:24:56Z Indexed on 2014/06/09 9:28 UTC
Read the original article Hit count: 342

Filed under:
|
|
|

My setup: i have 3 nearly identical webserver machines serving the same high loaded dynamic website with simple load balancing over dns. The service has been working for over two ears with the same apache config. apache2, php5, ubuntu 8.04 linux 2.6.24-29-server

My problem: since about two weeks i'm experiencing problems with this config. Nearly every day i have one small moment about 5 minutes, in which the website is unreachable. I'm still able to login to the servers over ssh. If i run htop, i see the machine simply doing nothing. i have about 1000 apache processes running, but no cpu activity.

i've used the apache mod_status to debug this situation. the process scoreboard looks like this:

_C.___K_______________________R._______.__K_K____K___C_______.__
_______C__________.___________________________________.________C
_.____K__________K___K_WK_____._K_____________________________._
W______K__________K________.____________________._______C_______
_C_.__K__K____.._.._____________________________________C_______
_R___________K___.______C________.C_________.______._____C______
____________KKC____K_____K__WC_________________C_____.__.____.__
_____________________C_________K______.____C______._____________
_.___C____.___.___________________________.K______.____K________
W__.___________________C.__.____K________K_______R_._.__._______
__C__C_.__________C__C_______._____W______________C_.___C_______
____.______C_____________C________.____C____________.________._K
__.__________.K_____________K_________._____C____.K__________KW_
__K.W________R_________._______.___W___________.____.__K_____W__
W___.___..________W____K

Scoreboard Key:
"_" Waiting for Connection, "S" Starting up, "R" Reading Request,
"W" Sending Reply, "K" Keepalive (read), "D" DNS Lookup,
"C" Closing connection, "L" Logging, "G" Gracefully finishing,
"I" Idle cleanup of worker, "." Open slot with no current process

So the most of the processes are just waiting for connection. after about 5 minutes the situation will return to normal: i have lot least processes on every machine, the most workers have the "."-status (meaing they are open to process a request) and of course the website is reachable!

so i'm trying to find something in the logs, but there is simply nothing... the apache access log is silent for about 4 minutes, the same is for the error log. i also can not figure out anything wrong in other system logs.

the situation is the same on all 3 webservers (all of them have this load peak and unresposibility at the same time), so i do not thing this is hardware related. but i think, this might be related to some network (tcp) issue.

any ideas?

EDIT: some more information, that i have just discovered:

it has just happened again. and i was able to verify that i'm also not able to connect locally when this problem occurs. i have made some connection statistics with the following command after it happend netstat -an|awk '/tcp/ {print $6}'|sort|uniq -c

  • 109 CLOSE_WAIT
  • 2652 ESTABLISHED
  • 2 FIN_WAIT1
  • 11 LAST_ACK
  • 12 LISTEN
  • 91 SYN_RECV
  • 1 SYN_SENT
  • 16 TIME_WAIT

If i execute the same command some time later, i have something like this:

  • 4 CLOSING
  • 108 ESTABLISHED
  • 18 FIN_WAIT1
  • 182 FIN_WAIT2
  • 37 LAST_ACK
  • 12 LISTEN
  • 50 SYN_RECV
  • 11276 TIME_WAIT

So in the normal situation i have only 100-200 open connections by clients beeing handled by apache in this moment. when i have this "crash", i have a lot more connections. what is the best way to analyse this?

EDIT2: the important lines in apache2.conf are:

KeepAlive On
MaxKeepAliveRequests 20
KeepAliveTimeout 1
<IfModule mpm_prefork_module>
ServerLimit           920
StartServers          30
MinSpareServers       80
MaxSpareServers      120
MaxClients          920
MaxRequestsPerChild   700
</IfModule>

it is an apache2 prefork with php_mod.

the server has 8GB ram and a 4gb swap partition.

© Server Fault or respective owner

Related posts about apache-2.2

Related posts about webserver