Server load spikes several times a day; load average for the past month is 5 times the year's average
Posted by AMF on Server Fault
Published on 2011-06-21T21:38:04Z
Indexed on 2011/06/22 0:24 UTC
The Munin notifications I set up for our (Debian) LAMP cluster have been alerting me continuously that the load on our production machine is at dangerous levels. Over the year as a whole, the load average has typically run between 2 and 8, but in the past month, and only the past month, it has been spiking to 10, 18, and occasionally even 50-60. The spikes last only 5-10 minutes at a time and occur about every 2-3 hours. They do not affect performance, but only because I have a script that diverts traffic from our server to a mirror CDN when the load goes above 10. I've looked for cron jobs that correlate with this timing, but I can't see anything that would cause it. Site traffic is also normal (we receive about 200K visits per day). I've also tried to think of anything I changed around the time the problem began, and I really cannot come up with anything.
This is probably not much to go on. Maybe there is a clue in the top output (below) that I'm not seeing.
How do I proceed to find the cause?
-- Typical top when the load is NOT spiking:
top - 11:13:09 up 472 days, 25 min, 1 user, load average: 6.08, 4.29, 3.80
Tasks: 105 total, 1 running, 104 sleeping, 0 stopped, 0 zombie
Cpu(s): 41.2%us, 5.8%sy, 0.0%ni, 49.5%id, 2.7%wa, 0.1%hi, 0.7%si, 0.0%st
Mem: 3369592k total, 2166980k used, 1202612k free, 559504k buffers
Swap: 2650684k total, 1892k used, 2648792k free, 1129116k cached
  PID USER   PR NI  VIRT  RES  SHR S %CPU %MEM   TIME+ COMMAND
32046 apache 15  0 36300  12m 9828 S   20  0.4 0:01.97 apache2
32679 apache 15  0 36568  13m  10m S   19  0.4 0:01.69 apache2
31441 apache 15  0 36616  13m  10m S   19  0.4 0:04.13 apache2
31477 apache 15  0 36596  13m 9.8m S   15  0.4 0:01.99 apache2
31993 apache 15  0 36876  16m  12m S   12  0.5 0:02.01 apache2
31782 apache 15  0 36836  14m  10m S    8  0.4 0:02.17 apache2
32198 apache 15  0 36536  13m  10m S    7  0.4 0:01.59 apache2
  880 apache 15  0 36508 9708 6236 S    7  0.3 0:00.42 apache2
31945 apache 17  0 36876  16m  13m S    5  0.5 0:03.17 apache2
32197 apache 16  0 36636  10m 7504 S    5  0.3 0:02.70 apache2
32326 apache 15  0 37024  11m 7632 S    5  0.3 0:02.15 apache2
32565 apache 15  0 37280  13m 9.8m S    5  0.4 0:03.75 apache2
32676 apache 15  0 36896  16m  12m S    4  0.5 0:00.95 apache2
32678 apache 15  0 36536  12m 9692 S    4  0.4 0:02.27 apache2
  974 apache 16  0 37064 9888 6016 D    4  0.3 0:00.13 apache2
32150 apache 16  0 36832  13m  10m S    3  0.4 0:01.74 apache2
31780 apache 16  0 36848  11m 7660 S    3  0.3 0:02.87 apache2
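
Since the sample above was taken outside a spike, one possible next step is to capture what is running while a spike is actually happening. The following is only a rough sketch, not something from the setup described above. It polls /proc/loadavg and, once the 1-minute load reaches a threshold, logs every process in state R (runnable) or D (uninterruptible sleep, usually waiting on I/O), which are the two states Linux counts toward the load average. The threshold of 10, the 10-second interval, the log file name, and all function names are assumptions chosen for illustration.

```python
import os
import time

LOAD_THRESHOLD = 10.0  # assumption: same level at which traffic gets diverted


def parse_loadavg(text):
    """Return the 1-, 5-, and 15-minute load averages from /proc/loadavg text."""
    fields = text.split()
    return float(fields[0]), float(fields[1]), float(fields[2])


def parse_stat(text):
    """Return (pid, command, state) from the text of /proc/<pid>/stat."""
    # The command name is wrapped in parentheses and may itself contain
    # spaces or parentheses, so split on the first "(" and the last ")".
    left = text.index("(")
    right = text.rindex(")")
    pid = int(text[:left])
    comm = text[left + 1:right]
    state = text[right + 1:].split()[0]
    return pid, comm, state


def busy_processes(proc_root="/proc"):
    """List (pid, command, state) for processes in state R or D."""
    found = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "stat")) as handle:
                pid, comm, state = parse_stat(handle.read())
        except (OSError, ValueError, IndexError):
            continue  # the process exited, or its stat file was unreadable
        if state in ("R", "D"):
            found.append((pid, comm, state))
    return found


def watch(interval=10, log_path="load-spikes.log"):
    """Poll forever; append a snapshot to log_path whenever the load is high."""
    while True:
        with open("/proc/loadavg") as handle:
            one_min, _, _ = parse_loadavg(handle.read())
        if one_min >= LOAD_THRESHOLD:
            stamp = time.strftime("%Y-%m-%d %H:%M:%S")
            with open(log_path, "a") as log:
                log.write("%s load=%.2f\n" % (stamp, one_min))
                for pid, comm, state in busy_processes():
                    log.write("  %s %d %s\n" % (state, pid, comm))
        time.sleep(interval)
```

Calling watch() from a screen session or an init script and reading the log after the next spike would show whether the extra load comes from many D-state processes (pointing toward disk or network storage waits) or from R-state processes (pointing toward CPU contention). Which of those applies here is not known from the output above.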