Diagnosing high CPU waiting
- by Will
I have a monitoring server that is running icinga/collectd/graphite with about 50 hosts. I have noticed high load/slugging performance on the box. If you take a look at top, you'll see:
Cpu(s): 0.6%us, 0.2%sy, 0.0%ni, 7.6%id, 23.4%wa, 0.0%hi, 0.2%si, 0.0%st
Notice the HUGE %wa value, which as far as I know means a network or disk bottleneck. ifconfig shows no dropping packets and there's not a ton of bandwidth going on, so that leaves disk issues, right? There's not a lot of disk writing going on either...iotop is reporting we're only writing a little over 1 MB per second and the RAID tool reports everything is A-OK and write caching is enabled.
How do I go about trying to figure out how to fix this?
UPDATE:
iostat -x output is:
avg-cpu: %user %nice %system %iowait %steal %idle
0.62 0.10 0.31 9.65 0.00 89.31
Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s avgrq-sz avgqu-sz await svctm %util
sda 0.21 33.34 83.55 16.54 1599.94 399.07 19.97 43.21 416.98 3.71 37.13