For about five days now, seemingly out of the blue, my Linux server has been locking up from time to time.
As far as I can tell from running top and iotop around the time it starts, the pattern is always the same:
One or more httpd processes (usually just one) hang and use up 100% of the CPU, %wa climbs close to 100%, and iotop shows several httpd processes at 99.99% in the IO column.
I'm also running an SVN server on this machine through Apache, and the one way I've been able to reproduce this consistently is an SVN commit of new files or an SVN update from the repository on this server (I am the only one using this repository). That reproduces the scenario every time, even though until very recently I had no problems at all checking in/out of SVN.
But sometimes it seems to happen for no detectable reason at all.
So it looks like some issue with my Apache setup makes its processes do a lot of reading/writing in response to certain triggers.
Could anyone help me uncover what that issue is?
EDIT: OK, now it's happening again.
This is top:
[root@server ~]# top
top - 10:56:54 up 2:59, 5 users, load average: 171.46, 70.35, 27.01
Tasks: 328 total, 2 running, 326 sleeping, 0 stopped, 0 zombie
Cpu(s): 1.9%us, 2.0%sy, 0.0%ni, 0.0%id, 96.1%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 2021144k total, 1968192k used, 52952k free, 2500k buffers
Swap: 4194288k total, 2938584k used, 1255704k free, 39008k cached
  PID USER     PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+ COMMAND
10390 apache   20   0 2774m 936m 6200 D  2.0 47.4  1:52.27 httpd
 2149 root     20   0  927m  13m 1040 S  0.7  0.7  1:50.46 namecoind
   11 root     20   0     0    0    0 R  0.3  0.0  0:30.10 events/0
   23 root     20   0     0    0    0 S  0.3  0.0  0:17.88 kblockd/1
 2049 root     20   0  382m 4932 2880 D  0.3  0.2  0:03.67 httpd
 2144 root     20   0 1702m  69m 1164 S  0.3  3.5  5:19.68 bitcoind
 6325 root     20   0 15164 1100  656 R  0.3  0.1  0:11.09 top
10311 apache   20   0  387m 9496 7320 D  0.3  0.5  0:01.89 httpd
10313 apache   20   0  391m  10m 7364 D  0.3  0.5  0:02.40 httpd
10466 apache   20   0  399m  12m 7392 D  0.3  0.7  0:02.41 httpd
10599 apache   20   0  391m 9324 7340 D  0.3  0.5  0:00.15 httpd
10628 apache   20   0  384m 7620 4052 D  0.3  0.4  0:00.01 httpd
10633 apache   20   0  384m 7048 3504 D  0.3  0.3  0:00.01 httpd
10634 apache   20   0  384m 8012 4048 D  0.3  0.4  0:00.02 httpd
10638 apache   20   0  400m  22m 9.8m D  0.3  1.1  0:01.93 httpd
10640 apache   20   0  385m 8288 4028 D  0.3  0.4  0:00.03 httpd
10641 apache   20   0  401m  21m 6376 D  0.3  1.1  0:01.45 httpd
10759 apache   20   0  385m 8816 3480 D  0.3  0.4  0:01.45 httpd
10773 apache   20   0  384m 8044 3464 D  0.3  0.4  0:00.02 httpd
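The Mem and Swap summary lines in this snapshot can be turned into percentages with a short script. The sketch below only parses those two lines as pasted (procps top, values in KiB with a "k" suffix) and prints how much RAM and swap is in use; `parse_top_memory` and `usage_percent` are illustrative names, not part of any tool. On these numbers RAM is essentially full and roughly 2.8 GiB of swap is in use, which may be worth reading alongside the single httpd worker at 936m RES (47.4% MEM).

```python
import re

# The two summary lines from the top snapshot above (procps top prints KiB with a "k" suffix).
TOP_SUMMARY = """\
Mem: 2021144k total, 1968192k used, 52952k free, 2500k buffers
Swap: 4194288k total, 2938584k used, 1255704k free, 39008k cached
"""


def parse_top_memory(text):
    """Return {"Mem": {...}, "Swap": {...}} mapping field names to KiB values."""
    result = {}
    for line in text.splitlines():
        label, _, rest = line.partition(":")
        if label not in ("Mem", "Swap"):
            continue  # ignore the load, task and CPU lines if they are pasted too
        # Each field looks like "1968192k used"; capture the number and its name.
        result[label] = {name: int(num) for num, name in re.findall(r"(\d+)k\s+(\w+)", rest)}
    return result


def usage_percent(fields):
    """Share of the total that top reports as used, in percent."""
    return 100.0 * fields["used"] / fields["total"]


if __name__ == "__main__":
    summary = parse_top_memory(TOP_SUMMARY)
    for label in ("Mem", "Swap"):
        fields = summary[label]
        gib_used = fields["used"] / 1024 / 1024
        gib_total = fields["total"] / 1024 / 1024
        print(f"{label}: {usage_percent(fields):.1f}% used ({gib_used:.2f} GiB of {gib_total:.2f} GiB)")
```

The same two lines can be pasted from any later top run to compare snapshots taken before and during a lock-up.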
This is an iotop snapshot:
Total DISK READ: 5.93 M/s | Total DISK WRITE: 0.00 B/s
  TID PRIO USER       DISK READ DISK WRITE  SWAPIN     IO> COMMAND
10732 be/4 apache      3.76 K/s   0.00 B/s  0.00 % 58.48 % httpd
  876 be/3 root        0.00 B/s  52.68 K/s  0.00 % 52.98 % [jbd2/dm-1-8]
10906 be/4 root      124.17 K/s   0.00 B/s  0.00 % 23.03 % sh -c [ -x /usr/local/psa/admin/sbin/backupmng ] && /usr/local/psa/admin/sbin/backupmng >/dev/null 2>&1
 2156 be/4 root      206.94 K/s   0.00 B/s  0.00 % 21.15 % bitcoind
10904 be/4 mysql       0.00 B/s   0.00 B/s  0.00 % 18.94 % mysqld --basedir=/usr --datadir=/var/lib/mysql --user=mysql --log-error=/var/log/mysqld.log --pid-file=/var/run/mysqld/mysqld.pid --socket=/var/lib/mysql/mysql.sock
10773 be/4 apache      7.53 K/s   0.00 B/s  0.00 % 14.77 % httpd
10641 be/4 apache     15.05 K/s   0.00 B/s  0.00 % 11.57 % httpd
10399 be/4 apache   1057.29 K/s   0.00 B/s 43.16 % 10.56 % httpd
10682 be/4 sw-cp-se  158.03 K/s   0.00 B/s  0.00 %  7.45 % sw-engine-cgi -c /usr/local/psa/admin/conf/php.ini -d auto_prepend_file=auth.php3 -u psaadm
10774 be/4 apache      3.76 K/s   0.00 B/s  0.00 %  6.53 % httpd
10624 be/4 apache      0.00 B/s   0.00 B/s  0.00 %  5.53 % httpd
10356 be/4 apache    899.26 K/s   0.00 B/s 35.52 %  4.01 % httpd
10795 be/4 apache      0.00 B/s   0.00 B/s  0.00 %  3.93 % httpd
10804 be/4 apache      7.53 K/s   0.00 B/s  0.00 %  3.08 % httpd
 4379 be/4 root        2.89 M/s   0.00 B/s 99.99 %  0.00 % namecoind
10619 be/4 apache    462.80 K/s   0.00 B/s  7.80 %  0.00 % httpd
10636 be/4 apache      3.76 K/s   0.00 B/s  0.00 %  0.00 % httpd
10716 be/4 mysql     105.35 K/s   0.00 B/s  5.92 %  0.00 % mysqld --basedir=/usr --datadir=/var/lib/mysql --user=mysql --log-error=/var/log/mysqld.log --pid-file=/var/run/mysqld/mysqld.pid --socket=/var/lib/mysql/mysql.sock
 1988 be/4 root       18.81 K/s   0.00 B/s  0.00 %  0.00 % spamd_full.sock
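iotop reports two different kinds of waiting: SWAPIN (share of time a thread spent swapping pages back in) and IO> (share of time spent waiting on I/O). Ranking the snapshot by each column separately may help tell swap pressure apart from ordinary file I/O. The sketch below parses rows in the format pasted above and sorts them; the regex and the helper names `parse_iotop` and `top_by` are my own assumptions, and the sample rows are copied from this snapshot with the long mysqld command line shortened. In this snapshot namecoind (99.99 %) and two httpd workers (43.16 % and 35.52 %) lead the SWAPIN ranking.

```python
import re

# A few rows copied from the iotop snapshot above (long mysqld command line shortened).
IOTOP_ROWS = """\
10732 be/4 apache 3.76 K/s 0.00 B/s 0.00 % 58.48 % httpd
10399 be/4 apache 1057.29 K/s 0.00 B/s 43.16 % 10.56 % httpd
10356 be/4 apache 899.26 K/s 0.00 B/s 35.52 % 4.01 % httpd
4379 be/4 root 2.89 M/s 0.00 B/s 99.99 % 0.00 % namecoind
10716 be/4 mysql 105.35 K/s 0.00 B/s 5.92 % 0.00 % mysqld
"""

# TID PRIO USER DISK-READ DISK-WRITE SWAPIN% IO% COMMAND (whitespace between fields may vary).
ROW = re.compile(
    r"^\s*(?P<tid>\d+)\s+(?P<prio>\S+)\s+(?P<user>\S+)\s+"
    r"(?P<read>[\d.]+\s+[BKMGT]/s)\s+(?P<write>[\d.]+\s+[BKMGT]/s)\s+"
    r"(?P<swapin>[\d.]+)\s+%\s+(?P<io>[\d.]+)\s+%\s+(?P<command>.+)$"
)


def parse_iotop(text):
    """Turn iotop rows into dicts; header and "Total DISK READ" lines do not match and are skipped."""
    rows = []
    for line in text.splitlines():
        m = ROW.match(line)
        if m:
            row = m.groupdict()
            row["tid"] = int(row["tid"])
            row["swapin"] = float(row["swapin"])
            row["io"] = float(row["io"])
            rows.append(row)
    return rows


def top_by(rows, column, n=5):
    """The n rows with the highest value in the given percentage column."""
    return sorted(rows, key=lambda r: r[column], reverse=True)[:n]


if __name__ == "__main__":
    rows = parse_iotop(IOTOP_ROWS)
    for column in ("swapin", "io"):
        print(f"by {column}:")
        for r in top_by(rows, column, n=3):
            print(f"  {r[column]:6.2f} %  {r['tid']:>5} {r['user']:<8} {r['command']}")
```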
I also ran lsof -p on PID 10390, which was at the very top of the top listing. The last line of its output shows roughly which request it was handling, and the connection is in CLOSE_WAIT:
httpd 10390 apache 34u IPv6 315879 0t0 TCP default-domain.com:https->crawl-66-249-65-91.googlebot.com:42907 (CLOSE_WAIT)
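CLOSE_WAIT means the remote end (here Googlebot) has closed its side of the connection and the local process has not yet closed its own, which would fit a worker that is stuck elsewhere, for example blocked in I/O (state D in top). To see whether many httpd workers are in the same situation, one could tally socket states across all of them. The sketch below does that for saved lsof output; the regex is an assumption about lsof's default NAME column layout, and the script name lsof_states.py is hypothetical.

```python
import re
import sys
from collections import Counter

# lsof prints a TCP socket as "... TCP local->remote (STATE)"; listening sockets show "(LISTEN)".
TCP_LINE = re.compile(
    r"^(?P<command>\S+)\s+(?P<pid>\d+)\s.*\bTCP\s+(?P<name>\S+)\s+\((?P<state>[A-Z0-9_]+)\)\s*$"
)


def tcp_states(lines):
    """Count TCP sockets per (command, pid, state) in lsof output lines."""
    counts = Counter()
    for line in lines:
        m = TCP_LINE.match(line)
        if m:
            counts[(m["command"], int(m["pid"]), m["state"])] += 1
    return counts


def main(argv):
    # Hypothetical usage: lsof -n -a -i TCP -c httpd > lsof.txt; python3 lsof_states.py lsof.txt
    if len(argv) != 2:
        print(f"usage: {argv[0]} LSOF_OUTPUT_FILE")
        return
    with open(argv[1]) as f:
        for (command, pid, state), n in sorted(tcp_states(f).items()):
            print(f"{command} {pid} {state}: {n}")


if __name__ == "__main__":
    main(sys.argv)
```

The same function works on the output of a single lsof -p run, which is how the line above was obtained.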
I'm still not sure what exactly is causing all this, though.
I killed that service, but %wa and the load average remain high; I also stopped mysqld and other services. The load only really goes down once I stop httpd altogether, and even then I can't start it again until I find the remaining hanging httpd processes via "netstat -tulpn" and kill them (or run "killall -9 httpd"), wait a while for it to cycle through all of them, and only then run /etc/init.d/httpd start.
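Nearly every httpd in the top snapshot is in state D (uninterruptible sleep, usually waiting on disk or swap I/O). Linux generally does not act on a signal, even SIGKILL, for a task in that state until the I/O it is waiting on completes, which would fit "killall -9 httpd" taking a while to clear them. A minimal sketch for listing such processes straight from /proc, rather than through netstat, is below; it assumes a Linux /proc with the standard /proc/<pid>/stat layout, and `parse_stat` and `processes_in_state` are illustrative names.

```python
import os


def parse_stat(line):
    """Split a /proc/<pid>/stat line into (pid, comm, state).

    comm is wrapped in parentheses and may itself contain spaces or ')',
    so split on the *last* ')' rather than on whitespace.
    """
    head, _, tail = line.rpartition(")")
    pid_str, _, comm = head.partition(" (")
    state = tail.split()[0]
    return int(pid_str), comm, state


def processes_in_state(state="D", proc="/proc"):
    """Return sorted (pid, comm) pairs for every process currently in the given state."""
    if not os.path.isdir(proc):
        return []  # no procfs (not Linux), nothing to scan
    found = []
    for entry in os.listdir(proc):
        if not entry.isdigit():
            continue  # skip non-process entries such as /proc/meminfo
        try:
            with open(os.path.join(proc, entry, "stat")) as f:
                pid, comm, st = parse_stat(f.read())
        except OSError:
            continue  # process exited between listdir() and open(), or is unreadable
        if st == state:
            found.append((pid, comm))
    return sorted(found)


if __name__ == "__main__":
    for pid, comm in processes_in_state("D"):
        print(pid, comm)
```

Running it a few times in a row during a lock-up would show whether the same httpd PIDs stay in D or keep changing.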