Every couple of days my server suddenly crashes, and I have to request a hardware reset from the data center to get it running again.
Today I came back to my shell and found the server dead with "top" still running in the session; the "top" output from right before the crash is below.
I opened /var/log/messages and scrolled to the reboot time, but I see nothing: no errors prior to the hard reboot. (I checked /etc/syslog.conf and it contains "*.info;mail.none;authpriv.none;cron.none /var/log/messages". Isn't this good enough to log all problems?)
Usually when I look at top, the swap is never used up like this! I also don't know why mysqld is at 323% CPU (the server only runs Drupal, and it's never slow or overloaded). "solver" is my own application. I don't know what the 'sh' and 'dovecot' processes are doing.
It's been driving me crazy for the last month. Please help me solve this mystery and stop the downtime.
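Since /var/log/messages shows nothing, one thing I am considering is recording memory and swap usage at regular intervals so the next crash leaves a trail. This is only a minimal sketch, assuming a Linux /proc/meminfo with the usual MemFree, SwapTotal and SwapFree fields; the function names are my own placeholders, not part of any existing tool.

```python
import time


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of field name -> kB."""
    info = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # skip lines without a "Name: value" shape
        parts = rest.split()
        if parts and parts[0].isdigit():
            info[name.strip()] = int(parts[0])
    return info


def swap_used_pct(info):
    """Percentage of swap in use; 0.0 when no swap is configured."""
    total = info.get("SwapTotal", 0)
    if total == 0:
        return 0.0
    return 100.0 * (total - info.get("SwapFree", 0)) / total


def snapshot_line(info, now=None):
    """Format one log line: timestamp, free memory, swap usage."""
    stamp = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(now))
    return "%s MemFree=%dkB SwapUsed=%.1f%%" % (
        stamp, info.get("MemFree", 0), swap_used_pct(info))


if __name__ == "__main__":
    # Assumption: run on Linux, e.g. once a minute, with output appended to a file.
    with open("/proc/meminfo") as f:
        print(snapshot_line(parse_meminfo(f.read())))
```

If the machine is already thrashing, the script may not get scheduled at all, so the useful part would be the last few entries written before the crash.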
top - 01:10:06 up 6 days, 5 min, 3 users, load average: 34.87, 18.68, 9.03
Tasks: 500 total, 19 running, 481 sleeping, 0 stopped, 0 zombie
Cpu(s): 0.0%us, 96.6%sy, 0.0%ni, 1.7%id, 1.8%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 8165600k total, 8139764k used, 25836k free, 428k buffers
Swap: 2104496k total, 2104496k used, 0k free, 8236k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
4421 mysql 15 0 571m 105m 976 S 323.5 1.3 9:08.00 mysqld
564 root 20 -5 0 0 0 R 99.5 0.0 2:49.16 kswapd1
25767 apache 19 0 399m 8060 888 D 79.3 0.1 0:06.64 httpd
25781 apache 19 0 398m 5648 492 R 79.0 0.1 0:08.21 httpd
25961 apache 25 0 398m 5700 560 R 76.7 0.1 0:17.81 httpd
25980 apache 25 0 10816 668 520 R 75.0 0.0 0:46.95 sh
563 root 20 -5 0 0 0 D 71.4 0.0 3:12.37 kswapd0
25766 apache 25 0 399m 7256 756 R 69.7 0.1 0:39.83 httpd
25911 apache 25 0 398m 5612 480 R 58.8 0.1 0:17.63 httpd
25782 apache 25 0 440m 38m 648 R 55.2 0.5 0:18.94 httpd
25966 apache 25 0 398m 5640 556 R 55.2 0.1 0:48.84 httpd
4588 root 25 0 74860 596 476 R 53.9 0.0 0:37.90 crond
25939 apache 25 0 2776 172 84 R 48.9 0.0 0:59.46 solver
4575 root 25 0 397m 6004 1144 R 48.6 0.1 1:00.43 httpd
25962 apache 25 0 398m 5628 492 R 47.9 0.1 0:14.58 httpd
25824 apache 25 0 440m 39m 680 D 47.3 0.5 0:57.85 httpd
25968 apache 25 0 398m 5612 528 R 46.6 0.1 0:42.73 httpd
4477 root 25 0 6084 396 280 R 46.3 0.0 0:59.53 dovecot
25982 root 25 0 397m 5108 240 R 45.9 0.1 0:18.01 httpd
25943 apache 25 0 2916 172 8 R 44.0 0.0 0:53.54 solver
30687 apache 25 0 468m 63m 1124 D 42.3 0.8 0:45.02 httpd
25978 apache 25 0 398m 5688 600 R 23.8 0.1 0:40.99 httpd
25983 root 25 0 397m 5272 384 D 14.9 0.1 0:18.99 httpd
935 root 10 -5 0 0 0 D 14.2 0.0 1:54.60 kjournald
25986 root 25 0 397m 5308 420 D 8.9 0.1 0:04.75 httpd
4011 haldaemo 25 0 31568 1476 716 S 5.6 0.0 0:24.36 hald
25956 apache 23 0 398m 5872 644 S 5.6 0.1 0:13.85 httpd
18336 root 18 0 13004 1332 724 R 0.3 0.0 1:46.66 top
1 root 18 0 10372 212 180 S 0.0 0.0 0:05.99 init
2 root RT -5 0 0 0 S 0.0 0.0 0:00.95 migration/0
3 root 34 19 0 0 0 S 0.0 0.0 0:00.01 ksoftirqd/0
4 root RT -5 0 0 0 S 0.0 0.0 0:00.00 watchdog/0
5 root RT -5 0 0 0 S 0.0 0.0 0:00.15 migration/1
6 root 34 19 0 0 0 S 0.0 0.0 0:00.06 ksoftirqd/1
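To put numbers on the summary lines of the output above: swap is 100% used, memory is about 99.7% used, and 96.6% of CPU time is system (sy) time. A rough sketch of how those figures can be pulled out of the header, assuming this older top format where sizes are printed as "<number>k <label>"; the helper names are hypothetical.

```python
import re


def parse_top_summary(line):
    """Parse a 'Mem:' or 'Swap:' line from top into {label: kB}."""
    return {label: int(kb) for kb, label in re.findall(r"(\d+)k (\w+)", line)}


def parse_cpu(line):
    """Parse the 'Cpu(s):' line from top into {state: percent}."""
    return {state: float(pct) for pct, state in re.findall(r"([\d.]+)%(\w+)", line)}


def used_pct(fields):
    """Percentage used, from a dict with 'total' and 'used' in kB."""
    if fields.get("total", 0) == 0:
        return 0.0
    return 100.0 * fields["used"] / fields["total"]


if __name__ == "__main__":
    # Lines copied from the top output captured right before the crash.
    mem = parse_top_summary("Mem: 8165600k total, 8139764k used, 25836k free, 428k buffers")
    swap = parse_top_summary("Swap: 2104496k total, 2104496k used, 0k free, 8236k cached")
    cpu = parse_cpu("Cpu(s): 0.0%us, 96.6%sy, 0.0%ni, 1.7%id, 1.8%wa, 0.0%hi, 0.0%si, 0.0%st")
    print("mem used %.1f%%, swap used %.1f%%, sy %.1f%%"
          % (used_pct(mem), used_pct(swap), cpu["sy"]))
```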
Here is a normal top, when the server is working fine:
top - 01:50:41 up 21 min, 1 user, load average: 2.98, 2.70, 1.68
Tasks: 271 total, 2 running, 269 sleeping, 0 stopped, 0 zombie
Cpu(s): 15.0%us, 1.1%sy, 0.0%ni, 81.4%id, 2.4%wa, 0.1%hi, 0.0%si, 0.0%st
Mem: 8165600k total, 2035856k used, 6129744k free, 60840k buffers
Swap: 2104496k total, 0k used, 2104496k free, 283744k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
2204 apache 17 0 466m 83m 19m S 25.9 1.0 0:22.16 httpd
11347 apache 15 0 466m 83m 19m S 25.9 1.0 0:26.10 httpd
18204 apache 18 0 481m 97m 19m D 25.2 1.2 0:13.99 httpd
4644 apache 18 0 481m 100m 19m D 24.6 1.3 1:17.12 httpd
4727 apache 17 0 481m 99m 19m S 24.3 1.2 1:10.77 httpd
4777 apache 17 0 482m 102m 21m S 23.6 1.3 1:38.27 httpd
8924 apache 15 0 483m 99m 19m S 22.3 1.3 1:13.41 httpd
9390 apache 18 0 483m 99m 19m S 18.9 1.2 1:05.35 httpd
4728 apache 16 0 481m 101m 19m S 14.3 1.3 1:12.50 httpd
4648 apache 15 0 481m 107m 27m S 12.6 1.4 1:18.62 httpd
24955 apache 15 0 467m 82m 19m S 3.3 1.0 0:21.80 httpd
4722 apache 15 0 503m 118m 19m R 1.7 1.5 1:17.79 httpd
4647 apache 15 0 484m 105m 20m S 1.3 1.3 1:40.73 httpd
4643 apache 16 0 481m 100m 20m S 0.7 1.3 1:11.80 httpd
1561 root 15 0 12900 1264 828 R 0.3 0.0 0:00.54 top
4434 mysql 15 0 496m 55m 4812 S 0.3 0.7 0:06.69 mysqld
4646 apache 15 0 481m 100m 19m S 0.3 1.3 1:25.51 httpd
1 root 18 0 10372 692 580 S 0.0 0.0 0:02.09 init
2 root RT -5 0 0 0 S 0.0 0.0 0:00.03 migration/0
3 root 34 19 0 0 0 S 0.0 0.0 0:00.00 ksoftirqd/0
4 root RT -5 0 0 0 S 0.0 0.0 0:00.00 watchdog/0
5 root RT -5 0 0 0 S 0.0 0.0 0:00.00 migration/1
6 root 34 19 0 0 0 S 0.0 0.0 0:00.00 ksoftirqd/1
7 root RT -5 0 0 0 S 0.0 0.0 0:00.00 watchdog/1
8 root RT -5 0 0 0 S 0.0 0.0 0:00.00 migration/2
9 root 34 19 0 0 0 S 0.0 0.0 0:00.00 ksoftirqd/2
10 root RT -5 0 0 0 S 0.0 0.0 0:00.00 watchdog/2
11 root RT -5 0 0 0 S 0.0 0.0 0:00.00 migration/3
12 root 34 19 0 0 0 S 0.0 0.0 0:00.00 ksoftirqd/3
13 root RT -5 0 0 0 S 0.0 0.0 0:00.00 watchdog/3
14 root RT -5 0 0 0 S 0.0 0.0 0:00.03 migration/4
15 root 34 19 0 0 0 S 0.0 0.0 0:00.00 ksoftirqd/4
16 root RT -5 0 0 0 S 0.0 0.0 0:00.00 watchdog/4
17 root RT -5 0 0 0 S 0.0 0.0 0:00.02 migration/5
18 root 34 19 0 0 0 S 0.0 0.0 0:00.00 ksoftirqd/5
19 root RT -5 0 0 0 S 0.0 0.0 0:00.00 watchdog/5
20 root RT -5 0 0 0 S 0.0 0.0 0:00.01 migration/6
21 root 34 19 0 0 0 S 0.0 0.0 0:00.00 ksoftirqd/6
22 root RT -5 0 0 0 S 0.0 0.0 0:00.00 watchdog/6
23 root RT -5 0 0 0 S 0.0 0.0 0:00.00 migration/7