My server suddenly crashes every 2 days or so. The programmer has no idea; please help find the cause. Here is the top output
Posted by Alex on Server Fault
Published on 2011-07-03T04:54:38Z, indexed on 2012/07/01 21:18 UTC
Tags: centos, server-crashes
Every couple of days my server suddenly crashes, and I have to request a hardware reset at the data center to get it running again.
Today I came back to my shell and found the server dead, with "top" still running in it. Below is the "top" output from right before the crash.
I opened /var/log/messages, scrolled to the reboot time, and saw nothing: no errors before the hard reboot. (In /etc/syslog.conf I see "*.info;mail.none;authpriv.none;cron.none /var/log/messages". Isn't this good enough to log all problems?)
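To make that selector concrete, here is a simplified Python model of how a classic sysklogd-style selector decides whether a message lands in /var/log/messages. It is a sketch under stated assumptions, not the real syslogd parser: the `matches` helper is hypothetical, and it handles only the plain "facility.priority" and "facility.none" forms (no "=" or "!" modifiers, no comma-separated facility lists). It relies on two documented behaviors: "facility.info" means info or anything more severe, and a later selector on the same line overrides an earlier one for the facilities it names.

```python
# Hypothetical, simplified model of sysklogd selector matching.
# Assumption: only "facility.priority" and "facility.none" forms are handled.

# Syslog severities, most severe first (standard numeric codes).
SEVERITY = {
    "emerg": 0, "alert": 1, "crit": 2, "err": 3,
    "warning": 4, "notice": 5, "info": 6, "debug": 7,
}


def matches(selector: str, facility: str, priority: str) -> bool:
    """Return True if a message with this facility/priority passes the selector."""
    result = False
    # Selectors are separated by ';' and applied left to right, so a later
    # entry for the same facility (e.g. "mail.none") overrides "*.info".
    for entry in selector.split(";"):
        fac, prio = entry.split(".")
        if fac != "*" and fac != facility:
            continue
        if prio == "none":
            result = False
        elif prio == "*":
            result = True
        else:
            # "fac.info" selects info and every more severe priority.
            result = SEVERITY[priority] <= SEVERITY[prio]
    return result


SELECTOR = "*.info;mail.none;authpriv.none;cron.none"

print(matches(SELECTOR, "kern", "err"))    # True: kernel errors are selected
print(matches(SELECTOR, "kern", "debug"))  # False: below info
print(matches(SELECTOR, "cron", "err"))    # False: cron is excluded entirely
```

In this model the line selects every facility at info or above except mail, authpriv and cron, which are dropped regardless of severity.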
Normally when I look at top, swap is never used up like this! I also don't know why mysqld is at 323% CPU (the server only runs Drupal, and it's never slow or overloaded). "solver" is my application. I don't know what 'sh' and 'dovecot' are doing.
It has been driving me crazy for the last month. Please help me solve this mystery and stop the downtime.
top - 01:10:06 up 6 days, 5 min, 3 users, load average: 34.87, 18.68, 9.03
Tasks: 500 total, 19 running, 481 sleeping, 0 stopped, 0 zombie
Cpu(s): 0.0%us, 96.6%sy, 0.0%ni, 1.7%id, 1.8%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 8165600k total, 8139764k used, 25836k free, 428k buffers
Swap: 2104496k total, 2104496k used, 0k free, 8236k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
4421 mysql 15 0 571m 105m 976 S 323.5 1.3 9:08.00 mysqld
564 root 20 -5 0 0 0 R 99.5 0.0 2:49.16 kswapd1
25767 apache 19 0 399m 8060 888 D 79.3 0.1 0:06.64 httpd
25781 apache 19 0 398m 5648 492 R 79.0 0.1 0:08.21 httpd
25961 apache 25 0 398m 5700 560 R 76.7 0.1 0:17.81 httpd
25980 apache 25 0 10816 668 520 R 75.0 0.0 0:46.95 sh
563 root 20 -5 0 0 0 D 71.4 0.0 3:12.37 kswapd0
25766 apache 25 0 399m 7256 756 R 69.7 0.1 0:39.83 httpd
25911 apache 25 0 398m 5612 480 R 58.8 0.1 0:17.63 httpd
25782 apache 25 0 440m 38m 648 R 55.2 0.5 0:18.94 httpd
25966 apache 25 0 398m 5640 556 R 55.2 0.1 0:48.84 httpd
4588 root 25 0 74860 596 476 R 53.9 0.0 0:37.90 crond
25939 apache 25 0 2776 172 84 R 48.9 0.0 0:59.46 solver
4575 root 25 0 397m 6004 1144 R 48.6 0.1 1:00.43 httpd
25962 apache 25 0 398m 5628 492 R 47.9 0.1 0:14.58 httpd
25824 apache 25 0 440m 39m 680 D 47.3 0.5 0:57.85 httpd
25968 apache 25 0 398m 5612 528 R 46.6 0.1 0:42.73 httpd
4477 root 25 0 6084 396 280 R 46.3 0.0 0:59.53 dovecot
25982 root 25 0 397m 5108 240 R 45.9 0.1 0:18.01 httpd
25943 apache 25 0 2916 172 8 R 44.0 0.0 0:53.54 solver
30687 apache 25 0 468m 63m 1124 D 42.3 0.8 0:45.02 httpd
25978 apache 25 0 398m 5688 600 R 23.8 0.1 0:40.99 httpd
25983 root 25 0 397m 5272 384 D 14.9 0.1 0:18.99 httpd
935 root 10 -5 0 0 0 D 14.2 0.0 1:54.60 kjournald
25986 root 25 0 397m 5308 420 D 8.9 0.1 0:04.75 httpd
4011 haldaemo 25 0 31568 1476 716 S 5.6 0.0 0:24.36 hald
25956 apache 23 0 398m 5872 644 S 5.6 0.1 0:13.85 httpd
18336 root 18 0 13004 1332 724 R 0.3 0.0 1:46.66 top
1 root 18 0 10372 212 180 S 0.0 0.0 0:05.99 init
2 root RT -5 0 0 0 S 0.0 0.0 0:00.95 migration/0
3 root 34 19 0 0 0 S 0.0 0.0 0:00.01 ksoftirqd/0
4 root RT -5 0 0 0 S 0.0 0.0 0:00.00 watchdog/0
5 root RT -5 0 0 0 S 0.0 0.0 0:00.15 migration/1
6 root 34 19 0 0 0 S 0.0 0.0 0:00.06 ksoftirqd/1
Here is a normal top, when the server is working fine:
top - 01:50:41 up 21 min, 1 user, load average: 2.98, 2.70, 1.68
Tasks: 271 total, 2 running, 269 sleeping, 0 stopped, 0 zombie
Cpu(s): 15.0%us, 1.1%sy, 0.0%ni, 81.4%id, 2.4%wa, 0.1%hi, 0.0%si, 0.0%st
Mem: 8165600k total, 2035856k used, 6129744k free, 60840k buffers
Swap: 2104496k total, 0k used, 2104496k free, 283744k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
2204 apache 17 0 466m 83m 19m S 25.9 1.0 0:22.16 httpd
11347 apache 15 0 466m 83m 19m S 25.9 1.0 0:26.10 httpd
18204 apache 18 0 481m 97m 19m D 25.2 1.2 0:13.99 httpd
4644 apache 18 0 481m 100m 19m D 24.6 1.3 1:17.12 httpd
4727 apache 17 0 481m 99m 19m S 24.3 1.2 1:10.77 httpd
4777 apache 17 0 482m 102m 21m S 23.6 1.3 1:38.27 httpd
8924 apache 15 0 483m 99m 19m S 22.3 1.3 1:13.41 httpd
9390 apache 18 0 483m 99m 19m S 18.9 1.2 1:05.35 httpd
4728 apache 16 0 481m 101m 19m S 14.3 1.3 1:12.50 httpd
4648 apache 15 0 481m 107m 27m S 12.6 1.4 1:18.62 httpd
24955 apache 15 0 467m 82m 19m S 3.3 1.0 0:21.80 httpd
4722 apache 15 0 503m 118m 19m R 1.7 1.5 1:17.79 httpd
4647 apache 15 0 484m 105m 20m S 1.3 1.3 1:40.73 httpd
4643 apache 16 0 481m 100m 20m S 0.7 1.3 1:11.80 httpd
1561 root 15 0 12900 1264 828 R 0.3 0.0 0:00.54 top
4434 mysql 15 0 496m 55m 4812 S 0.3 0.7 0:06.69 mysqld
4646 apache 15 0 481m 100m 19m S 0.3 1.3 1:25.51 httpd
1 root 18 0 10372 692 580 S 0.0 0.0 0:02.09 init
2 root RT -5 0 0 0 S 0.0 0.0 0:00.03 migration/0
3 root 34 19 0 0 0 S 0.0 0.0 0:00.00 ksoftirqd/0
4 root RT -5 0 0 0 S 0.0 0.0 0:00.00 watchdog/0
5 root RT -5 0 0 0 S 0.0 0.0 0:00.00 migration/1
6 root 34 19 0 0 0 S 0.0 0.0 0:00.00 ksoftirqd/1
7 root RT -5 0 0 0 S 0.0 0.0 0:00.00 watchdog/1
8 root RT -5 0 0 0 S 0.0 0.0 0:00.00 migration/2
9 root 34 19 0 0 0 S 0.0 0.0 0:00.00 ksoftirqd/2
10 root RT -5 0 0 0 S 0.0 0.0 0:00.00 watchdog/2
11 root RT -5 0 0 0 S 0.0 0.0 0:00.00 migration/3
12 root 34 19 0 0 0 S 0.0 0.0 0:00.00 ksoftirqd/3
13 root RT -5 0 0 0 S 0.0 0.0 0:00.00 watchdog/3
14 root RT -5 0 0 0 S 0.0 0.0 0:00.03 migration/4
15 root 34 19 0 0 0 S 0.0 0.0 0:00.00 ksoftirqd/4
16 root RT -5 0 0 0 S 0.0 0.0 0:00.00 watchdog/4
17 root RT -5 0 0 0 S 0.0 0.0 0:00.02 migration/5
18 root 34 19 0 0 0 S 0.0 0.0 0:00.00 ksoftirqd/5
19 root RT -5 0 0 0 S 0.0 0.0 0:00.00 watchdog/5
20 root RT -5 0 0 0 S 0.0 0.0 0:00.01 migration/6
21 root 34 19 0 0 0 S 0.0 0.0 0:00.00 ksoftirqd/6
22 root RT -5 0 0 0 S 0.0 0.0 0:00.00 watchdog/6
23 root RT -5 0 0 0 S 0.0 0.0 0:00.00 migration/7
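To compare the two snapshots numerically, the Mem and Swap summary lines can be reduced to percentages. This is a minimal Python sketch, assuming the procps `top` header format shown above (other `top` versions lay these lines out differently); the `usage` helper is hypothetical, and the input strings are copied verbatim from the two snapshots.

```python
import re

# Summary lines copied from the two top snapshots above.
CRASH = """Mem: 8165600k total, 8139764k used, 25836k free, 428k buffers
Swap: 2104496k total, 2104496k used, 0k free, 8236k cached"""

NORMAL = """Mem: 8165600k total, 2035856k used, 6129744k free, 60840k buffers
Swap: 2104496k total, 0k used, 2104496k free, 283744k cached"""

# Captures the label, the total and the used amount (all in KiB).
LINE = re.compile(r"^(Mem|Swap):\s+(\d+)k total,\s+(\d+)k used")


def usage(snapshot: str) -> dict:
    """Return the percentage used for Mem and Swap in a top header snippet."""
    result = {}
    for line in snapshot.splitlines():
        m = LINE.match(line)
        if m:
            name, total, used = m.group(1), int(m.group(2)), int(m.group(3))
            result[name] = round(100.0 * used / total, 1)
    return result


print("crash: ", usage(CRASH))
print("normal:", usage(NORMAL))
```

For these two snapshots it reports about 99.7% of RAM and 100% of swap in use right before the crash, versus about 24.9% of RAM and no swap during normal operation.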