My server freezes within a few hours of logging out. Staying logged in keeps the server running
- by HappyEngineer
I have an Ubuntu Godaddy server I use to host mail and webapps. It started having problems a couple months ago. It would lock up and stop responding to anything. I couldn't ssh into it, so I'd have godaddy power cycle the server.
I have never seen anything that looked suspicious in the var logs (although I'm no expert at reading them). An fsck turned up no problems. Godaddy replaced the ram, but found no hardware problems. I started logging the output from "top" to a log file and found that even that stops running when the server freezes.
Now, here is the crazy part: It got so bad that it would actually go down every few hours, but then it stopped going down. I eventually realized I had left an ssh terminal logged into the machine running top. This seemed unlikely to be a reason, but after the server was up with no problems for a full week (remember, it had been going down after just a few hours), I disconnected from the ssh session. Lo and behold, within a few hours the server froze again!
I had them power cycle again and then left another ssh session open with top. It has been going without problems for 8 days now.
I told others about this and they hardly believe me. I simply can't imagine what is going on. I don't know what else to try other than to just get a new server and reinstall everything.
Does anyone have any ideas about what I can look for to determine what the cause is? Is it possible there's some sort of exploit on the server which only runs if everyone is logged out of the system?
EDIT:
The power management gone haywire sounds plausible, so I've modified the /boot/grub/menu.lst to boot with acpi=off and apm=off. It appears to have prevented kacpid and kacpid_notify from being in the process list, so I assume I did that right. I've disconnected all my sessions from the server. I'll check later tonight to see if it's still up. If it goes down then I'll try the pinging process idea.
EDIT:
It went down again. It lasted about a day. I've had them reboot, so now I'll try running "nohup ping -i 5 google.com &" and then disconnect. If it goes down again I'll come back. Hopefully someone will have some more ideas.