Complete machine freezes...at a loss
Posted by user28818 on Super User
Published on 2010-06-02T16:42:32Z
Guys,
We built around 12 machines a few months ago to run Ubuntu. They each have the following specs:
ASUS Z8NA-D6 motherboard
Dual quad-core Intel(R) Xeon(R) CPU E5520 @ 2.27GHz
OCZ Mod Extreme Pro 500W power supply
12 GB Kingston RAM
Nvidia GeForce 9800 GT graphics card
My machine ran well for a while. However, it started experiencing random lockups. These lockups are not X lockups; they are complete system freezes. The NIC stops responding and the magic SysRq keys don't work. The machine is dead.
I first suspected RAM. Memtest86 didn't find anything, but I replaced the RAM anyway. Still, lockups. So I replaced the graphics card. Still, more lockups. They became more and more frequent and started to happen 2-3 times a day.
So I replaced the motherboard and power supply in one fell swoop. Suddenly, no more lockups! Woohoo!
Except, a week later, in the morning, the machine wouldn't wake up. I reset it, started it up, and the log files showed the last entry at around 11 pm the evening before. This has started occurring with more frequency...now just about every morning I come in, the machine is locked up, and has been since the night before.
Yesterday, about 3 weeks after I replaced the motherboard and power supply, the machine actually locked up on me mid-work. This is the first time since replacing the two (MB and PS) that this happened while I was using it. All the others occurred while I was away.
I'm at a loss. Nothing in syslog or messages indicates a problem around the time of the lockup. Temps are good: I use lm-sensors to monitor them and have a script that writes the output to a file every minute. The readings never get high, even right before a lockup.
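For reference, here is a minimal sketch of the kind of per-minute logging script described above. It is not the actual script: the function names, the `sensors.log` file name, and the explicit `fsync` call are assumptions made for illustration. The only external dependency is the `sensors` command from lm-sensors.

```python
import os
import subprocess
import time
from datetime import datetime, timezone


def log_snapshot(path, command=("sensors",)):
    """Append one timestamped snapshot of the command's output to path."""
    try:
        result = subprocess.run(
            list(command), capture_output=True, text=True, timeout=30
        )
        body = result.stdout or result.stderr
    except (OSError, subprocess.TimeoutExpired) as exc:
        # Record the failure instead of crashing the logger.
        body = f"could not run {command!r}: {exc}\n"

    stamp = datetime.now(timezone.utc).isoformat()
    with open(path, "a") as fh:
        fh.write(f"=== {stamp} ===\n{body}")
        fh.flush()
        # Assumption: ask the OS to push the entry to disk right away, so the
        # last reading is more likely to survive a hard freeze.
        os.fsync(fh.fileno())
    return body


def run_forever(path="sensors.log", interval=60):
    """Log a snapshot every `interval` seconds (hypothetical defaults)."""
    while True:
        log_snapshot(path)
        time.sleep(interval)
```

Calling `run_forever()` starts the loop. After a freeze, the timestamp of the last entry in the file gives a rough time of the lockup, accurate to within one interval.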
The only things I haven't replaced at this point are the case and the hard drives. I doubt either could be the cause.
What would you do if you were in my shoes? Is there a troubleshooting approach I'm missing?
For the record, all of the other machines, all eleven of them, don't have any problems. They're all running the same version of Ubuntu (Lucid) that I am.
Thanks!
© Super User or respective owner