Guys,
We built around 12 machines a few months ago to run Ubuntu. They each have the following specs:
ASUS Z8NA-D6 motherboard
Dual quad-core Intel(R) Xeon(R) CPU E5520 @ 2.27GHz
OCZ Mod Extreme Pro 500W power supply
12 GB Kingston RAM
Nvidia GeForce 9800 GT graphics card
My machine ran well for a while. However, it started experiencing random lockups. These are not X lockups; they are complete system freezes. The NIC stops responding and the magic SysRq keys won't work. The machine is dead.
I first suspected RAM. Memtest86 didn't find anything, but I replaced the RAM anyway. Still, lockups. So I replaced the graphics card. Still, more lockups. They became more and more frequent and started to happen 2-3 times a day.
So I replaced the motherboard and power supply in one fell swoop. Suddenly, no more lockups! Woohoo!
Except, a week later, in the morning, the machine wouldn't wake up. I reset it, started it up, and the log files showed the last entry at around 11 pm the evening before. This has been happening more and more often. Now, just about every morning I come in, the machine is locked up, and has been since the night before.
Yesterday, about 3 weeks after I replaced the motherboard and power supply, the machine actually locked up on me in the middle of work. This is the first time since replacing the two (MB and PS) that it has happened while I was using it. All the other lockups occurred while I was away.
I'm at a loss. Nothing in syslog or messages indicates a problem around the time of the lockup. Temps are good. I use lm-sensors to monitor them and have a script that writes the output to a file every minute. They never get very high.
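A per-minute sensor logger like the one described could look roughly like this. This is only a sketch, not the actual script: the function names, the record format, and the log path are all made up for illustration. The only external piece it assumes is the `sensors` command from lm-sensors. It forces each record to disk with `fsync`, so the last reading before a hard freeze is less likely to be lost in the write cache.

```python
import os
import subprocess
import time
from datetime import datetime


def format_entry(timestamp: datetime, sensors_output: str) -> str:
    """Return one log record: a timestamp header followed by the raw readings."""
    header = timestamp.strftime("=== %Y-%m-%d %H:%M:%S ===")
    return f"{header}\n{sensors_output.rstrip()}\n"


def append_entry(path: str, entry: str) -> None:
    """Append a record and force it to disk so it can survive a hard freeze."""
    with open(path, "a", encoding="utf-8") as log:
        log.write(entry)
        log.flush()
        os.fsync(log.fileno())


def read_sensors() -> str:
    """Run the lm-sensors `sensors` command and return its text output."""
    result = subprocess.run(
        ["sensors"], capture_output=True, text=True, check=False
    )
    return result.stdout


def run_forever(path: str, interval_seconds: int = 60) -> None:
    """Log one record per interval until the process is stopped."""
    while True:
        append_entry(path, format_entry(datetime.now(), read_sensors()))
        time.sleep(interval_seconds)
```

Calling `run_forever("sensors.log")` (hypothetical path) would start the logging loop. After a lockup, the timestamp of the last record gives a rough time of death and the readings right before it.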
The only things I haven't replaced at this point are the case and the hard drives. I doubt either could be the cause.
What would you do if you were in my shoes? Is there a troubleshooting approach I'm missing?
For the record, all of the other machines, all eleven of them, don't have any problems. They're all running the same version of Ubuntu (Lucid) that I am.
Thanks!