I have 5 servers, all with similar hardware (i7, four 2 TB 7200 rpm drives, two 4 TB 5400 rpm drives, 430 watt power supply), and lately the machines have been freezing up. This has gotten worse in the last day or so, and I can't pinpoint an explanation. One recent change was adding the two 4 TB hard drives. The crashes happen most often while running a large Hadoop job, so I originally thought the load was causing the problem. Last night, though, one server froze without any heavy load on the box (or so I think), other than HDFS (Hadoop's distributed file system), which was probably rebalancing itself because two of the five nodes were offline.
If I plug a monitor and keyboard into one of these frozen machines, I get no response or output on the screen.
Any ideas on possible points of failure and/or different logs I can look at to investigate? Thanks
Edit: The systems are running Ubuntu 10.04
Edit 2: More on hardware:
Intel Core i7-930 Bloomfield 2.8 GHz processor (quad core)
12 GB (6 x 2 GB) Kingston DDR3 1333 RAM
Antec EarthWatts Green 430 power supply
MSI X58M LGA 1366 motherboard