How to detect hard disk failure?
- by Devator
So, one of my servers has a hard disk failure. It's running software RAID; the system locked up, and according to /proc/mdstat (and /var/log/messages) the disk really is down:
Personalities : [raid1]
md2 : active raid1 sdb2[1]
104320 blocks [2/1] [_U]
md5 : active raid1 sdb5[1]
2104448 blocks [2/1] [_U]
md6 : active raid1 sdb6[1]
830134656 blocks [2/1] [_U]
md1 : active raid1 sdb1[1]
143363968 blocks [2/1] [_U]
and
Nov 5 22:04:37 m38501 smartd[4467]: Device: /dev/sda, not capable of SMART self-check
However, when I run smartctl -H /dev/sda, it passes the test. It also passes the test with smartctl --test=short /dev/sda.
So, is smartctl a broken testing tool, or am I doing something completely off?
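
For reference, the degraded state visible in the /proc/mdstat output above can also be checked from a script. Below is a minimal sketch in Python, assuming the usual mdstat layout in which each array's status line contains a field like [2/1] [_U] (expected/active members, with an underscore marking a missing member). The function name degraded_arrays and the SAMPLE string are illustrative only; the sample is copied from the output in this post.

```python
import re

# Array header line, e.g. "md2 : active raid1 sdb2[1]".
ARRAY_RE = re.compile(r"^(md\w+)\s*:")
# Member status field, e.g. "[2/1] [_U]" (expected/active, then one flag per member).
STATUS_RE = re.compile(r"\[(\d+)/(\d+)\]\s+\[([U_]+)\]")


def degraded_arrays(mdstat_text):
    """Return {array_name: member_flags} for every array missing a member."""
    degraded = {}
    current = None
    for line in mdstat_text.splitlines():
        header = ARRAY_RE.match(line.strip())
        if header:
            # Remember which array the following status line belongs to.
            current = header.group(1)
            continue
        status = STATUS_RE.search(line)
        if status and current:
            expected, active, flags = status.groups()
            if int(active) < int(expected) or "_" in flags:
                degraded[current] = flags
            current = None
    return degraded


# Sample taken from the /proc/mdstat output quoted in this post.
SAMPLE = """Personalities : [raid1]
md2 : active raid1 sdb2[1]
      104320 blocks [2/1] [_U]
md5 : active raid1 sdb5[1]
      2104448 blocks [2/1] [_U]
md6 : active raid1 sdb6[1]
      830134656 blocks [2/1] [_U]
md1 : active raid1 sdb1[1]
      143363968 blocks [2/1] [_U]
"""

for name, flags in degraded_arrays(SAMPLE).items():
    print(f"{name} is degraded: [{flags}]")
```

On a live machine, the contents of /proc/mdstat would be passed in place of SAMPLE. This only reports what the md driver already knows; it says nothing about why smartctl still reports the disk as healthy.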