How to make a Linux software RAID1 detect disc corruption?
- by Paul
This is one of the nightmare days: A virtualized server running on a Linux SW-RAID1 runs a VM that exhibits random segfaults in seemingly random codechunks.
While debugging I find that a file gives different md5sums on each and every run. Digging deeper I find this: The raw disc partitions that make up the RAID1 mirror contain 2 bit-differences and ca. 9 sectors are completely empty on one disc and filled with data on the other disc.
Obviously Linux gives back a sector from a undeterministically chosen disc of the mirror set. So sometimes the same sector is returned OK, sometimes the corrupted is given back.
The docs say:
RAID cannot and is not supposed to guard against data corruption on the media. Therefore, it doesn't make any sense either, to purposely corrupt data (using dd for example) on a disk to see how the RAID system will handle that. It is most likely (unless you corrupt the RAID superblock) that the RAID layer will never find out about the corruption, but your filesystem on the RAID device will be corrupted.
Thanks. That will help me sleep. :-/
Is there a way to have Linux at least detect this corruption by using sector checksumming or something like that? Would this be detected in a RAID5 setup? Is this the moment I wish I used ZFS or btrfs (once it becomes usable without uber-admin capabilities)?