Formula to calculate probability of unrecoverable read error during RAID rebuild
Posted by OlafM on Super User
Published on 2012-12-09T11:34:44Z
I need to compare the reliability of different RAID systems built with either consumer or enterprise drives. Ignoring mechanical failures, the formula for the probability that a rebuild hits a read error is simple:
error_probability = 1 - (1 - per_bit_error_rate)^bits_read
where bits_read is the total number of bits read from the surviving disks during the rebuild.
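A minimal sketch of this formula in Python, assuming independent bit errors as the formula does. The function name is hypothetical. Rather than computing (1 - p)**n directly, it uses log1p/expm1, which avoids the rounding error that creeps into 1 - p when p is as small as 1e-14.

```python
import math


def rebuild_error_probability(per_bit_error_rate: float, bits_read: float) -> float:
    """Probability of at least one URE while reading `bits_read` bits.

    Evaluates 1 - (1 - p)^n, assuming each bit fails independently with
    probability p. Written as -expm1(n * log1p(-p)) so that a tiny p and a
    huge n do not lose precision to rounding in (1 - p).
    """
    return -math.expm1(bits_read * math.log1p(-per_bit_error_rate))


# Example with assumed values: two 3 TB disks read at 1 URE per 1e14 bits.
bits = 2 * 3e12 * 8
print(f"{rebuild_error_probability(1e-14, bits):.1%}")  # prints 38.1%
```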
With 3 TB drives I get:
- a 38% probability of a URE (unrecoverable read error) when rebuilding a 2+1-disk RAID5 (4.7% with enterprise drives);
- 21% for a RAID1 (2.4% with enterprise drives);
- 51% for the 3+1 RAID5 commonly used in SOHO products such as Synology's. Most people are unaware of this.
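The figures above can be reproduced with the formula under a few assumptions that the question does not state explicitly: 3 TB means 3 × 10^12 bytes, consumer drives are rated at 1 URE per 10^14 bits and enterprise drives at 1 per 10^15 bits (typical datasheet values), and a rebuild reads every surviving disk in full. The layout table and names below are illustrative only.

```python
import math

DRIVE_BITS = 3e12 * 8  # assumption: 3 TB = 3 * 10^12 bytes
URE_RATES = {"consumer": 1e-14, "enterprise": 1e-15}  # assumed datasheet URE rates

# Hypothetical layout table: number of surviving disks read in full during a rebuild.
LAYOUTS = {"RAID5 2+1": 2, "RAID1 (2 disks)": 1, "RAID5 3+1": 3}


def rebuild_error_probability(p: float, bits_read: float) -> float:
    """1 - (1 - p)^n, computed stably with log1p/expm1."""
    return -math.expm1(bits_read * math.log1p(-p))


def table() -> dict:
    """Map (layout, drive class) to the probability of at least one URE."""
    rows = {}
    for name, disks_read in LAYOUTS.items():
        for drive_class, p in URE_RATES.items():
            rows[(name, drive_class)] = rebuild_error_probability(p, disks_read * DRIVE_BITS)
    return rows


for (name, drive_class), prob in table().items():
    print(f"{name:16} {drive_class:10} {prob:6.1%}")
```

Running it also gives the enterprise value for the 3+1 RAID5, which the list above leaves out.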
Calculating the error probability for single-disk tolerance is easy; my question concerns systems that tolerate multiple disk failures (RAID6/RAIDZ2, RAIDZ3, and RAID1 with more than two disks).
If only the first disk is used for the rebuild, and the second one is read again from the beginning whenever a URE occurs, then the error probability is the one calculated above squared (14.5% for a consumer RAID5 2+1, 4.5% for a consumer RAID1 1+2).

However, I suppose (at least in ZFS, which has full checksums!) that the second parity or redundant disk is read only where needed, meaning that only a few sectors have to be read from it. How many UREs can possibly occur on the first disk? Not many; otherwise the error probability for single-disk-tolerant systems would be even higher than what I calculated.
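The two rebuild models described above can be compared numerically. This is a sketch under stated assumptions, not how any particular RAID or ZFS implementation behaves: bit errors are independent, the number of UREs in the first pass is approximated as Poisson with mean p × bits_read, and each URE is repaired by reading only the matching sector(s) from the second redundancy disk. The 4096-byte sector size, the `sectors_per_repair` parameter, and all function names are hypothetical.

```python
import math


def single_pass_error(p: float, bits_read: float) -> float:
    """1 - (1 - p)^n: chance of at least one URE in one full pass."""
    return -math.expm1(bits_read * math.log1p(-p))


def reread_everything(p: float, bits_read: float) -> float:
    """Question's model: a URE forces a second full pass, so P = P1^2."""
    return single_pass_error(p, bits_read) ** 2


def targeted_repair(p: float, bits_read: float,
                    sector_bytes: int = 4096, sectors_per_repair: int = 1) -> float:
    """Hypothetical model: each first-pass URE is repaired by reading only the
    matching sector(s) from the second redundancy disk.

    First-pass UREs ~ Poisson(lam = p * bits_read). Each repair read fails
    independently with probability q, so unrepairable UREs ~ Poisson(lam * q)
    and P(at least one) = 1 - exp(-lam * q).
    """
    lam = p * bits_read
    q = single_pass_error(p, sector_bytes * 8 * sectors_per_repair)
    return -math.expm1(-lam * q)


bits = 2 * 3e12 * 8  # assumed: two 3 TB disks read during the rebuild
for name, p in {"consumer": 1e-14, "enterprise": 1e-15}.items():
    print(f"{name:10} re-read: {reread_everything(p, bits):.1%}  "
          f"targeted: {targeted_repair(p, bits):.1e}")
```

Under these assumptions the targeted model comes out many orders of magnitude below the re-read model, because a repair fails only if a second URE lands in the few sectors needed to cover the first one.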
If I'm correct, a second parity disk would lower the risk to practically negligible values.
Am I correct?
© Super User or respective owner