Linux software RAID6: rebuild slow
- by Ole Tange
I am trying to find the bottleneck in the rebuilding of a software raid6.
## Pause rebuilding when measuring raw I/O performance
# echo 1 > /proc/sys/dev/raid/speed_limit_min
# echo 1 > /proc/sys/dev/raid/speed_limit_max
## Drop caches so that does not interfere with measuring
# sync ; echo 3 | tee /proc/sys/vm/drop_caches >/dev/null
# time parallel -j0 "dd if=/dev/{} bs=256k count=4000 | cat >/dev/null" ::: sdbd sdbc sdbf sdbm sdbl sdbk sdbe sdbj sdbh sdbg
4000+0 records in
4000+0 records out
1048576000 bytes (1.0 GB) copied, 7.30336 s, 144 MB/s
[... similar for each disk ...]
# time parallel -j0 "dd if=/dev/{} skip=15000000 bs=256k count=4000 | cat >/dev/null" ::: sdbd sdbc sdbf sdbm sdbl sdbk sdbe sdbj sdbh sdbg
4000+0 records in
4000+0 records out
1048576000 bytes (1.0 GB) copied, 12.7991 s, 81.9 MB/s
[... similar for each disk ...]
So we can read sequentially at 140 MB/s in the outer tracks and 82 MB/s in the inner tracks on all the drives simultaneously. Sequential write performance is similar.
This would lead me to expect a rebuild speed of 82 MB/s or more.
# echo 800000 > /proc/sys/dev/raid/speed_limit_min
# echo 800000 > /proc/sys/dev/raid/speed_limit_max
# cat /proc/mdstat
md2 : active raid6 sdbd[10](S) sdbc[9] sdbf[0] sdbm[8] sdbl[7] sdbk[6] sdbe[11] sdbj[4] sdbi[3](F) sdbh[2] sdbg[1]
27349121408 blocks super 1.2 level 6, 128k chunk, algorithm 2 [9/8] [UUU_UUUUU]
[=========>...........] recovery = 47.3% (1849905884/3907017344) finish=855.9min speed=40054K/sec
But we only get 40 MB/s. And often this drops to 30 MB/s.
# iostat -dkx 1
sdbc 0.00 8023.00 0.00 329.00 0.00 33408.00 203.09 0.70 2.12 1.06 34.80
sdbd 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sdbe 13.00 0.00 8334.00 0.00 33388.00 0.00 8.01 0.65 0.08 0.06 47.20
sdbf 0.00 0.00 8348.00 0.00 33388.00 0.00 8.00 0.58 0.07 0.06 48.00
sdbg 16.00 0.00 8331.00 0.00 33388.00 0.00 8.02 0.71 0.09 0.06 48.80
sdbh 961.00 0.00 8314.00 0.00 37100.00 0.00 8.92 0.93 0.11 0.07 54.80
sdbj 70.00 0.00 8276.00 0.00 33384.00 0.00 8.07 0.78 0.10 0.06 48.40
sdbk 124.00 0.00 8221.00 0.00 33380.00 0.00 8.12 0.88 0.11 0.06 47.20
sdbl 83.00 0.00 8262.00 0.00 33380.00 0.00 8.08 0.96 0.12 0.06 47.60
sdbm 0.00 0.00 8344.00 0.00 33376.00 0.00 8.00 0.56 0.07 0.06 47.60
iostat says the disks are not 100% busy (but only 40-50%). This fits with the hypothesis that the max is around 80 MB/s.
Since this is software raid the limiting factor could be CPU. top says:
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
38520 root 20 0 0 0 0 R 64 0.0 2947:50 md2_raid6
6117 root 20 0 0 0 0 D 53 0.0 473:25.96 md2_resync
So md2_raid6 and md2_resync are clearly busy taking up 64% and 53% of a CPU respectively, but not near 100%.
The chunk size (128k) of the RAID was chosen after measuring which chunksize gave the least CPU penalty.
If this speed is normal: What is the limiting factor? Can I measure that?
If this speed is not normal: How can I find the limiting factor? Can I change that?