MD RAID 1 with external bitmap doesn't fully resync
- by user64744
I have an interesting configuration: a dual-boot system with a RAID 1 that needs to be visible in both Windows and Linux. The Windows install is Win 7 Enterprise, and the Linux install is Kubuntu 10.04. To get the RAID working in both, I set it up using Windows' "Dynamic Disks" RAID 1 and brought it up in Linux using MD with no persistent superblock and a write-intent bitmap stored on another partition. (Without this bitmap, MD has no way of knowing that the array is in sync and would do a complete resync every time the array started.) The array is assembled like so:
mdadm --build /dev/md1 -l 1 -n 2 -b /var/local/md1.bitmap /dev/sdb2 /dev/sdc2
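(Reproducibility note: before inspecting the bitmap after a resync, it's worth blocking until MD actually reports the resync done; mdadm --wait does exactly that.)
mdadm --wait /dev/md1    # blocks until any running resync/recovery completes
cat /proc/mdstat         # array should show [UU] with no resync in progress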
I expected that the first time I ran this --build command, it would resync the array, write out a bitmap with no dirty chunks, and all would be good. This wasn't the case: after the resync completed, the bitmap was mostly clean, but about 5% of the chunks remained dirty, as revealed by
mdadm -X /var/local/md1.bitmap
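(The dirty count sits on a single line of the -X output, so for anyone reproducing this it can be pulled out directly; the exact wording of that line may vary between mdadm versions.)
mdadm -X /var/local/md1.bitmap | grep -i dirty    # e.g. the "Bitmap : ... dirty (...%)" line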
I didn't mount the filesystem on /dev/md1 or touch it in any other way.
I then found that stopping and restarting the array:
mdadm --stop /dev/md1
mdadm --build /dev/md1 -l 1 -n 2 -b /var/local/md1.bitmap /dev/sdb2 /dev/sdc2
did indeed read in the bitmap, with an ensuing resync that went quickly because most of the chunks were marked clean. The confusing part is that this resync further reduced the number of dirty chunks, but still did not clear all of them. By repeatedly stopping and restarting I could slowly bring the dirty chunk count down to around 0.6%, where it seemed to level out.
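For anyone who wants to reproduce the decay, the cycle can be scripted as a simple loop (the iteration count here is arbitrary; mdadm --wait blocks until each resync completes):
for i in 1 2 3 4 5; do
    mdadm --stop /dev/md1
    mdadm --build /dev/md1 -l 1 -n 2 -b /var/local/md1.bitmap /dev/sdb2 /dev/sdc2
    mdadm --wait /dev/md1                             # let the resync finish
    mdadm -X /var/local/md1.bitmap | grep -i dirty    # record the dirty count
done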
Any ideas what could be causing this? It smells to me like a race condition somewhere that leads to chunks either being skipped over during synchronization or not properly cleared from the bitmap, but I have no evidence to prove this. It doesn't look like a hardware issue: both drives are new, and smartctl -a reports zero read errors and zero reallocated sectors on each.
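(For reference, this is the extent of the spot-check; the attribute names are the standard ATA ones and can differ slightly by vendor, and Current_Pending_Sector is an extra I also watch:)
for d in /dev/sdb /dev/sdc; do
    smartctl -A "$d" | grep -E 'Raw_Read_Error_Rate|Reallocated_Sector_Ct|Current_Pending_Sector'
done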