My problem happened when I attempted to install Windows 7 on its own SSD. The Linux OS with knowledge of the software RAID system lives on an SSD that I disconnected prior to the install, so that Windows (or I) couldn't inadvertently mess it up.
However, in retrospect foolishly, I left the RAID disks connected, thinking that Windows wouldn't be so ridiculous as to touch a HDD that it sees as just unallocated space.
Boy was I wrong! After copying the installation files to the SSD (as expected and desired), the installer also created an NTFS partition on one of the RAID disks. Both unexpected and totally undesired!
I swapped the SSDs back and booted into Linux. mdadm didn't seem to have any problem assembling the array as before, but trying to mount the array produced:
    mount: wrong fs type, bad option, bad superblock on /dev/md0,
           missing codepage or helper program, or other error
           In some cases useful info is found in syslog - try
           dmesg | tail or so
dmesg:
    EXT4-fs (md0): ext4_check_descriptors: Block bitmap for group 0 not in group (block 1318081259)!
    EXT4-fs (md0): group descriptors corrupted!
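For reference, the assemble and mount attempts were essentially the following (a sketch: I don't recall whether assembly went through mdadm.conf or explicit devices, and /mnt/atlas is the mount point shown in the dumpe2fs output further down):

    mdadm --assemble --scan       # or: mdadm --assemble /dev/md0 /dev/sd[bcde]
    mount /dev/md0 /mnt/atlas     # this is the step that fails as above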
I then used qparted to delete the newly created NTFS partition on /dev/sdd so that it matched the other three drives, /dev/sd{b,c,e}, and requested a resync of my array with:

    echo repair > /sys/block/md0/md/sync_action
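While the repair ran, these were handy for keeping track (a sketch; checking mismatch_cnt afterwards is my own assumption about what is useful here):

    cat /proc/mdstat                      # shows resync progress and ETA
    cat /sys/block/md0/md/mismatch_cnt    # count of mismatched sectors found by the repair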
This took around 4 hours, and upon completion, dmesg reports:
    md: md0: requested-resync done.
A bit brief after a 4-hour task, though I'm unsure where other log files live (I also seem to have messed up my sendmail configuration). In any case: mdadm reports no change; everything checks out.
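In case someone wants more detail than that one line, this is where I would expect the md messages to end up (the syslog path is an assumption; it varies by distro):

    dmesg | grep -i 'md0'            # kernel ring buffer
    grep -i 'md0' /var/log/syslog    # persistent kernel log on Debian/Ubuntu-style systems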
mdadm -D /dev/md0 still reports:
            Version : 1.2
      Creation Time : Wed May 23 22:18:45 2012
         Raid Level : raid6
         Array Size : 3907026848 (3726.03 GiB 4000.80 GB)
      Used Dev Size : 1953513424 (1863.02 GiB 2000.40 GB)
       Raid Devices : 4
      Total Devices : 4
        Persistence : Superblock is persistent

        Update Time : Mon May 26 12:41:58 2014
              State : clean
     Active Devices : 4
    Working Devices : 4
     Failed Devices : 0
      Spare Devices : 0

             Layout : left-symmetric
         Chunk Size : 4K

               Name : okamilinkun:0
               UUID : 0c97ebf3:098864d8:126f44e3:e4337102
             Events : 423

        Number   Major   Minor   RaidDevice State
           0       8       16        0      active sync   /dev/sdb
           1       8       32        1      active sync   /dev/sdc
           2       8       48        2      active sync   /dev/sdd
           3       8       64        3      active sync   /dev/sde
Trying to mount it still reports:
    mount: wrong fs type, bad option, bad superblock on /dev/md0,
           missing codepage or helper program, or other error
           In some cases useful info is found in syslog - try
           dmesg | tail or so
and dmesg:
    EXT4-fs (md0): ext4_check_descriptors: Block bitmap for group 0 not in group (block 1318081259)!
    EXT4-fs (md0): group descriptors corrupted!
I'm a bit unsure how to proceed from here, and trying things "to see if they work" is a bit too risky for me. This is what I think I should attempt:
Tell mdadm that /dev/sdd (the one Windows wrote to) isn't reliable anymore, pretend it is newly re-introduced to the array, and reconstruct its content based on the other three drives.
I could also be totally wrong in my assumptions, and the creation of the NTFS partition on /dev/sdd and its subsequent deletion may have changed something that cannot be fixed this way.
My question: Help, what should I do? If I should do what I suggested, how do I do that? From reading the documentation etc., I would think maybe:
    mdadm --manage /dev/md0 --set-faulty /dev/sdd
    mdadm --manage /dev/md0 --remove /dev/sdd
    mdadm --manage /dev/md0 --re-add /dev/sdd
However, the documentation examples suggest /dev/sdd1, which seems strange to me, as there is no partition there as far as Linux is concerned, just unallocated space. Maybe these commands won't work without one.
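To double-check that the md superblock really sits on the whole disk rather than on a partition, something like this should settle it (a sketch):

    mdadm --examine /dev/sdd    # prints a member superblock if the whole disk is in the array
    lsblk /dev/sdd              # should list no partitions underneath sdd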
Maybe it makes sense to mirror the partition table of one of the other RAID devices that weren't touched before the --re-add. Something like:
    sfdisk -d /dev/sdb | sfdisk /dev/sdd
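An alternative I considered (plainly a different technique: wipefs, not anything from the mdadm docs) is to list the stray signatures Windows left behind, without writing anything:

    # dry run only: list partition-table/NTFS signatures on /dev/sdd; careful,
    # the disk also carries a linux_raid_member signature that must NOT be wiped
    wipefs --no-act /dev/sdd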
Bonus question: Why would the Windows 7 installation do something so st...potentially dangerous?
Update
I went ahead and marked /dev/sdd as faulty, and removed it (not physically) from the array:
    # mdadm --manage /dev/md0 --set-faulty /dev/sdd
    # mdadm --manage /dev/md0 --remove /dev/sdd
However, attempting to --re-add was disallowed:
    # mdadm --manage /dev/md0 --re-add /dev/sdd
    mdadm: --re-add for /dev/sdd to /dev/md0 is not possible
Plain --add, however, was fine (my guess: without a write-intent bitmap, mdadm has no way to know which blocks are stale, so it cannot simply re-add a removed member):
    # mdadm --manage /dev/md0 --add /dev/sdd
mdadm -D /dev/md0 now reports the state as clean, degraded, recovering, and /dev/sdd as spare rebuilding.
/proc/mdstat shows the recovery progress:
    md0 : active raid6 sdd[4] sdc[1] sde[3] sdb[0]
          3907026848 blocks super 1.2 level 6, 4k chunk, algorithm 2 [4/3] [UU_U]
          [>....................]  recovery =  2.1% (42887780/1953513424) finish=348.7min speed=91297K/sec
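To keep an eye on it without retyping, a simple watch loop works (a sketch):

    watch -n 60 cat /proc/mdstat    # refresh the recovery status once a minute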
nmon also shows expected output:
    ¦sdb    0%    87.3     0.0 | >                                      |¦
    ¦sdc   71%   109.1     0.0 |RRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRR   > |¦
    ¦sdd   40%     0.0    87.3 |WWWWWWWWWWWWWWWWWWWW              >    |¦
    ¦sde    0%    87.3     0.0 |>                                      |¦
It looks good so far. Crossing my fingers for another five+ hours :)
Update 2
The recovery of /dev/sdd finished, with dmesg output:
    [44972.599552] md: md0: recovery done.
    [44972.682811] RAID conf printout:
    [44972.682815]  --- level:6 rd:4 wd:4
    [44972.682817]  disk 0, o:1, dev:sdb
    [44972.682819]  disk 1, o:1, dev:sdc
    [44972.682820]  disk 2, o:1, dev:sdd
    [44972.682821]  disk 3, o:1, dev:sde
Attempting to mount /dev/md0 reports:
    mount: wrong fs type, bad option, bad superblock on /dev/md0,
           missing codepage or helper program, or other error
           In some cases useful info is found in syslog - try
           dmesg | tail or so
And in dmesg:
    [44984.159908] EXT4-fs (md0): ext4_check_descriptors: Block bitmap for group 0 not in group (block 1318081259)!
    [44984.159912] EXT4-fs (md0): group descriptors corrupted!
I'm not sure what to do now. Suggestions?
Output of dumpe2fs /dev/md0:
    dumpe2fs 1.42.8 (20-Jun-2013)
    Filesystem volume name:   Atlas
    Last mounted on:          /mnt/atlas
    Filesystem UUID:          e7bfb6a4-c907-4aa0-9b55-9528817bfd70
    Filesystem magic number:  0xEF53
    Filesystem revision #:    1 (dynamic)
    Filesystem features:      has_journal ext_attr resize_inode dir_index filetype extent flex_bg sparse_super large_file huge_file uninit_bg dir_nlink extra_isize
    Filesystem flags:         signed_directory_hash
    Default mount options:    user_xattr acl
    Filesystem state:         clean
    Errors behavior:          Continue
    Filesystem OS type:       Linux
    Inode count:              244195328
    Block count:              976756712
    Reserved block count:     48837835
    Free blocks:              92000180
    Free inodes:              243414877
    First block:              0
    Block size:               4096
    Fragment size:            4096
    Reserved GDT blocks:      791
    Blocks per group:         32768
    Fragments per group:      32768
    Inodes per group:         8192
    Inode blocks per group:   512
    RAID stripe width:        2
    Flex block group size:    16
    Filesystem created:       Thu May 24 07:22:41 2012
    Last mount time:          Sun May 25 23:44:38 2014
    Last write time:          Sun May 25 23:46:42 2014
    Mount count:              341
    Maximum mount count:      -1
    Last checked:             Thu May 24 07:22:41 2012
    Check interval:           0 (<none>)
    Lifetime writes:          4357 GB
    Reserved blocks uid:      0 (user root)
    Reserved blocks gid:      0 (group root)
    First inode:              11
    Inode size:               256
    Required extra isize:     28
    Desired extra isize:      28
    Journal inode:            8
    Default directory hash:   half_md4
    Directory Hash Seed:      e177a374-0b90-4eaa-b78f-d734aae13051
    Journal backup:           inode blocks
    dumpe2fs: Corrupt extent header while reading journal super block
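My guess for the next safe step would be a strictly read-only e2fsck against one of the backup superblocks (a sketch; block 32768 follows from the 4096-byte block size and 32768 blocks per group reported above):

    # -n: read-only, answer "no" to everything; -b: backup superblock; -B: block size
    e2fsck -n -b 32768 -B 4096 /dev/md0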