Let me acknowledge up front that I have made mistakes, and that I have a backup of most, but not all, of the data on this RAID. I still have hope of recovering the rest of the data. I don't have the kind of money to take the drives to a data recovery company.
Mistake #0, not having a 100% backup. I know.
I have an mdadm RAID5 system of 4x3TB drives: /dev/sd[b-e], each with a single partition /dev/sd[b-e]1. I'm aware that RAID5 on very large drives is risky, yet I did it anyway.
Recent events
The RAID became degraded after a two-drive failure. One drive [/dev/sdc] is really gone; the other [/dev/sde] came back up after a power cycle, but was not automatically re-added to the RAID. So I was left with a 4-device RAID with only 2 active drives [/dev/sdb and /dev/sdd].
Mistake #1, not working on dd copies of the drives when restoring the RAID. I had neither the spare drives nor the time.
Mistake #2, not making a backup of the superblocks and the mdadm -E output of the remaining drives.
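For anyone in a similar spot (and for my own future reference), here is a minimal sketch of the metadata backup I should have made beforehand. It assumes members /dev/sd[b-e]1 and array /dev/md0, and it only prints the backup commands so they can be reviewed (or piped to sh) before anything is run:

```shell
#!/bin/sh
# Sketch: print the commands that would back up mdadm metadata before
# touching an array. Assumed names: members /dev/sd[b-e]1, array /dev/md0.
# Nothing is executed here; the output can be reviewed first.
print_metadata_backup_cmds() {
    for part in /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sde1; do
        name=$(basename "$part")
        # Human-readable per-member superblock dump:
        echo "mdadm --examine $part > $name.examine.txt"
    done
    # Assembled-array summary and the current kernel view of the layout:
    echo "mdadm --detail /dev/md0 > md0.detail.txt"
    echo "cat /proc/mdstat > mdstat.txt"
}

print_metadata_backup_cmds
```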
Recovery attempt
I reassembled the RAID in degraded mode with:
mdadm --assemble --force /dev/md0 /dev/sd[bde]1
I could then access my data. I replaced /dev/sdc with a spare, empty, identical drive.
I marked the old /dev/sdc1 as failed in the RAID:
mdadm --fail /dev/md0 /dev/sdc1
Mistake #3, not doing this before replacing the drive.
I then partitioned the new /dev/sdc and added it to the RAID:
mdadm --add /dev/md0 /dev/sdc1
It then began to rebuild the RAID, with an ETA of 300 minutes. I followed the progress via /proc/mdstat up to 2% and then went to do other stuff.
Checking the result
Several hours (but less than 300 minutes) later, I checked the progress. The rebuild had stopped due to a read error on /dev/sde1.
Here is where the trouble really starts
I then removed /dev/sde1 from the RAID and re-added it. I can't remember why I did this; it was late.
mdadm --manage /dev/md0 --remove /dev/sde1
mdadm --manage /dev/md0 --add /dev/sde1
However, /dev/sde1 was now marked as a spare. So I decided to recreate the whole array with --assume-clean, using what I thought was the right device order, and with /dev/sdc1 missing:
mdadm --create /dev/md0 --assume-clean -l5 -n4 /dev/sdb1 missing /dev/sdd1 /dev/sde1
That worked, but the filesystem was not recognized when I tried to mount it. (It should be ext4.)
Device order
I then checked a recent backup I had of /proc/mdstat and found the drive order:
md0 : active raid5 sdb1[0] sde1[4] sdd1[2] sdc1[1]
8790402048 blocks super 1.2 level 5, 512k chunk, algorithm 2 [4/4] [UUUU]
I then remembered that this RAID had suffered a drive loss about a year ago, and recovered from it after I replaced the faulty drive with a spare one. That may have scrambled the device numbering a bit... hence there was no device [3], only [0], [1], [2], and [4].
I tried to find the drive order with the Permute_array script (https://raid.wiki.kernel.org/index.php/Permute_array.pl), but it did not find the right order.
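In case it helps anyone follow along, here is a minimal sketch of the brute-force approach I'm considering instead. It enumerates all 24 orderings of the three surviving partitions plus the missing slot and only prints the candidate mdadm --create commands (nothing is executed); my plan would be to try them one at a time, checking each result read-only with fsck.ext4 -n /dev/md0 before stopping the array and trying the next:

```shell
#!/bin/sh
# Sketch: print an mdadm --create command for every permutation of the
# three surviving members (/dev/sdb1, /dev/sdd1, /dev/sde1) and the
# "missing" slot. Nothing is run here; each printed command would rewrite
# superblocks, so candidates should be verified read-only (fsck.ext4 -n)
# before moving on.
gen_orders() {
    set -- /dev/sdb1 /dev/sdd1 /dev/sde1 missing
    for a in "$@"; do
        for b in "$@"; do
            [ "$b" = "$a" ] && continue
            for c in "$@"; do
                { [ "$c" = "$a" ] || [ "$c" = "$b" ]; } && continue
                for d in "$@"; do
                    { [ "$d" = "$a" ] || [ "$d" = "$b" ] || [ "$d" = "$c" ]; } && continue
                    echo "mdadm --create /dev/md0 --assume-clean -l5 -n4 $a $b $c $d"
                done
            done
        done
    done
}

gen_orders
```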
Questions
I now have two main questions:
1. I screwed up all the superblocks on the drives, but I only ever ran
mdadm --create --assume-clean
commands, so I should not have overwritten the data itself on /dev/sd[bde]1. Am I right that, in theory, the RAID can be restored (assuming for a moment that /dev/sde1 is OK) if I just find the right device order?
2. Is it important that /dev/sde1 be given the device number [4] in the RAID? When I create the array with
mdadm --create /dev/md0 --assume-clean -l5 -n4 \
/dev/sdb1 missing /dev/sdd1 /dev/sde1
it is assigned the number [3]. I wonder whether that is relevant to the calculation of the parity blocks. If it turns out to be important, how can I recreate the array with /dev/sdb1[0] missing[1] /dev/sdd1[2] /dev/sde1[4]? If I could get that to work, I could start the array in degraded mode, add the new drive /dev/sdc1, and let it resync again.
Feel free to point out that this may not have been the best course of action; as you can see, I've realized that myself. It would be great if anyone has any suggestions.