Linux RAID: Replacing Failed Drive...permanantly
- by user137519
Okay, odd question here. I have a server with RAID 5. A drive failed, in a really physically in a really odd way. On the machine it boots and is seen by the BIOS but...no partition can be seen on the drive consistantly (in and out). 2 out of 3 drives working...I made new spare disk and added it, RAID 5 rebuilt clean. All appears well but...when I reboot it keeps trying to use the 2nd drive which doesn't give any partition data, so of course the RAID 5 gets 2 out of 3...again. The status of my drive is as follows: /dev/sda2:Good /dev/sdb2 (drive has physical problem so no partition data) bad, /dev/sdc2:good /dev/sdd2:good. Every time I reboot the mdadm system seems to keep trying to use /dev/sdb which has physical failure (although spins and is detected). /dev/sdd is the new drive I created. I added /dev/sdd to the raid and it rebuilds the raid but this action isn't memorized upon reboot so it keeps listing /dev/sda and /dev/sdc but doesn't use the perfectly good /dev/sdd until I re-add manually. I've tried removing the dead drive with the mdadm tool, but as it cannot see /dev/sdb paritions it will not fail or remove it (says partition doesn't exist). the /etc/mdadm.conf was automatically made on the original OS install which only lists:
DEVICE partitions
MAILADDR root
ARRAY /dev/md2 super-minor=2
ARRAY /dev/md0 super-minor=0
ARRAY /dev/md1 super-minor=1
Basically just the raids to use on boot.
I need to remove this semi-dead drive (/dev/sdb) but I'd prefer to know why this is happening before I do. any ideas or suggestions. I supposed I could attempt to clone/replace /dev/sdb (the partitions on drive show up, then disappear shortly after) but given the partition "chester cat" behaviour this seems risky to me and as I have a working "spare" it seems unnecessary. Thanks in advance for your insight.