I have a problem that I cannot solve. Our file server runs Xubuntu with three RAID1 arrays. One of them, consisting of sdb and sdc, has had a problem since Monday: sdb was marked as faulty by mdadm for unknown reasons. I removed it from the array with --remove and then added it back with --add. Everything looked fine and the re-sync started, but it never got above 0%, and after a few seconds sdb was again marked as 'faulty spare' (so the RAID is degraded, but clean).
So I saved the first 512 bytes of the old sdb to a file, bought a new HDD of the same size (4TB), shut down the computer, physically replaced sdb, switched the computer back on, and wrote the 512 bytes to the new drive so that it has the same partition info as the old drive (both are the same type, from the same company). But the new drive shows the same behaviour as the old one: I can add it, the re-sync starts, and after a few seconds it is marked as 'faulty spare'.
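The save-and-restore step described above can be sketched with dd. This is an illustration rather than the exact commands used: two image files stand in for the old and new /dev/sdb so that it can be run safely, and the names OLD_DEV, NEW_DEV and BACKUP are made up for the example.

```shell
#!/bin/sh
# Sketch only: OLD_DEV and NEW_DEV stand in for the old and new /dev/sdb.
# They are throwaway image files here, so nothing touches a real disk.
set -eu
WORK=$(mktemp -d)
OLD_DEV="$WORK/old.img"
NEW_DEV="$WORK/new.img"
BACKUP="$WORK/sdb-first-sector.bin"   # hypothetical file name

# Create two 1 MiB stand-in "drives": the old one with random data, the new one blank.
dd if=/dev/urandom of="$OLD_DEV" bs=1024 count=1024 2>/dev/null
dd if=/dev/zero of="$NEW_DEV" bs=1024 count=1024 2>/dev/null

# Save the first 512 bytes of the old drive to a file.
dd if="$OLD_DEV" of="$BACKUP" bs=512 count=1 2>/dev/null

# Write them to the start of the new drive.
# conv=notrunc stops dd from truncating the target when it is a regular file.
dd if="$BACKUP" of="$NEW_DEV" bs=512 count=1 conv=notrunc 2>/dev/null

# Check that the first 512 bytes of both "drives" are now identical.
cmp -n 512 "$OLD_DEV" "$NEW_DEV" && echo "first sector matches"
```

On the real machine OLD_DEV and NEW_DEV would both be /dev/sdb (before and after the swap), and the temporary directory in WORK can be deleted afterwards.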
Here is exactly what I did:
mdadm --remove /dev/md/1 /dev/sdb
mdadm --detail /dev/md/1 gives me:
/dev/md/1:
Version : 1.2
Creation Time : Sat Jun 8 22:32:05 2013
Raid Level : raid1
Array Size : 3906887360 (3725.90 GiB 4000.65 GB)
Used Dev Size : 3906887360 (3725.90 GiB 4000.65 GB)
Raid Devices : 2
Total Devices : 1
Persistence : Superblock is persistent
Update Time : Thu Nov 7 06:56:13 2013
State : clean, degraded
Active Devices : 1
Working Devices : 1
Failed Devices : 0
Spare Devices : 0
Name : File-Server:1 (local to host File-Server)
UUID : 44ed561f:b733e946:e69820f4:aba9b223
Events : 2424
Number Major Minor RaidDevice State
0 0 0 0 removed
1 8 32 1 active sync /dev/sdc
mdadm --add /dev/md/1 /dev/sdb
mdadm --detail /dev/md/1 gives me:
Version : 1.2
Creation Time : Sat Jun 8 22:32:05 2013
Raid Level : raid1
Array Size : 3906887360 (3725.90 GiB 4000.65 GB)
Used Dev Size : 3906887360 (3725.90 GiB 4000.65 GB)
Raid Devices : 2
Total Devices : 2
Persistence : Superblock is persistent
Update Time : Thu Nov 7 06:57:49 2013
State : clean, degraded, recovering
Active Devices : 1
Working Devices : 1
Failed Devices : 1
Spare Devices : 0
Rebuild Status : 0% complete
Name : File-Server:1 (local to host File-Server)
UUID : 44ed561f:b733e946:e69820f4:aba9b223
Events : 2431
Number Major Minor RaidDevice State
2 8 16 0 faulty spare rebuilding /dev/sdb
1 8 32 1 active sync /dev/sdc
And after a few seconds:
/dev/md/1:
Version : 1.2
Creation Time : Sat Jun 8 22:32:05 2013
Raid Level : raid1
Array Size : 3906887360 (3725.90 GiB 4000.65 GB)
Used Dev Size : 3906887360 (3725.90 GiB 4000.65 GB)
Raid Devices : 2
Total Devices : 2
Persistence : Superblock is persistent
Update Time : Thu Nov 7 06:57:50 2013
State : clean, degraded
Active Devices : 1
Working Devices : 1
Failed Devices : 1
Spare Devices : 0
Name : File-Server:1 (local to host File-Server)
UUID : 44ed561f:b733e946:e69820f4:aba9b223
Events : 2436
Number Major Minor RaidDevice State
0 0 0 0 removed
1 8 32 1 active sync /dev/sdc
2 8 16 - faulty spare /dev/sdb
I get the same behaviour if I zero the superblock (mdadm --zero-superblock /dev/sdb) before adding sdb.
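To catch the moment the drive drops out without reading the whole report each time, the State line and any faulty members can be pulled out of the mdadm --detail output. A minimal sketch, using a few lines copied from the last output above as sample input; the variable names detail, state and faulty are only for illustration.

```shell
#!/bin/sh
# Sample text copied from the last output above. On the real system this
# would come from: detail=$(mdadm --detail /dev/md/1)
detail='State : clean, degraded
Failed Devices : 1
0 0 0 0 removed
1 8 32 1 active sync /dev/sdc
2 8 16 - faulty spare /dev/sdb'

# The array state is whatever follows " : " on the State line.
state=$(printf '%s\n' "$detail" | awk -F' : ' '/^ *State :/ {print $2}')

# Members flagged faulty: the device path is the last field of the row.
faulty=$(printf '%s\n' "$detail" | awk '/faulty/ {print $NF}')

echo "state:  $state"
echo "faulty: $faulty"
```

With the sample text this reports the state as clean, degraded and the faulty member as /dev/sdb. Run in a loop right after --add, it would show how quickly the array goes from recovering back to degraded.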
I run all commands as root, and the system holds three more 4TB drives, i.e. the mainboard can handle drives of this size. The old hard drive was checked for errors using badblocks, and none were found.
Does anybody have any idea what the problem is?