I have an HP server with a Smart Array P400 controller (including 256 MB cache with battery backup). One of its logicaldrives does not rebuild, even though I replaced the failed physicaldrive.
This is how it looked when I detected the error:
~# /usr/sbin/hpacucli ctrl slot=0 show config
Smart Array P400 in Slot 0 (Embedded) (sn: XXXX)
array A (SATA, Unused Space: 0 MB)
logicaldrive 1 (698.6 GB, RAID 1, OK)
physicaldrive 1I:1:1 (port 1I:box 1:bay 1, SATA, 750 GB, OK)
physicaldrive 1I:1:2 (port 1I:box 1:bay 2, SATA, 750 GB, OK)
array B (SATA, Unused Space: 0 MB)
logicaldrive 2 (2.7 TB, RAID 5, Failed)
physicaldrive 1I:1:3 (port 1I:box 1:bay 3, SATA, 750 GB, OK)
physicaldrive 1I:1:4 (port 1I:box 1:bay 4, SATA, 750 GB, OK)
physicaldrive 2I:1:5 (port 2I:box 1:bay 5, SATA, 750 GB, OK)
physicaldrive 2I:1:6 (port 2I:box 1:bay 6, SATA, 750 GB, Failed)
physicaldrive 2I:1:7 (port 2I:box 1:bay 7, SATA, 750 GB, OK)
unassigned
physicaldrive 2I:1:8 (port 2I:box 1:bay 8, SATA, 750 GB, OK)
~#
I thought I had configured drive 2I:1:8 as a spare for both array A and array B, but apparently that was not the case :-(. I noticed the problem because of I/O errors on the host, even though only one physicaldrive of the RAID 5 has failed.
Does anyone know why this could happen? With a single failed drive, the logicaldrive should go into "Degraded" mode but remain fully accessible from the host OS.
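To make the unhealthy devices easier to spot in longer listings, here is a minimal sketch that filters the `show config` output down to the lines whose status is not "OK". The function name `non_ok_devices` is made up for illustration, and it assumes the status is always the last comma-separated field inside the parentheses, as in the output above.

```shell
# List every logicaldrive/physicaldrive line whose status is not "OK".
# Reads `hpacucli ... show config` output on stdin.
# non_ok_devices is a hypothetical helper name, not part of hpacucli.
non_ok_devices() {
    awk '
        /^[ \t]*(logicaldrive|physicaldrive) / {
            line = $0
            sub(/^[ \t]+/, "", line)        # drop leading indentation
            status = line
            sub(/\)[ \t]*$/, "", status)    # drop the closing parenthesis
            sub(/.*, */, "", status)        # keep only the last field
            if (status != "OK") print line
        }
    '
}

# Example (needs root and a Smart Array controller):
#   /usr/sbin/hpacucli ctrl slot=0 show config | non_ok_devices
```

Run against the listing above, this should print only the logicaldrive 2 line and the physicaldrive 2I:1:6 line.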
I first tried to add the unassigned drive 2I:1:8 as a spare to logicaldrive 2, but this was not possible:
~# /usr/sbin/hpacucli ctrl slot=0 array B add spares=2I:1:8
Error: This operation is not supported with the current configuration.
Use the "show" command on devices to show additional details
about the configuration.
~#
Interestingly, the unassigned drive can be added to the first array without problems. I thought the controller might have put the array into the "Failed" state because of the missing spare, and that it protects failed arrays from modification. So I tried to re-enable the logicaldrive (in order to add the spare afterwards):
~# /usr/sbin/hpacucli ctrl slot=0 ld 2 modify reenable
Warning: Any previously existing data on the logical drive may not
be valid or recoverable. Continue? (y/n) y
Error: This operation is not supported with the current configuration.
Use the "show" command on devices to show additional details
about the configuration.
~#
But as you can see, re-enabling the logicaldrive was not possible either.
I then replaced the failed drive by hot-swapping it with the unassigned drive. The status now looks like this:
~# /usr/sbin/hpacucli ctrl slot=0 show config
Smart Array P400 in Slot 0 (Embedded) (sn: XXXX)
array A (SATA, Unused Space: 0 MB)
logicaldrive 1 (698.6 GB, RAID 1, OK)
physicaldrive 1I:1:1 (port 1I:box 1:bay 1, SATA, 750 GB, OK)
physicaldrive 1I:1:2 (port 1I:box 1:bay 2, SATA, 750 GB, OK)
array B (SATA, Unused Space: 0 MB)
logicaldrive 2 (2.7 TB, RAID 5, Failed)
physicaldrive 1I:1:3 (port 1I:box 1:bay 3, SATA, 750 GB, OK)
physicaldrive 1I:1:4 (port 1I:box 1:bay 4, SATA, 750 GB, OK)
physicaldrive 2I:1:5 (port 2I:box 1:bay 5, SATA, 750 GB, OK)
physicaldrive 2I:1:6 (port 2I:box 1:bay 6, SATA, 750 GB, OK)
physicaldrive 2I:1:7 (port 2I:box 1:bay 7, SATA, 750 GB, OK)
~#
The logical drive is still not accessible. Why is it not rebuilding?
What can I do?
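In case more detail helps with the diagnosis, here is a minimal sketch for collecting the per-device views in one go. It assumes that this hpacucli version accepts the `ld all show status`, `pd all show status`, `ld N show` and `pd ADDRESS show` subcommands. The function name `collect_diag` and the `HPACUCLI` override variable are made up for illustration.

```shell
# Path taken from the commands in this post; override HPACUCLI to try the
# function without the hardware.
HPACUCLI=${HPACUCLI:-/usr/sbin/hpacucli}

# Print status and detail views for one logical drive and one physical drive.
# Arguments: controller slot, logical drive number, physical drive address.
# collect_diag is a hypothetical helper name, not part of hpacucli.
collect_diag() {
    slot=$1 ld=$2 pd=$3
    for args in \
        "ld all show status" \
        "pd all show status" \
        "ld $ld show" \
        "pd $pd show"
    do
        echo "### ctrl slot=$slot $args"
        # $args is intentionally unquoted so that it splits into words.
        "$HPACUCLI" ctrl "slot=$slot" $args
    done
}

# Example (needs root and a Smart Array controller):
#   collect_diag 0 2 2I:1:6
```

The output of each command is preceded by a `###` header line, so the whole dump can be pasted as one block.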
FYI, this is the configuration of my controller:
~# /usr/sbin/hpacucli ctrl slot=0 show
Smart Array P400 in Slot 0 (Embedded)
Bus Interface: PCI
Slot: 0
Serial Number: XXXX
Cache Serial Number: XXXX
RAID 6 (ADG) Status: Enabled
Controller Status: OK
Chassis Slot:
Hardware Revision: Rev E
Firmware Version: 5.22
Rebuild Priority: Medium
Expand Priority: Medium
Surface Scan Delay: 15 secs
Surface Analysis Inconsistency Notification: Disabled
Raid1 Write Buffering: Disabled
Post Prompt Timeout: 0 secs
Cache Board Present: True
Cache Status: OK
Accelerator Ratio: 25% Read / 75% Write
Drive Write Cache: Disabled
Total Cache Size: 256 MB
No-Battery Write Cache: Disabled
Cache Backup Power Source: Batteries
Battery/Capacitor Count: 1
Battery/Capacitor Status: OK
SATA NCQ Supported: True
~#
Thanks in advance for your help.