Problem with RAID5 (mdadm) - disk detached
Posted
by
poscaman
on Server Fault
See other posts from Server Fault
or by poscaman
Published on 2011-04-18T14:17:37Z
Indexed on
2012/06/28
15:18 UTC
Read the original article
Hit count: 832
Having these lines in /var/log/syslog
Apr 18 16:53:05 Server kernel: [4487878.816036] ata4: EH in SWNCQ mode,QC:qc_active 0x1 sactive 0x1
Apr 18 16:53:05 Server kernel: [4487878.816058] ata4: SWNCQ:qc_active 0x1 defer_bits 0x0 last_issue_tag 0x0
Apr 18 16:53:05 Server kernel: [4487878.816059] dhfis 0x1 dmafis 0x1 sdbfis 0x0
Apr 18 16:53:05 Server kernel: [4487878.816093] ata4: ATA_REG 0x40 ERR_REG 0x0
Apr 18 16:53:05 Server kernel: [4487878.816108] ata4: tag : dhfis dmafis sdbfis sacitve
Apr 18 16:53:05 Server kernel: [4487878.816125] ata4: tag 0x0: 1 1 0 1
Apr 18 16:53:05 Server kernel: [4487878.816150] ata4.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x6 frozen
Apr 18 16:53:05 Server kernel: [4487878.816178] ata4.00: failed command: WRITE FPDMA QUEUED
Apr 18 16:53:05 Server kernel: [4487878.816199] ata4.00: cmd 61/08:00:00:88:e0/00:00:e8:00:00/40 tag 0 ncq 4096 out
Apr 18 16:53:05 Server kernel: [4487878.816200] res 40/00:00:01:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
Apr 18 16:53:05 Server kernel: [4487878.816253] ata4.00: status: { DRDY }
Apr 18 16:53:05 Server kernel: [4487878.816272] ata4: hard resetting link
Apr 18 16:53:05 Server kernel: [4487878.816274] ata4: nv: skipping hardreset on occupied port
Apr 18 16:53:06 Server kernel: [4487879.676029] ata4: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Apr 18 16:53:07 Server kernel: [4487880.416749] ata4.00: n_sectors mismatch 3907029168 != 268435455
Apr 18 16:53:07 Server kernel: [4487880.416752] ata4.00: revalidation failed (errno=-19)
Apr 18 16:53:07 Server kernel: [4487880.416773] ata4.00: limiting speed to UDMA/133:PIO2
Apr 18 16:53:11 Server kernel: [4487884.676024] ata4: hard resetting link
Apr 18 16:53:11 Server kernel: [4487884.676027] ata4: nv: skipping hardreset on occupied port
Apr 18 16:53:12 Server kernel: [4487885.144032] ata4: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Apr 18 16:53:12 Server kernel: [4487885.240185] ata4.00: failed to IDENTIFY (INIT_DEV_PARAMS failed, err_mask=0x80)
Apr 18 16:53:12 Server kernel: [4487885.240190] ata4.00: revalidation failed (errno=-5)
Apr 18 16:53:12 Server kernel: [4487885.240210] ata4.00: disabled
Apr 18 16:53:17 Server kernel: [4487890.144023] ata4: hard resetting link
Apr 18 16:53:17 Server kernel: [4487891.024033] ata4: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Apr 18 16:53:17 Server kernel: [4487891.033357] ata4.00: ATA-8: WDC WD20EARS-00S8B1, 80.00A80, max UDMA/133
Apr 18 16:53:17 Server kernel: [4487891.033360] ata4.00: 3907029168 sectors, multi 1: LBA48 NCQ (depth 31/32)
Apr 18 16:53:17 Server kernel: [4487891.048347] ata4.00: configured for UDMA/133
Apr 18 16:53:17 Server kernel: [4487891.048361] sd 3:0:0:0: [sdc] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Apr 18 16:53:17 Server kernel: [4487891.048365] sd 3:0:0:0: [sdc] Sense Key : Aborted Command [current] [descriptor]
Apr 18 16:53:17 Server kernel: [4487891.048369] Descriptor sense data with sense descriptors (in hex):
Apr 18 16:53:17 Server kernel: [4487891.048371] 72 0b 00 00 00 00 00 0c 00 0a 80 00 00 00 00 00
Apr 18 16:53:17 Server kernel: [4487891.048378] 00 00 00 00
Apr 18 16:53:17 Server kernel: [4487891.048382] sd 3:0:0:0: [sdc] Add. Sense: No additional sense information
Apr 18 16:53:17 Server kernel: [4487891.048385] sd 3:0:0:0: [sdc] CDB: Write(10): 2a 00 e8 e0 88 00 00 00 08 00
Apr 18 16:53:17 Server kernel: [4487891.048393] end_request: I/O error, dev sdc, sector 3907028992
Apr 18 16:53:17 Server kernel: [4487891.048420] sd 3:0:0:0: rejecting I/O to offline device
Apr 18 16:53:17 Server kernel: [4487891.048440] sd 3:0:0:0: rejecting I/O to offline device
Apr 18 16:53:17 Server kernel: [4487891.048458] end_request: I/O error, dev sdc, sector 3907028992
Apr 18 16:53:17 Server kernel: [4487891.048477] md: super_written gets error=-5, uptodate=0
Apr 18 16:53:17 Server kernel: [4487891.048482] raid5: Disk failure on sdc, disabling device.
Apr 18 16:53:17 Server kernel: [4487891.048483] raid5: Operation continuing on 3 devices.
Apr 18 16:53:17 Server kernel: [4487891.048525] ata4: EH complete
Apr 18 16:53:17 Server kernel: [4487891.048554] sd 3:0:0:0: rejecting I/O to offline device
Apr 18 16:53:17 Server kernel: [4487891.048576] sd 3:0:0:0: rejecting I/O to offline device
Apr 18 16:53:17 Server kernel: [4487891.048596] sd 3:0:0:0: rejecting I/O to offline device
Apr 18 16:53:17 Server kernel: [4487891.048615] sd 3:0:0:0: [sdc] READ CAPACITY(16) failed
Apr 18 16:53:17 Server kernel: [4487891.048617] sd 3:0:0:0: [sdc] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
Apr 18 16:53:17 Server kernel: [4487891.048620] sd 3:0:0:0: [sdc] Sense not available.
Apr 18 16:53:17 Server kernel: [4487891.048624] sd 3:0:0:0: rejecting I/O to offline device
Apr 18 16:53:17 Server kernel: [4487891.048643] sd 3:0:0:0: rejecting I/O to offline device
Apr 18 16:53:17 Server kernel: [4487891.048663] sd 3:0:0:0: rejecting I/O to offline device
Apr 18 16:53:17 Server kernel: [4487891.048681] sd 3:0:0:0: [sdc] READ CAPACITY failed
Apr 18 16:53:17 Server kernel: [4487891.048683] sd 3:0:0:0: [sdc] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
Apr 18 16:53:17 Server kernel: [4487891.048685] sd 3:0:0:0: [sdc] Sense not available.
Apr 18 16:53:17 Server kernel: [4487891.048689] sd 3:0:0:0: rejecting I/O to offline device
Apr 18 16:53:17 Server kernel: [4487891.048709] sd 3:0:0:0: rejecting I/O to offline device
Apr 18 16:53:17 Server kernel: [4487891.048800] sd 3:0:0:0: rejecting I/O to offline device
Apr 18 16:53:17 Server kernel: [4487891.048860] sd 3:0:0:0: rejecting I/O to offline device
Apr 18 16:53:17 Server kernel: [4487891.049028] sd 3:0:0:0: [sdc] Asking for cache data failed
Apr 18 16:53:17 Server kernel: [4487891.049048] sd 3:0:0:0: [sdc] Assuming drive cache: write through
Apr 18 16:53:17 Server kernel: [4487891.049071] sdc: detected capacity change from 2000398934016 to 0
Apr 18 16:53:17 Server kernel: [4487891.049080] ata4.00: detaching (SCSI 3:0:0:0)
Apr 18 16:53:18 Server kernel: [4487891.061149] sd 3:0:0:0: [sdc] Stopping disk
Apr 18 16:53:18 Server kernel: [4487891.485492] RAID5 conf printout:
Apr 18 16:53:18 Server kernel: [4487891.485496] --- rd:4 wd:3
Apr 18 16:53:18 Server kernel: [4487891.485500] disk 0, o:1, dev:sdb
Apr 18 16:53:18 Server kernel: [4487891.485502] disk 1, o:0, dev:sdc
Apr 18 16:53:18 Server kernel: [4487891.485504] disk 2, o:1, dev:sdd
Apr 18 16:53:18 Server kernel: [4487891.485506] disk 3, o:1, dev:sde
Apr 18 16:53:18 Server kernel: [4487891.497014] RAID5 conf printout:
Apr 18 16:53:18 Server kernel: [4487891.497016] --- rd:4 wd:3
Apr 18 16:53:18 Server kernel: [4487891.497018] disk 0, o:1, dev:sdb
Apr 18 16:53:18 Server kernel: [4487891.497019] disk 2, o:1, dev:sdd
Apr 18 16:53:18 Server kernel: [4487891.497021] disk 3, o:1, dev:sde
Apr 18 16:53:18 Server kernel: [4487891.838719] scsi 3:0:0:0: Direct-Access ATA WDC WD20EARS-00S 80.0 PQ: 0 ANSI: 5
Apr 18 16:53:18 Server kernel: [4487891.838886] sd 3:0:0:0: Attached scsi generic sg3 type 0
Apr 18 16:53:18 Server kernel: [4487891.838911] sd 3:0:0:0: [sdf] 3907029168 512-byte logical blocks: (2.00 TB/1.81 TiB)
Apr 18 16:53:18 Server kernel: [4487891.838964] sd 3:0:0:0: [sdf] Write Protect is off
Apr 18 16:53:18 Server kernel: [4487891.838967] sd 3:0:0:0: [sdf] Mode Sense: 00 3a 00 00
Apr 18 16:53:18 Server kernel: [4487891.838988] sd 3:0:0:0: [sdf] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Apr 18 16:53:20 Server kernel: [4487891.839147] sdf: unknown partition table
Apr 18 16:53:20 Server kernel: [4487893.130026] sd 3:0:0:0: [sdf] Attached SCSI disk
Right now, i'm unable to do anything on /dev/sdc. Is there any way to try to re-attach it? I don't want to power-down the server unless absolutely necessary
System:
- Debian Stable 2.6.32-5-amd64
- mdadm version 3.1.4-1+8efb9d1
cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4]
md0 : active raid5 sdb[0] sdc[4](F) sde[3] sdd[2]
5860543488 blocks level 5, 64k chunk, algorithm 2 [4/3] [U_UU]
unused devices: <none>
mdadm --examine --scan
ARRAY /dev/md0 UUID=1a7744b5:912ec7af:f82a9565:e3b453b4
© Server Fault or respective owner