Is my hard drive about to die?

Posted by Hristo Deshev on Server Fault See other posts from Server Fault or by Hristo Deshev
Published on 2012-11-23T15:13:57Z Indexed on 2012/11/23 17:08 UTC
Read the original article Hit count: 423

Filed under:
|
|
|

I have two hard drives set up as a RAID 1 array on my server (Linux, software RAID using mdadm) and one of them just got me this "present" in syslog:

Nov 23 02:05:29 h2 kernel: [7305215.338153] ata1.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x0
Nov 23 02:05:29 h2 kernel: [7305215.338178] ata1.00: irq_stat 0x40000008
Nov 23 02:05:29 h2 kernel: [7305215.338197] ata1.00: failed command: READ FPDMA QUEUED
Nov 23 02:05:29 h2 kernel: [7305215.338220] ata1.00: cmd 60/08:00:d8:df:da/00:00:3a:00:00/40 tag 0 ncq 4096 in
Nov 23 02:05:29 h2 kernel: [7305215.338221]          res 41/40:08:d8:df:da/00:00:3a:00:00/00 Emask 0x409 (media error) <F>
Nov 23 02:05:29 h2 kernel: [7305215.338287] ata1.00: status: { DRDY ERR }
Nov 23 02:05:29 h2 kernel: [7305215.338305] ata1.00: error: { UNC }
Nov 23 02:05:29 h2 kernel: [7305215.358901] ata1.00: configured for UDMA/133
Nov 23 02:05:32 h2 kernel: [7305218.269054] ata1.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x0
Nov 23 02:05:32 h2 kernel: [7305218.269081] ata1.00: irq_stat 0x40000008
Nov 23 02:05:32 h2 kernel: [7305218.269101] ata1.00: failed command: READ FPDMA QUEUED
Nov 23 02:05:32 h2 kernel: [7305218.269125] ata1.00: cmd 60/08:00:d8:df:da/00:00:3a:00:00/40 tag 0 ncq 4096 in
Nov 23 02:05:32 h2 kernel: [7305218.269126]          res 41/40:08:d8:df:da/00:00:3a:00:00/00 Emask 0x409 (media error) <F>
Nov 23 02:05:32 h2 kernel: [7305218.269196] ata1.00: status: { DRDY ERR }
Nov 23 02:05:32 h2 kernel: [7305218.269215] ata1.00: error: { UNC }
Nov 23 02:05:32 h2 kernel: [7305218.341565] ata1.00: configured for UDMA/133
Nov 23 02:05:35 h2 kernel: [7305221.193342] ata1.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x0
Nov 23 02:05:35 h2 kernel: [7305221.193368] ata1.00: irq_stat 0x40000008
Nov 23 02:05:35 h2 kernel: [7305221.193386] ata1.00: failed command: READ FPDMA QUEUED
Nov 23 02:05:35 h2 kernel: [7305221.193408] ata1.00: cmd 60/08:00:d8:df:da/00:00:3a:00:00/40 tag 0 ncq 4096 in
Nov 23 02:05:35 h2 kernel: [7305221.193409]          res 41/40:08:d8:df:da/00:00:3a:00:00/00 Emask 0x409 (media error) <F>
Nov 23 02:05:35 h2 kernel: [7305221.193474] ata1.00: status: { DRDY ERR }
Nov 23 02:05:35 h2 kernel: [7305221.193491] ata1.00: error: { UNC }
Nov 23 02:05:35 h2 kernel: [7305221.388404] ata1.00: configured for UDMA/133
Nov 23 02:05:38 h2 kernel: [7305224.426316] ata1.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x0
Nov 23 02:05:38 h2 kernel: [7305224.426343] ata1.00: irq_stat 0x40000008
Nov 23 02:05:38 h2 kernel: [7305224.426363] ata1.00: failed command: READ FPDMA QUEUED
Nov 23 02:05:38 h2 kernel: [7305224.426387] ata1.00: cmd 60/08:00:d8:df:da/00:00:3a:00:00/40 tag 0 ncq 4096 in
Nov 23 02:05:38 h2 kernel: [7305224.426388]          res 41/40:08:d8:df:da/00:00:3a:00:00/00 Emask 0x409 (media error) <F>
Nov 23 02:05:38 h2 kernel: [7305224.426459] ata1.00: status: { DRDY ERR }
Nov 23 02:05:38 h2 kernel: [7305224.426478] ata1.00: error: { UNC }
Nov 23 02:05:38 h2 kernel: [7305224.498133] ata1.00: configured for UDMA/133
Nov 23 02:05:41 h2 kernel: [7305227.400583] ata1.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x0
Nov 23 02:05:41 h2 kernel: [7305227.400608] ata1.00: irq_stat 0x40000008
Nov 23 02:05:41 h2 kernel: [7305227.400627] ata1.00: failed command: READ FPDMA QUEUED
Nov 23 02:05:41 h2 kernel: [7305227.400649] ata1.00: cmd 60/08:00:d8:df:da/00:00:3a:00:00/40 tag 0 ncq 4096 in
Nov 23 02:05:41 h2 kernel: [7305227.400650]          res 41/40:08:d8:df:da/00:00:3a:00:00/00 Emask 0x409 (media error) <F>
Nov 23 02:05:41 h2 kernel: [7305227.400716] ata1.00: status: { DRDY ERR }
Nov 23 02:05:41 h2 kernel: [7305227.400734] ata1.00: error: { UNC }
Nov 23 02:05:41 h2 kernel: [7305227.472432] ata1.00: configured for UDMA/133

From what I read so far, I am not sure if read errors mean that a hard drive is dying on me (no write errors so far). I've had hard drive errors in the past and those always had errors about failing to write to specific sectors in the logs. Not this time.

Should I be replacing the drive? Could something else be causing the problem?

I've scheduled a smartctl -t long test that will finish in a couple of hours. I hope this will give me some more info.

© Server Fault or respective owner

Related posts about raid

Related posts about mdadm