ext4 filesystem corruption -- maybe hardware error?

Posted by pts on Server Fault See other posts from Server Fault or by pts
Published on 2011-11-24T00:09:29Z Indexed on 2011/11/24 1:56 UTC
Read the original article Hit count: 489

Filed under:
|
|
|
|

I'm getting these errors in dmesg after about half an hour after I turn on the computer:

 [ 1355.677957] EXT4-fs error (device sda2): htree_dirblock_to_tree: inode #1318420: (comm updatedb.mlocat) bad entry in directory: directory entry across blocks - block=5251700offset=0(0), inode=1802725748, rec_len=179136, name_len=32
 [ 1355.677973] Aborting journal on device sda2-8.
 [ 1355.678101] EXT4-fs (sda2): Remounting filesystem read-only
 [ 1355.690144] EXT4-fs error (device sda2): htree_dirblock_to_tree: inode #1318416: (comm updatedb.mlocat) bad entry in directory: directory entry across blocks - block=5251699offset=0(0), inode=2194783952, rec_len=53280, name_len=152
 [ 1356.864720] EXT4-fs error (device sda2): htree_dirblock_to_tree: inode #1312795: (comm updatedb.mlocat) bad entry in directory: directory entry across blocks - block=5251176offset=1460(13748), inode=1432317541, rec_len=208208, name_len=119

/dev/sda is an SSD, and it's using the noop scheduler.

/etc/fstab entry:

UUID=acb4eefa-48ff-4ee1-bb5f-2dccce7d011f / ext4 errors=remount-ro,noatime,discard,user_xattr 0 1

System information:

$ cat /proc/mounts | grep /dev/sd
/dev/sda1 /boot ext2 rw,noatime,errors=continue 0 0
$ cat /etc/lsb-release 
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=10.04
DISTRIB_CODENAME=lucid
DISTRIB_DESCRIPTION="Ubuntu 10.04.3 LTS"
$ uname -a
Linux leetpad 2.6.35-30-generic-pae #61~lucid1-Ubuntu SMP Thu Oct 13 21:14:29 UTC 2011 i686 GNU/Linux

I've run memtest for 7 hours, it didn't found any memory errors.

Any obvious ideas what can go wrong in this case? The most reasonable thing I can imagine is that the SSD is silently dropping some write requests, which eventually leads to an EXT4 filesystem inconsistency (but no disk I/O errors). How can this happen? Is there a relevant configuration option I should ensure to be set correctly?

What tools should I use to diagnose the hardware failures? Would it be possible to diagnose the SSD failure without overwriting data?

© Server Fault or respective owner

Related posts about linux

Related posts about Hardware