ext4 filesystem corruption -- maybe hardware error?
- by pts
I'm getting these errors in dmesg after about half an hour after I turn on the computer:
[ 1355.677957] EXT4-fs error (device sda2): htree_dirblock_to_tree: inode #1318420: (comm updatedb.mlocat) bad entry in directory: directory entry across blocks - block=5251700offset=0(0), inode=1802725748, rec_len=179136, name_len=32
[ 1355.677973] Aborting journal on device sda2-8.
[ 1355.678101] EXT4-fs (sda2): Remounting filesystem read-only
[ 1355.690144] EXT4-fs error (device sda2): htree_dirblock_to_tree: inode #1318416: (comm updatedb.mlocat) bad entry in directory: directory entry across blocks - block=5251699offset=0(0), inode=2194783952, rec_len=53280, name_len=152
[ 1356.864720] EXT4-fs error (device sda2): htree_dirblock_to_tree: inode #1312795: (comm updatedb.mlocat) bad entry in directory: directory entry across blocks - block=5251176offset=1460(13748), inode=1432317541, rec_len=208208, name_len=119
/dev/sda is an SSD, and it's using the noop scheduler.
/etc/fstab entry:
UUID=acb4eefa-48ff-4ee1-bb5f-2dccce7d011f / ext4 errors=remount-ro,noatime,discard,user_xattr 0 1
System information:
$ cat /proc/mounts | grep /dev/sd
/dev/sda1 /boot ext2 rw,noatime,errors=continue 0 0
$ cat /etc/lsb-release
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=10.04
DISTRIB_CODENAME=lucid
DISTRIB_DESCRIPTION="Ubuntu 10.04.3 LTS"
$ uname -a
Linux leetpad 2.6.35-30-generic-pae #61~lucid1-Ubuntu SMP Thu Oct 13 21:14:29 UTC 2011 i686 GNU/Linux
I've run memtest for 7 hours, it didn't found any memory errors.
Any obvious ideas what can go wrong in this case? The most reasonable thing I can imagine is that the SSD is silently dropping some write requests, which eventually leads to an EXT4 filesystem inconsistency (but no disk I/O errors). How can this happen? Is there a relevant configuration option I should ensure to be set correctly?
What tools should I use to diagnose the hardware failures? Would it be possible to diagnose the SSD failure without overwriting data?