Causes of sudden massive filesystem damage? ("root inode is not a directory")
- by poolie
I have a laptop running Maverick (very happily until yesterday) with a Patriot Torx SSD. The whole partition is LUKS-encrypted, with a single LVM physical volume on top of that, and home and root as ext4 logical volumes inside it.
When I tried to boot it yesterday, it complained that it couldn't mount the root filesystem. When I run fsck, nearly every inode seems to be wrong. Both the home and root filesystems show similar problems, and checking against a backup superblock doesn't help.
e2fsck 1.41.12 (17-May-2010)
lithe_root was not cleanly unmounted, check forced.
Resize inode not valid. Recreate? no
Pass 1: Checking inodes, blocks, and sizes
Root inode is not a directory. Clear? no
Root inode has dtime set (probably due to old mke2fs). Fix? no
Inode 2 is in use, but has dtime set. Fix? no
Inode 2 has a extra size (4730) which is invalid
Fix? no
Inode 2 has compression flag set on filesystem without compression support. Clear? no
Inode 2 has INDEX_FL flag set but is not a directory.
Clear HTree index? no
HTREE directory inode 2 has an invalid root node.
Clear HTree index? no
Inode 2, i_size is 9581392125871137995, should be 0. Fix? no
Inode 2, i_blocks is 40456527802719, should be 0. Fix? no
Reserved inode 3 (<The ACL index inode>) has invalid mode. Clear? no
Inode 3 has compression flag set on filesystem without compression support. Clear? no
Inode 3 has INDEX_FL flag set but is not a directory.
Clear HTree index? no
....
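As a sanity check that these inode fields are pure garbage rather than slightly damaged values, here is a rough Python sketch of my own (not anything e2fsck does). It assumes a 4 KiB block size, for which the ext4 file size limit is 16 TiB, and it assumes i_blocks is counted in 512-byte units. The three numbers are copied from the output above; the names are just for illustration.

```python
# Values copied from the e2fsck output for inode 2.
I_SIZE = 9581392125871137995   # reported i_size, in bytes
I_BLOCKS = 40456527802719      # reported i_blocks (assumed 512-byte units)
EXTRA_ISIZE = 4730             # reported "extra size", in bytes

# Assumptions, not read from the actual filesystem:
BLOCK_SIZE = 4096               # 4 KiB blocks
MAX_FILE_SIZE = 16 * 2**40      # ext4 file size limit with 4 KiB blocks


def check_inode_fields():
    """Return, per field, whether the reported value is even possible."""
    return {
        "i_size": I_SIZE <= MAX_FILE_SIZE,
        "i_blocks": I_BLOCKS * 512 <= MAX_FILE_SIZE,
        # The extra inode space cannot be larger than a whole block,
        # because an inode cannot be larger than a block.
        "extra_isize": EXTRA_ISIZE <= BLOCK_SIZE,
    }


def describe(name, value):
    """Show a value in decimal, hex, and bit length for eyeballing."""
    return f"{name}: {value} (0x{value:x}, {value.bit_length()} bits)"


if __name__ == "__main__":
    print(describe("i_size", I_SIZE))
    print(describe("i_blocks", I_BLOCKS))
    for field, plausible in check_inode_fields().items():
        print(field, "plausible" if plausible else "impossible")
```

Under those assumptions every field is out of range, which looks more like the inode table being overwritten (or decrypted wrongly) than like a few flipped bits.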
Running strings across the filesystems, I can see there are what look like filenames and user data there. I do have sufficiently good backups (touch wood) that it's not worth grovelling around to pull back individual files, though I might save an image of the unencrypted disk before I rebuild, just in case.
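For the image, something along these lines would probably do. It is only a minimal sketch: the function name and the paths in the comment are hypothetical, it has to run as root against the unlocked device, and unlike a proper rescue tool it simply stops at the first read error instead of skipping bad sectors.

```python
import hashlib


def image_device(src_path, dst_path, chunk_size=1024 * 1024):
    """Copy src_path to dst_path in chunks.

    Returns (bytes_copied, sha256_hex) so the image can be verified later.
    """
    digest = hashlib.sha256()
    copied = 0
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        while True:
            chunk = src.read(chunk_size)
            if not chunk:  # end of device or file
                break
            dst.write(chunk)
            digest.update(chunk)
            copied += len(chunk)
    return copied, digest.hexdigest()


# Hypothetical usage; both paths are placeholders, not my real ones:
#   size, checksum = image_device("/dev/mapper/example-root",
#                                 "/mnt/external/root.img")
```

Keeping the checksum alongside the image makes it possible to confirm later that the copy itself hasn't rotted.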
Neither smartctl nor the kernel log shows any errors. Running a write-mode badblocks across the swap LV doesn't find problems either. So the disk may be failing, but not in an obvious way.
At this point I'm basically, as they say, fscked? Back to reinstalling, perhaps running badblocks over the disk, then restoring from backup? There doesn't even seem to be enough data to file a meaningful bug...
I don't recall this machine crashing the last time I used it.
My best guess is that a bug or memory corruption caused it to write garbage across the disks when it was last running, or that the SSD has some kind of subtle failure mode.
What do you think would have caused this? Is there anything else you'd try?