Surprising corruption and never-ending fsck after resizing a filesystem.

Posted by Steve Kemp on Server Fault
Published on 2010-05-13T10:48:25Z

The system in question runs Debian Lenny with a 2.6.27.38 kernel. It has 16 GB of memory and eight 1 TB drives behind a 3ware RAID card.

The storage is managed via LVM.

Short version:

  • We were running a KVM guest that had 1.7 TB of storage allocated to it.
  • The guest's disk was nearly full.
  • So we decided to resize the disk it was running on.

We're pretty familiar with LVM and KVM, so we figured this would be a painless operation:

  • Stop the KVM guest.
  • Extend the logical volume: "lvextend -L+500Gb ..."
  • Check the filesystem: "e2fsck -f /dev/mapper/..."
  • Resize the filesystem: "resize2fs /dev/mapper/"
  • Start the guest.
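
The host-side part of that procedure can be sketched as follows. This is a minimal sketch, assuming the guest has already been stopped and that its filesystem sits directly on the logical volume; the path /dev/vg0/guest is a hypothetical stand-in for the names elided in the post (the same volume would also appear as /dev/mapper/vg0-guest). By default it only prints the commands.

```python
import shlex
import subprocess

# Hypothetical volume path; the post elides the real VG/LV names.
LV_PATH = "/dev/vg0/guest"


def resize_plan(lv_path: str, grow_by: str = "+500G") -> list[list[str]]:
    """Return the offline grow steps, in order, for a stopped guest."""
    return [
        ["lvextend", "-L", grow_by, lv_path],  # grow the logical volume (G = GiB in LVM)
        ["e2fsck", "-f", lv_path],             # force a full check before resizing
        ["resize2fs", lv_path],                # grow the filesystem to fill the volume
    ]


def run_plan(plan: list[list[str]], dry_run: bool = True) -> None:
    for cmd in plan:
        print("+", shlex.join(cmd))
        if not dry_run:
            # check=True aborts on any non-zero exit, including e2fsck's
            # "errors corrected" status, so resize2fs never runs on a
            # filesystem whose check reported problems.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run_plan(resize_plan(LV_PATH))
```

Stopping at the first non-zero exit is a deliberate choice here: a check that had to fix something is a reason to look before growing the filesystem.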

The guest booted successfully, and running "df" showed the extra space. However, a short time later the system remounted the filesystem read-only, without any explicit indication of an error.
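
The post gives the starting size (1.7 TB) and the increment (+500G) but not the resulting size. A quick check of that arithmetic, assuming LVM's convention that G in an -L size means GiB and trying both readings of "1.7Tb", suggests that either way the grown volume lands slightly above 2 TiB. That is exactly the most a 32-bit count of 512-byte sectors can address. The post does not establish that any layer in this stack had such a limit, so this is only a boundary worth being aware of.

```python
# Sanity check on the sizes quoted in the post (figures assumed as stated there).
GIB = 1024 ** 3
TIB = 1024 ** 4

# Largest device addressable with 32-bit counts of 512-byte sectors: exactly 2 TiB.
LIMIT_32BIT_SECTORS = 2 ** 32 * 512


def new_size_bytes(old_bytes: int, grow_gib: int = 500) -> int:
    """Size after lvextend -L +<grow_gib>G, with G interpreted as GiB."""
    return old_bytes + grow_gib * GIB


# "1.7Tb" is ambiguous, so try both decimal TB and binary TiB.
for label, old in [("1.7 TB", int(1.7e12)), ("1.7 TiB", int(1.7 * TIB))]:
    new = new_size_bytes(old)
    over = new > LIMIT_32BIT_SECTORS
    print(f"{label} + 500 GiB = {new / TIB:.3f} TiB; above 2 TiB: {over}")
```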

Being paranoid, we shut the guest down and ran the filesystem check again. Given the new size of the filesystem we expected this to take a while, but it has now been running for more than 24 hours and there is no indication of how long it will take.
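
e2fsck can report its own progress. Its manual page documents that the -C 0 option prints a completion bar from the start (for example "e2fsck -f -C 0 /dev/mapper/..."), and that sending SIGUSR1 to a running e2fsck makes it start displaying that bar (SIGUSR2 stops it). It is worth confirming in the man page of the installed e2fsprogs that the signal is supported. The sketch below assumes a Linux host with /proc: it finds running e2fsck processes by name and signals them. The bar then appears on e2fsck's own terminal, not the script's.

```python
import os
import signal

# Names e2fsck may run under: invoked directly, or via the fsck
# front-end as fsck.extN (a link to the same binary).
E2FSCK_NAMES = {"e2fsck", "fsck.ext2", "fsck.ext3", "fsck.ext4"}


def find_pids(names: set[str]) -> list[int]:
    """Return PIDs whose /proc/<pid>/comm matches one of `names` (Linux only)."""
    pids = []
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        try:
            with open(f"/proc/{entry}/comm") as f:
                comm = f.read().strip()
        except OSError:  # the process exited between listing and reading
            continue
        if comm in names:
            pids.append(int(entry))
    return pids


def request_progress(dry_run: bool = True) -> list[int]:
    """Ask each running e2fsck to start printing its completion bar."""
    pids = find_pids(E2FSCK_NAMES)
    for pid in pids:
        print(f"SIGUSR1 -> {pid}")
        if not dry_run:
            # Requires permission to signal the target, i.e. root for a root-owned fsck.
            os.kill(pid, signal.SIGUSR1)
    return pids


if __name__ == "__main__":
    request_progress()
```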

Using strace I can see that fsck is "doing stuff", and running "vmstat 1" shows a lot of block input/output operations occurring.
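
To put a number on that block I/O, a sketch like the following averages the bi (blocks read in) and bo (blocks written out) columns of vmstat output. It locates the columns by header name, because the exact column set varies between procps versions, and it skips the first data row, which vmstat reports as an average since boot. The function name and returned keys are illustrative choices.

```python
from statistics import mean


def summarize_vmstat(text: str) -> dict[str, float]:
    """Average the 'bi' and 'bo' columns of `vmstat <interval>` output."""
    header: list[str] = []
    rows: list[dict[str, int]] = []
    for line in text.splitlines():
        fields = line.split()
        if "bi" in fields and "bo" in fields:
            header = fields  # vmstat repeats its column header periodically
            continue
        if header and len(fields) == len(header) and all(f.isdigit() for f in fields):
            rows.append(dict(zip(header, map(int, fields))))
    samples = rows[1:]  # the first data row is the since-boot average
    if not samples:
        return {"samples": 0, "bi": 0.0, "bo": 0.0}
    return {
        "samples": len(samples),
        "bi": float(mean(r["bi"] for r in samples)),
        "bo": float(mean(r["bo"] for r in samples)),
    }
```

On the host it could be fed from subprocess.run(["vmstat", "1", "10"], capture_output=True, text=True).stdout. A steady read rate shows that fsck is still working rather than stalled, though it says nothing about how much work remains.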

So now my question is threefold:

  • Has anybody come across a similar situation? Generally we've done this kind of resize in the past with zero issues.

  • What is the most likely cause? (The 3ware card shows the RAID arrays of the backing stores as A-OK, the host system hasn't rebooted, and nothing in dmesg looks important or unusual.)

  • Ignoring btrfs and ext4 (not mature enough to trust), should we create our larger partitions with a different filesystem in future, either to avoid this corruption (whatever the cause) or to reduce fsck time? Is XFS the obvious candidate?
