I have a server running Xen 4.1 with Ubuntu Oneiric (11.10) in the dom0 and in each of the four domUs. The system disks of the domUs are LVM2 volumes built on top of an mdadm RAID1.
All the domU system disks are EXT4 and were created from snapshots of the same original template. Three of them run perfectly, but one (called s-ub-02) keeps being remounted read-only. A subsequent e2fsck reports a single "invalid extent" error:
e2fsck 1.41.14 (22-Dec-2010)
/dev/domu/s-ub-02-root contains a file system with errors, check forced.
Pass 1: Checking inodes, blocks, and sizes
Inode 525418 has an invalid extent
(logical block 8959, invalid physical block 0, len 0)
Clear<y>? yes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
/dev/domu/s-ub-02-root: 77757/655360 files (0.3% non-contiguous), 360592/2621440 blocks
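To rule out a full disk, the summary line above can be turned into usage figures. This is only a rough sketch: the function name summarize_fsck is made up for illustration, and the 4096-byte block size is an assumption (the usual default for a volume of this size), not something reported by e2fsck here.

```python
import re

SUMMARY = ("/dev/domu/s-ub-02-root: 77757/655360 files (0.3% non-contiguous), "
           "360592/2621440 blocks")

def summarize_fsck(line, block_size=4096):
    """Extract used/total inodes and blocks from an e2fsck summary line.

    block_size is an ASSUMPTION (4096 bytes); e2fsck does not print it here.
    """
    m = re.search(r"(\d+)/(\d+) files .*?(\d+)/(\d+) blocks", line)
    if m is None:
        return None
    files_used, files_total, blocks_used, blocks_total = map(int, m.groups())
    return {
        "files_used": files_used,
        "files_total": files_total,
        "blocks_used": blocks_used,
        "blocks_total": blocks_total,
        "inode_pct": 100.0 * files_used / files_total,
        "block_pct": 100.0 * blocks_used / blocks_total,
        # Size in GiB under the assumed block size.
        "size_gib": blocks_total * block_size / 2**30,
    }

if __name__ == "__main__":
    for key, value in summarize_fsck(SUMMARY).items():
        print(key, value)
```

Under that assumption the volume is 10 GiB with under 14% of its blocks and under 12% of its inodes in use, so the remount is not caused by running out of space or inodes.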
The console typically shows the following errors for the system disk (xvda2):
[101980.903416] EXT4-fs error (device xvda2): ext4_ext_find_extent:732: inode #525418: comm apt-get: bad header/extent: invalid extent entries - magic f30a, entries 12, max 340(340), depth 0(0)
[101980.903473] EXT4-fs (xvda2): Remounting filesystem read-only
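For anyone who wants to pick the first message apart, here is a small sketch that extracts its fields. The regular expression and the names parse_ext4_extent_error and extents_per_block are my own illustration. The 4096-byte block size and the 12-byte sizes for the extent header and each extent entry are assumptions taken from the standard ext4 on-disk extent layout, not values read from this volume.

```python
import re

LINE = ("[101980.903416] EXT4-fs error (device xvda2): ext4_ext_find_extent:732: "
        "inode #525418: comm apt-get: bad header/extent: invalid extent entries - "
        "magic f30a, entries 12, max 340(340), depth 0(0)")

PATTERN = re.compile(
    r"EXT4-fs error \(device (?P<device>\S+)\): (?P<func>\w+):(?P<line>\d+): "
    r"inode #(?P<inode>\d+): comm (?P<comm>\S+): .*"
    r"magic (?P<magic>[0-9a-f]+), entries (?P<entries>\d+), "
    r"max (?P<max>\d+)\(\d+\), depth (?P<depth>\d+)\(\d+\)"
)

def parse_ext4_extent_error(text):
    """Return the fields of an 'invalid extent entries' message, or None."""
    m = PATTERN.search(text)
    if m is None:
        return None
    fields = m.groupdict()
    for key in ("line", "inode", "entries", "max", "depth"):
        fields[key] = int(fields[key])
    fields["magic"] = int(fields["magic"], 16)  # printed in hex by the kernel
    return fields

def extents_per_block(block_size=4096, header_size=12, entry_size=12):
    """Entries that fit in one extent block (sizes are ASSUMED, see above)."""
    return (block_size - header_size) // entry_size

if __name__ == "__main__":
    info = parse_ext4_extent_error(LINE)
    print(info)
    print("max matches a full 4 KiB extent block:",
          info["max"] == extents_per_block())
```

Read this way, the header itself looks sane: the magic is the expected 0xF30A, depth 0 means a leaf block, and max 340 is what a full 4 KiB extent block would hold under the assumed sizes. That is consistent with e2fsck complaining about one entry (physical block 0, len 0) rather than about the header.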
I have recreated the system disk more than once, and the same thing always happens. This, and the fact that the disk is ultimately on a RAID1, leads me to rule out a hardware disk error.
The only obvious distinguishing feature of this domU is the presence of nfs-kernel-server, so I suspect that. Its exports file looks like this:
/exports/users 192.168.0.0/255.255.248.0(rw,sync,no_subtree_check)
/exports/media/music 192.168.0.0/255.255.248.0(rw,sync,no_subtree_check)
/exports/media/pictures 192.168.0.0/255.255.248.0(rw,sync,no_subtree_check)
/exports/opt 192.168.0.0/255.255.248.0(rw,sync,no_subtree_check)
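To make the client range in those entries explicit, the address/netmask pair can be expanded with Python's standard ipaddress module. This is just a sketch, and the helper name describe_export_network is hypothetical.

```python
import ipaddress

def describe_export_network(spec):
    """Expand an address/netmask pair as used in /etc/exports."""
    # strict=True raises ValueError if host bits are set in the address.
    net = ipaddress.ip_network(spec, strict=True)
    return {
        "cidr": str(net),
        "first": str(net.network_address),
        "last": str(net.broadcast_address),
        "addresses": net.num_addresses,
    }

if __name__ == "__main__":
    print(describe_export_network("192.168.0.0/255.255.248.0"))
```

The netmask 255.255.248.0 is a /21, so all four exports are open to 192.168.0.0 through 192.168.7.255.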
/exports/users and /exports/opt are LVM2 volumes from the same volume group as the system disk. /exports/media is an EXT2 volume. (There is an issue where clients see /exports/media/pictures as being a read-only volume, which I mention for completeness.)
Apart from the read-only /exports/media/pictures issue, the NFS server appears to work correctly under light load for several hours before the "invalid extent" problem occurs.
There are no helpful entries in /var/log. The log files simply stop being written, which shows when the disk was remounted read-only, but nothing indicates what the cause might be.
Can anyone help me with this problem?
Steve