glusterfs mounts get unmounted when 1 of the 2 bricks goes offline

Posted by Shiquemano on Server Fault, published 2013-10-31.

I have an odd case where one of the two replicated GlusterFS bricks will go offline and take all of the client mounts down with it. As I understand it, this should not happen: the clients should fail over to the brick that is still online, but that hasn't been the case. I suspect a configuration issue.
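
For reference, the peer and brick state can be checked from either gluster server with something like this (the volume is named datavol, as in the client volfile below):

gluster peer status
gluster volume status datavol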

Here is a description of the system:

  • 2 gluster servers on dedicated hardware (gfs0, gfs1)
  • 8 client servers on VMs (client1, client2, client3, ... , client8)

Half of the client servers are mounted with gfs0 as the primary and the other half are pointed at gfs1. Each client is mounted with the following entry in /etc/fstab:

/etc/glusterfs/datavol.vol /data glusterfs defaults 0 0
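
As far as I understand it, that fstab entry is equivalent to mounting by hand with something like:

mount -t glusterfs /etc/glusterfs/datavol.vol /data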

Here is the content of /etc/glusterfs/datavol.vol:

volume datavol-client-0
    type protocol/client
    option transport-type tcp
    option remote-subvolume /data/datavol
    option remote-host gfs0
end-volume

volume datavol-client-1
    type protocol/client
    option transport-type tcp
    option remote-subvolume /data/datavol
    option remote-host gfs1
end-volume

volume datavol-replicate-0
    type cluster/replicate
    subvolumes datavol-client-0 datavol-client-1
end-volume

volume datavol-dht
    type cluster/distribute
    subvolumes datavol-replicate-0
end-volume

volume datavol-write-behind
    type performance/write-behind
    subvolumes datavol-dht
end-volume

volume datavol-read-ahead
    type performance/read-ahead
    subvolumes datavol-write-behind
end-volume

volume datavol-io-cache
    type performance/io-cache
    subvolumes datavol-read-ahead
end-volume

volume datavol-quick-read
    type performance/quick-read
    subvolumes datavol-io-cache
end-volume

volume datavol-md-cache
    type performance/md-cache
    subvolumes datavol-quick-read
end-volume

volume datavol
    type debug/io-stats
    option count-fop-hits on
    option latency-measurement on
    subvolumes datavol-md-cache
end-volume
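
When a brick drops, I would expect the client log to show only one of the two replica subvolumes (datavol-client-0 or datavol-client-1) disconnecting. Assuming the log lands in the default per-mount location (/var/log/glusterfs/data.log for a /data mount), the connect/disconnect messages can be pulled out with something like:

grep 'datavol-client-' /var/log/glusterfs/data.log | grep -i connect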

The config above is the latest attempt at making this behave properly. I have also tried the following entry in /etc/fstab:

gfs0:/datavol /data glusterfs defaults,backupvolfile-server=gfs1 0 0

This was the entry for half of the clients, while the other half had:

gfs1:/datavol /data glusterfs defaults,backupvolfile-server=gfs0 0 0

The results were exactly the same as with the configuration above: both configs connect everything just fine, but neither fails over.
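
If it helps with diagnosis, the per-brick client connections can be listed from either server; as far as I know this shows whether each client holds an open connection to both bricks:

gluster volume status datavol clients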

Any help would be appreciated.

© Server Fault or respective owner

Related posts about replication

Related posts about mount

  • 12.10 update breaks NFS mount

    as seen on Ask Ubuntu - Search for 'Ask Ubuntu'
    I've just upgraded to the latest 12.10 beta. Rebooted twice. The problem is with the NFS folders not mounting, here's a verbose log. # mount -v myserver:/nfs_shared/tools /tools/ mount: no type was given - I'll assume nfs because of the colon mount.nfs: timeout set for Mon Oct 1 11:42:28 2012 mount… >>> More

  • Mount SMB / AFP 13.10

    as seen on Ask Ubuntu - Search for 'Ask Ubuntu'
    I cannot seem to get Ubuntu to mount a mac share via SMB or AFP. I've tried the following... AFP: apt-get install afpfs-ng-utils mount_afp afp://user:password@localip/share /mnt/share Error given: "Could not connect, never got a reponse to getstatus, Connection timed out". Which is odd as I can… >>> More

  • Mount Return Code for CIFS mount

    as seen on Server Fault - Search for 'Server Fault'
    When I run the following command (as root or via sudo) from a bash script I get an exit status (or return code in mount man page parlance) of 1: mount -v -t cifs //nasbox/volume /tmpdir/ --verbose -o credentials=/root/cifsid & /tmp/mylog It outputs the following into the myflog file: parsing… >>> More

  • Disable raid member check upon mount to mount damaged nvidia raid1 member

    as seen on Server Fault - Search for 'Server Fault'
    Hi, A friend of mine destroyed his Nvidia RAID1 array somehow and in trying to fix it, he ended up with a non-working array. Because of the RAID metadata, the actual disk data was stored at an offset from the beginning. I was able to identify this offset with dd and a hexeditor and then I used losetup… >>> More

  • Network shares do not mount.

    as seen on Super User - Search for 'Super User'
    My network shares were mounting fine yesterday.. suddenly they are not. They were mounting fine for the last two weeks or however long since I added them. When I run sudo mount -a I get the following error: topsy@monolyth:~$ sudo mount -a mount error(12): Cannot allocate memory Refer to the mount… >>> More