glusterfs mounts get unmounted when 1 of the 2 bricks goes offline
- by Shiquemano
I have an odd case where one of the two replicated GlusterFS bricks will go offline and take all of the client mounts down with it. As I understand it, this should not happen: the clients should fail over to the brick that is still online, but that hasn't been the case. I suspect a configuration issue.
Here is a description of the system:
2 gluster servers on dedicated hardware (gfs0, gfs1)
8 client servers on vms (client1, client2, client3, ... , client8)
Half of the client servers are mounted with gfs0 as the primary, and the other half are pointed at gfs1. Each client is mounted with the following entry in /etc/fstab:
/etc/glusterfs/datavol.vol /data glusterfs defaults 0 0
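For reference, mounting the same volfile by hand (which, as far as I understand it, is what the fstab entry above amounts to) would be:

mount -t glusterfs /etc/glusterfs/datavol.vol /data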
Here is the content of /etc/glusterfs/datavol.vol:
volume datavol-client-0
    type protocol/client
    option transport-type tcp
    option remote-subvolume /data/datavol
    option remote-host gfs0
end-volume

volume datavol-client-1
    type protocol/client
    option transport-type tcp
    option remote-subvolume /data/datavol
    option remote-host gfs1
end-volume

volume datavol-replicate-0
    type cluster/replicate
    subvolumes datavol-client-0 datavol-client-1
end-volume

volume datavol-dht
    type cluster/distribute
    subvolumes datavol-replicate-0
end-volume

volume datavol-write-behind
    type performance/write-behind
    subvolumes datavol-dht
end-volume

volume datavol-read-ahead
    type performance/read-ahead
    subvolumes datavol-write-behind
end-volume

volume datavol-io-cache
    type performance/io-cache
    subvolumes datavol-read-ahead
end-volume

volume datavol-quick-read
    type performance/quick-read
    subvolumes datavol-io-cache
end-volume

volume datavol-md-cache
    type performance/md-cache
    subvolumes datavol-quick-read
end-volume

volume datavol
    type debug/io-stats
    option count-fop-hits on
    option latency-measurement on
    subvolumes datavol-md-cache
end-volume
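For what it's worth, when a brick drops, this is roughly how I've been checking things (assuming default log locations; the client mount log appears to be named after the mount point):

gluster volume status datavol                    # on gfs0/gfs1: is each brick process online?
grep -i connected /var/log/glusterfs/data.log    # on a client: should show connections to both datavol-client-0 and datavol-client-1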
The volfile above is my latest attempt at making this behave properly. I have also tried the following entry in /etc/fstab:
gfs0:/datavol /data glusterfs defaults,backupvolfile-server=gfs1 0 0
This was the entry for half of the clients, while the other half had:
gfs1:/datavol /data glusterfs defaults,backupvolfile-server=gfs0 0 0
The results were exactly the same as with the configuration above. Both configs connect everything just fine; they just don't fail over.
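In case it helps, the manual-mount equivalent of that second variant would be something like this (same option name as in my fstab; I believe newer releases also spell it backup-volfile-servers):

mount -t glusterfs -o backupvolfile-server=gfs1 gfs0:/datavol /data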
Any help would be appreciated.