iSCSI, failover and XenServer
- by jemmille
I have an iSCSI failover implementation set up so that if one of my storage units fails, the other takes over immediately (it also serves the NFS shares). When failover occurs, the volumes are exported, the IP is switched to the other machine, and the targets are reconfigured. The failover of the storage system itself works just fine. I use NexentaStor for my filer.
When I do a manual test failover of my storage, the following occurs:
Note: I run the admin VMs on NFS and the customer VMs on iSCSI.
All NFS-based VMs remain up and working perfectly through the failover and afterward.
All VMs running on iSCSI eventually report the following:
An error about not being able to write to a particular block
An error about journaling not working
Then the file system goes read-only (RO)
To get the VMs working again I have to do the following:
1. Force shutdown of the "broken" VMs.
2. Detach the iSCSI SR.
3. Re-attach the iSCSI SR.
4. Boot the VM on a different server (there are 5 in my pool). If I don't boot it on a different server, I get this error: "Internal error: Failure("The VDI <uuid> is already attached in RW mode; it can't be attached in RO mode!")". The only way I have found to fix that error is to reboot the entire server the VM was previously running on, which is obviously a huge pain.
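For reference, the recovery steps above roughly correspond to the xe CLI sequence sketched below. This is a minimal sketch, not a tested procedure: the helper name `recovery_commands` is hypothetical, the UUIDs and host are placeholders, it handles one VM at a time, and it assumes that detaching and re-attaching the SR amounts to unplugging and plugging its PBDs. It only builds the commands; it does not run them.

```python
# Hypothetical helper: builds (but does not execute) the xe commands
# matching the manual recovery steps. All identifiers are placeholders.
def recovery_commands(vm_uuid, pbd_uuids, target_host):
    # Step 1: force shutdown of the broken VM.
    cmds = [["xe", "vm-shutdown", "uuid=" + vm_uuid, "force=true"]]
    # Step 2: detach the iSCSI SR (assumed: unplug each of its PBDs).
    cmds += [["xe", "pbd-unplug", "uuid=" + p] for p in pbd_uuids]
    # Step 3: re-attach the iSCSI SR (assumed: plug the same PBDs again).
    cmds += [["xe", "pbd-plug", "uuid=" + p] for p in pbd_uuids]
    # Step 4: boot the VM on a different server in the pool.
    cmds.append(["xe", "vm-start", "uuid=" + vm_uuid, "on=" + target_host])
    return cmds


if __name__ == "__main__":
    for cmd in recovery_commands("<vm-uuid>", ["<pbd-uuid>"], "<other-host>"):
        print(" ".join(cmd))
```

The PBD UUIDs for an SR can be listed with `xe pbd-list sr-uuid=<sr-uuid>`.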
Currently multipathing is NOT enabled (but it can be, and the same thing still occurs). I have edited many of the timeout settings in /etc/iscsid.conf, but to no avail.
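For anyone checking the same thing, here is a minimal sketch of the sanity check I have in mind for those timeouts. Without multipathing, open-iscsi queues I/O while a session is down and fails it back to the upper layers once `node.session.timeo.replacement_timeout` expires, so that value would need to exceed the time a failover takes. The function names, the sample values, and the failover durations are illustrative assumptions, not my actual configuration or measurements.

```python
# Illustrative sample in iscsid.conf syntax; the values are NOT my real settings.
SAMPLE = """\
# timeouts (seconds)
node.session.timeo.replacement_timeout = 120
node.conn[0].timeo.noop_out_interval = 5
node.conn[0].timeo.noop_out_timeout = 5
"""


def parse_iscsid_conf(text):
    """Parse 'key = value' lines, skipping blanks and '#' comments."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings


def replacement_timeout_covers(settings, failover_seconds, default=120):
    """True if queued I/O should outlast a failover of the given length.

    Falls back to 120 s, the usual open-iscsi default, when the key is absent.
    """
    key = "node.session.timeo.replacement_timeout"
    return int(settings.get(key, default)) > failover_seconds


if __name__ == "__main__":
    conf = parse_iscsid_conf(SAMPLE)
    # 60 s is a made-up failover duration used only for illustration.
    print(replacement_timeout_covers(conf, failover_seconds=60))
```

This only checks the configured value against an assumed failover time; it says nothing about whether the running sessions have actually picked up the setting.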
In short, my storage fails over properly but XenServer does not keep the connection alive. One thought: could the error that shows up in step 4 above be the ultimate cause, so that fixing it would fix everything?
Any help would be appreciated more than you know.