Keepalived takes several minutes to recover in a particular situation

Posted by NathanE on Server Fault
Published on 2011-02-28T15:10:46Z

I've set up Keepalived for a master/slave-style virtual IP, and it seems to work well.

Both nodes are hosted in almost identical VMs.

If I "pause" the VM that is running the Master, the Slave takes over almost instantly, as expected.

However, if I then "unpause" the VM that runs the Master, the virtual IP stops responding to pings, and it takes a good 4 or 5 minutes before it starts responding again.

The two nodes seem to become desynchronised because of the way I'm testing (pausing and unpausing the VMs).

I admit that pausing and unpausing VMs is a slightly dodgy way to test this, but it has raised a concern that other scenarios could cause the same undesirable behaviour.
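To put a number on the recovery time and compare it across scenarios, the outage can be timed with a small probe loop. This is only a rough sketch, not part of the original setup: the address 192.0.2.10 is a placeholder for the real virtual IP, the function names are made up for illustration, and the `ping` flags assume Linux iputils.

```python
import subprocess
import time

VIP = "192.0.2.10"  # placeholder address (assumption); substitute the real virtual IP


def ping_once(host, timeout_s=1):
    """Return True if a single ICMP echo gets a reply (assumes Linux iputils ping flags)."""
    result = subprocess.run(
        ["ping", "-c", "1", "-W", str(timeout_s), host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def outages(samples):
    """Turn [(timestamp, reachable), ...] into a list of (start, end) outage spans.

    An outage that is still open when the samples end is not reported.
    """
    spans, start = [], None
    for ts, ok in samples:
        if not ok and start is None:
            start = ts  # first failed probe opens an outage
        elif ok and start is not None:
            spans.append((start, ts))  # first successful probe closes it
            start = None
    return spans


def monitor(host, duration_s=600, interval_s=1.0):
    """Probe the host for duration_s seconds and return the outage spans observed."""
    samples = []
    deadline = time.time() + duration_s
    while time.time() < deadline:
        samples.append((time.time(), ping_once(host)))
        time.sleep(interval_s)
    return outages(samples)
```

Running `monitor(VIP)` from a third machine while pausing and unpausing the Master VM returns one `(start, end)` pair per outage, so `end - start` gives the time in seconds that the virtual IP was unreachable.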

  • Is this expected / by design?
  • Is there anything I can do to the config to improve it?

Thanks.

© Server Fault or respective owner
