Keepalived takes several minutes to recover in a particular situation
- by NathanE
I've set up Keepalived for a master-slave style virtual IP and it seems to work well.
Both are hosted in almost identical VMs.
If I "pause" the VM that is running the Master, the Slave takes over almost instantly, as expected.
However, if I then "unpause" the VM that runs the Master, the virtual IP stops responding to pings, and it takes a good 4 or 5 minutes for it to start responding again.
It seems to be getting desynchronised because of the way I'm testing it (by pausing/unpausing the VMs).
I admit that pausing and unpausing VMs is a slightly dodgy way to test this, but it has raised a concern for me that other scenarios could cause the same undesirable behaviour.
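To put a number on the outage rather than eyeballing ping output, something like the following could be run from a third machine while pausing and unpausing the Master. It is only a rough sketch: the address 192.0.2.10 is a placeholder (not the real virtual IP), the function names are made up for illustration, and the ping flags assume the Linux iputils version of ping.

```python
import subprocess
import time

# Hypothetical virtual IP from the documentation range; replace with the real one.
VIP = "192.0.2.10"


def is_reachable(host):
    """Send a single ICMP echo request and report whether a reply came back.

    Assumes Linux iputils ping: -c 1 sends one packet, -W 1 waits up to 1 second.
    """
    result = subprocess.run(
        ["ping", "-c", "1", "-W", "1", host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def outages(samples):
    """Turn (timestamp, reachable) samples into (start, end) outage windows.

    An outage that is still ongoing when sampling stops is not reported.
    """
    windows = []
    down_since = None
    for ts, ok in samples:
        if not ok and down_since is None:
            down_since = ts  # first failed probe of a new outage
        elif ok and down_since is not None:
            windows.append((down_since, ts))  # first good probe ends the outage
            down_since = None
    return windows


def monitor(host, duration_s=600, interval_s=1.0):
    """Probe the host roughly once per interval and collect the samples."""
    samples = []
    end = time.monotonic() + duration_s
    while time.monotonic() < end:
        samples.append((time.monotonic(), is_reachable(host)))
        time.sleep(interval_s)
    return samples


if __name__ == "__main__":
    for start, end in outages(monitor(VIP)):
        print(f"VIP unreachable for {end - start:.1f}s")
```

Running it for ten minutes across a pause/unpause cycle should show whether the gap really is 4 or 5 minutes and whether it is consistent between runs.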
Is this expected / by design?
Is there anything I can do to the config to improve it?
Thanks.