DRBD stacked resources: recovering from failure
- by Marcus Downing
We're running a stacked four-node DRBD setup like this:
A --> B
|     |
v     v
C     D
Servers A and B are Xen hosts running VMs, while servers C and D are for backups. A is in the same datacentre as C, and B is in the same datacentre as D. That gives three DRBD resources across the four servers:
From server A to server C, within the first datacentre, using protocol B
From server B to server D, within the second datacentre, using protocol B
From server A to server B, across the two datacentres, as a stacked resource using protocol A
First question: booting a stacked resource
We haven't got any vital data running on this setup yet - we're still making sure it works first. That means simulating power cuts, network outages, etc., and working out what steps we need to take to recover.
When we pull the power out of server A, both of its resources go down, and the server attempts to bring them back up at next boot. However, it only succeeds in bringing up the lower-level resource, A-C. The stacked resource A-B doesn't even try to connect, presumably because it can't find its backing device until the lower-level resource is a connected primary.
So if anything goes wrong we need to manually log in and bring that resource up, then start the virtual machine on top of it.
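For reference, the manual recovery on server A looks roughly like the sketch below. It assumes a DRBD 8.x-style drbdadm that accepts the --stacked option, and it uses the resource names test-AC and test-AB from the configs further down. The run and recover_stacked helpers and the DRY_RUN switch are my own illustration, not part of DRBD. By default it only prints the commands; set DRY_RUN=0 to actually execute them.

```shell
#!/bin/sh
# Sketch of the manual steps to bring the stacked resource back on server A.
# Assumption: resource names as in our configs; override via environment.
LOWER=${LOWER:-test-AC}
STACKED=${STACKED:-test-AB}
# Assumption: dry-run by default, so nothing is touched unless DRY_RUN=0.
DRY_RUN=${DRY_RUN:-1}

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Promote the lower-level resource, then bring up and promote the stacked one.
# Each step only runs if the previous one succeeded.
recover_stacked() {
    run drbdadm primary "$LOWER" &&
    run drbdadm --stacked up "$STACKED" &&
    run drbdadm --stacked primary "$STACKED"
}

recover_stacked
```

Only once the stacked resource is primary can the virtual machine on top of it be started.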
Second question: setting the primary of a stacked resource
Our lower-level resources are configured so that the right one is considered primary:
resource test-AC {
    on A { ... }
    on C { ... }
    startup {
        become-primary-on A;
    }
}
But I don't see any way to do the same with a stacked resource, as the following isn't a valid config:
resource test-AB {
    stacked-on-top-of test-AC { ... }
    stacked-on-top-of test-BD { ... }
    startup {
        become-primary-on test-AC;
    }
}
This too means that recovering from a failure requires manual intervention. Is there no way to set the automatic primary for a stacked resource?
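The only workaround I can think of is a boot-time script along the lines of the sketch below. It is untested against a real failure and makes several assumptions: that drbdadm role prints the local role first (e.g. Primary/Secondary on DRBD 8.x), that --stacked is accepted, and that the resource names match the configs above. The function names, the TRIES variable and the --run argument are hypothetical.

```shell
#!/bin/sh
# Hypothetical boot-time helper: wait until the lower-level resource is
# Primary on this node, then bring up and promote the stacked resource.
LOWER=${LOWER:-test-AC}
STACKED=${STACKED:-test-AB}
TRIES=${TRIES:-30}   # number of one-second polls before giving up

# True if a "drbdadm role" output string reports the local node as Primary.
# Accepts both "Primary/<peer role>" and a bare "Primary".
is_local_primary() {
    case "$1" in
        Primary|Primary/*) return 0 ;;
        *) return 1 ;;
    esac
}

# Poll the lower-level resource; once it is Primary, start the stacked one.
promote_stacked_when_ready() {
    i=0
    while [ "$i" -lt "$TRIES" ]; do
        if is_local_primary "$(drbdadm role "$LOWER" 2>/dev/null)"; then
            drbdadm --stacked up "$STACKED" &&
            drbdadm --stacked primary "$STACKED"
            return $?
        fi
        i=$((i + 1))
        sleep 1
    done
    echo "$LOWER never became Primary; leaving $STACKED down" >&2
    return 1
}

# Only act when explicitly asked, so sourcing this file has no side effects.
if [ "${1:-}" = "--run" ]; then
    promote_stacked_when_ready
fi
```

This would be called with --run from an init script ordered after the DRBD service, but that still feels like papering over something the config ought to be able to express.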