How to handle server failure in an n-tier architecture?
- by andy
Imagine I have an n-tier architecture in an auto-scaled cloud environment with, say:
a load balancer in a failover pair
reverse proxy tier
web app tier
db tier
Each tier needs to connect to the instances in the tier below.
What are the standard ways of connecting tiers to make them resilient to failure of nodes in each tier? i.e. how does each tier get the IP addresses of each node in the tier below?
For example, if all reverse proxies should route traffic to all web app nodes, how could they be set up so that they don't send traffic to dead web app nodes, and so that they start sending traffic to new web app nodes as soon as those are brought online?
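To make the behaviour I'm after concrete, here is a minimal sketch in Python. It is only an illustration under my own assumptions: `BackendPool`, `tcp_probe`, `set_backends` and `refresh` are hypothetical names, not part of any real proxy, and a plain TCP connect is assumed to be a good enough health check.

```python
import socket
from typing import Callable, Iterable, List, Tuple

Backend = Tuple[str, int]  # (host, port)


def tcp_probe(backend: Backend, timeout: float = 0.5) -> bool:
    """Assumed health check: the node is live if a TCP connect succeeds."""
    try:
        with socket.create_connection(backend, timeout=timeout):
            return True
    except OSError:
        return False


class BackendPool:
    """Hypothetical pool that only hands out backends that passed the last probe."""

    def __init__(self, probe: Callable[[Backend], bool] = tcp_probe) -> None:
        self._probe = probe
        self._known: List[Backend] = []
        self._live: List[Backend] = []
        self._next = 0

    def set_backends(self, backends: Iterable[Backend]) -> None:
        # Called whenever the tier below scales up or down.
        self._known = list(backends)

    def refresh(self) -> None:
        # Would run on a timer: re-probe every known node, keep the live ones.
        self._live = [b for b in self._known if self._probe(b)]

    def pick(self) -> Backend:
        # Round-robin over the nodes that are currently live.
        if not self._live:
            raise RuntimeError("no live backends")
        backend = self._live[self._next % len(self._live)]
        self._next += 1
        return backend
```

The part I don't know how to do properly is what feeds `set_backends`: something has to tell every proxy which nodes exist in the tier below.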
I could run an agent that pushes updated configs to every node whenever the set of nodes changes, but that seems inefficient.
I could put an LB pair between each tier, so the tier above only needs to connect to the load balancers, but how do I handle the problem of the LBs dying? This just seems to shunt the problem of tier A needing to know the IPs of all nodes in tier B, to all nodes in tier A needing to know the IPs of all LBs between tiers A and B.
Some applications can implement retry logic for when a node in the tier below doesn't respond, but is there any middleware that could direct traffic only to live nodes in the following tier?
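By retry logic I mean something along these lines. This is a rough Python sketch, not a real client: `call_with_failover` and the `request` callable are hypothetical names, and I'm assuming a dead node shows up as a `ConnectionError` or `TimeoutError`.

```python
from typing import Callable, Optional, Sequence, TypeVar

T = TypeVar("T")


def call_with_failover(nodes: Sequence[str], request: Callable[[str], T]) -> T:
    """Try each node in order and return the first successful response."""
    last_error: Optional[Exception] = None
    for node in nodes:
        try:
            return request(node)
        except (ConnectionError, TimeoutError) as exc:
            # Assumed failure signal: the node is down or not answering in time.
            last_error = exc
    raise RuntimeError("all nodes failed") from last_error
```

This still requires every caller to hold the full node list, and each request pays for the dead nodes it tries first, which is why I'd rather have something in between do it.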
If I were hosting on AWS I could use an ELB between tiers, but I want to know how I could achieve the same functionality myself.
I've read (briefly) about heartbeat and keepalived - are these relevant here? What are the virtual IPs they talk about and how are they managed? Are there still single points of failure using them?