How does enterprise failover, such as with google.com, actually work?
- by Alex Regan
We have a few fedora systems that are configured for web, FTP, and email services. We'd like to mirror these services, so that we can provide near 100% reliability for our users. I'm a fairly experienced Linux administrator, but don't have much experience with redundant systems.
What is the best way to do this? How does google and amazon do it? Google.com resolves to multiple IP addresses, but if my local desktop caches one of the IPs that are unreachable, I'm going to get a failed connection message. How do they prevent that from happening?
If one of their servers goes down, how is it automatically redirected to another system, without the end-user ever knowing it?
I understand there are failover devices, but they're only for failing over the system itself, not a complete network.
Let's say we have the worst-case scenario, such as my primary system becomes inaccessible. What are the fundamental components that are used on Linux systems to provide this capability?
I'm looking for concepts, or approaches, not answers like "check out openstack". What are the actual pieces that make up the solution? What has to be done to implement this capability?
Hopefully my question is clear. I'd like to know what the pieces are that make up a failover system and what approach is taken by successful organizations that implement it.
Thanks again,
Alex