What's the piece of hardware listening on Facebook's or Wikipedia's IP address?
- by Igor Ostrovsky
Out of intellectual curiosity, I am trying to understand how massive sites like Facebook or Wikipedia work. I have read about various techniques for building scalable sites, but one particular detail still puzzles me.
The part that confuses me is that, ultimately, DNS maps the entire domain to a single IP address, or to a handful of IP addresses in the case of round-robin DNS.
For example, wikipedia.org has only one A record in DNS. So people all over the world who visit Wikipedia have to send their requests to the one IP address listed there.
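To make the round-robin case concrete, here is a toy model of what a round-robin DNS server does: it returns the same small set of A records on every query but rotates their order, so successive clients tend to start with different addresses. It is only a sketch under stated assumptions: the `RoundRobinZone` class is hypothetical, the addresses come from the 192.0.2.0/24 documentation range, and it assumes clients simply try the first address in the answer, which many (but not all) do. It also shows why round-robin alone does not answer the question: the load is only spread across a handful of addresses, and each one still needs something listening behind it.

```python
from collections import deque


class RoundRobinZone:
    """Hypothetical toy authoritative server for one name with several A records."""

    def __init__(self, addresses):
        self._records = deque(addresses)

    def answer(self):
        # Return the current ordering of the records...
        result = list(self._records)
        # ...then rotate left so the next query sees a different first address.
        self._records.rotate(-1)
        return result


# Hypothetical addresses from the documentation range (RFC 5737).
zone = RoundRobinZone(["192.0.2.10", "192.0.2.11", "192.0.2.12"])
for _ in range(4):
    # Assumption: the client connects to the first address it is given.
    print(zone.answer()[0])
# prints 192.0.2.10, 192.0.2.11, 192.0.2.12, 192.0.2.10 (one per line)
```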
What is the piece of hardware that listens on that IP address for a massive site, and how can it possibly handle the load of requests from users all over the world?
Edit 1: Thanks for all the responses! Anycast seems like a feasible answer. Does anyone know of a way to check whether a particular IP address is anycast-routed, so that I could verify that large sites really use this trick in practice?
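One commonly described way to check for anycast is a speed-of-light test: ping the same address from several vantage points whose locations you know (for example, looking-glass servers or a measurement network) and ask whether any single location could produce all the round-trip times. Half the RTT bounds how far away the responder can be, so if two vantage points are farther apart than the sum of those bounds, the replies must be coming from different places. The sketch below assumes you have already collected the measurements; the function names, the 200 km/ms fiber speed (roughly two-thirds of the speed of light), and the sample coordinates and RTTs are all illustrative assumptions, not real measurements.

```python
import math
from itertools import combinations

EARTH_RADIUS_KM = 6371.0
# Assumption: signals in optical fiber travel at roughly 2/3 of c, about 200 km per ms.
FIBER_KM_PER_MS = 200.0


def great_circle_km(a, b):
    """Haversine distance between two (lat, lon) points given in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(h)))


def looks_anycast(measurements):
    """Hypothetical helper. measurements: list of ((lat, lon), rtt_ms) per vantage point.

    Returns True if some pair of RTTs cannot be explained by one physical
    location. A False result does not prove unicast: this pairwise test can
    miss some anycast deployments.
    """
    for (loc_a, rtt_a), (loc_b, rtt_b) in combinations(measurements, 2):
        # Upper bound on how far the responder can be from each vantage point.
        reach_a = rtt_a / 2 * FIBER_KM_PER_MS
        reach_b = rtt_b / 2 * FIBER_KM_PER_MS
        if great_circle_km(loc_a, loc_b) > reach_a + reach_b:
            return True
    return False


# Hypothetical RTTs from Frankfurt and San Francisco to the same address.
print(looks_anycast([((50.11, 8.68), 2.0), ((37.77, -122.42), 3.0)]))    # True
print(looks_anycast([((50.11, 8.68), 2.0), ((37.77, -122.42), 150.0)]))  # False
```

A 2 ms reply in Frankfurt and a 3 ms reply in San Francisco cannot come from the same machine, since the two cities are roughly 9,000 km apart; a 150 ms reply in San Francisco, on the other hand, is consistent with a single server in Europe.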
Edit 2: After more reading on the topic, it appears that anycast is not typically used for dynamic web content. It is usually used for UDP-based services (e.g., DNS lookups), or sometimes for static content.
One interesting thing to note is that Facebook uses profile.ak.fbcdn.net to host static content such as style sheets and JavaScript libraries. Each time I ping this name, I get a response from a different IP address. However, I can't tell whether this is anycast in action or a completely different technique.
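One observation helps here: anycast announces the same address from many places, so it would normally show up as the same IP address with varying latency, not as a different IP address on each ping. Seeing different addresses suggests the variation comes from the DNS answers themselves (round-robin or some DNS-based mapping), although it does not rule out anycast behind each individual address. The sketch below repeats the lookup and collects the distinct IPv4 addresses returned. The helper name `observed_addresses` and the injectable `resolve` parameter are hypothetical; by default it uses the standard `socket.getaddrinfo`, and local resolver caching may keep returning the same answer until the record's TTL expires.

```python
import socket


def observed_addresses(hostname, lookups=5, resolve=None):
    """Hypothetical helper: resolve `hostname` several times, return the IPv4 addresses seen."""
    if resolve is None:
        def resolve(name):
            # getaddrinfo returns tuples whose 5th element is (address, port) for IPv4.
            infos = socket.getaddrinfo(name, None, socket.AF_INET, socket.SOCK_STREAM)
            return [info[4][0] for info in infos]
    seen = set()
    for _ in range(lookups):
        seen.update(resolve(hostname))
    return seen


# Example (needs network access):
# print(observed_addresses("profile.ak.fbcdn.net"))
```

More than one address in the result means DNS is handing out different answers; a single address that answers with very different latencies from different regions would point toward anycast instead.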
Back to my original question: as far as I can tell, even a large site will have a single expensive piece of load-balancing hardware listening on its handful of public IP addresses.
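To illustrate the basic job of a device listening on a public IP address in front of many servers, here is a sketch of one common approach: hash each connection's 5-tuple (source IP and port, destination IP and port, protocol) to choose a backend. Because every packet of a TCP connection carries the same 5-tuple, the whole connection lands on the same server without the balancer storing per-connection state. This is an assumption-laden illustration, not a description of what Facebook or Wikipedia actually run; `Flow`, `pick_backend`, and the backend addresses are hypothetical.

```python
import hashlib
from typing import NamedTuple


class Flow(NamedTuple):
    """The 5-tuple identifying one connection (hypothetical type)."""
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    protocol: str


def pick_backend(flow, backends):
    """Map a connection to a backend server by hashing its 5-tuple."""
    key = "|".join(map(str, flow)).encode()
    # Use the first 8 bytes of a SHA-256 digest as a well-mixed integer.
    digest = int.from_bytes(hashlib.sha256(key).digest()[:8], "big")
    return backends[digest % len(backends)]


# Hypothetical private addresses of the real web servers behind one public IP.
backends = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]
flow = Flow("198.51.100.7", 51515, "192.0.2.10", 443, "tcp")
print(pick_backend(flow, backends))  # the same backend every time for this flow
```

Plain modulo hashing reshuffles most connections whenever a backend is added or removed, which is why real balancers typically add consistent hashing or a connection table on top of this idea.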