Weighted round robins via TTL - possible?

Posted by Joe Hopfgartner on Server Fault See other posts from Server Fault or by Joe Hopfgartner
Published on 2012-02-07T15:11:25Z Indexed on 2012/04/12 17:32 UTC
Read the original article Hit count: 363

I currently use DNS round robin for load balancing, which works great. The records look like this (I have a ttl of 120 seconds)

;; ANSWER SECTION:
orion.2x.to.        116 IN  A   80.237.201.41
orion.2x.to.        116 IN  A   87.230.54.12
orion.2x.to.        116 IN  A   87.230.100.10
orion.2x.to.        116 IN  A   87.230.51.65

I learned that not every ISP / device treats such a response the same way. For example some DNS servers rotate the addresses randomly or always cycle them through. Some just propagate the first entry, others try to determine which is best (regionally near) by looking at the ip address.

However if the userbase is big enough (spreads over multiple ISPs etc) it balances pretty well. The discrepancies from highest to lowest loaded server hardly every exceeds 15%.

However now I have the problem that I am introducing more servers into the systems, that not all have the same capacities.

I currently only have 1gbps servers, but I want to work with 100mbit and also 10gbps servers too.

So what I want is I want to introduce a server with 10 GBps with a weight of 100, a 1 gbps server with a weight of 10 and a 100 mbit server with a weight of 1.

I used to add servers twice to bring more traffic to them (which worked nice. the bandwidth doubled almost.) But adding a 10gbit server 100 times to DNS is a bit rediculous.

So I thought about using the TTL.

If I give server A 240 seconds ttl and server B only 120 seconds (which is about about the minimum to use for round robin, as a lot of dns servers set to 120 if a lower ttl is specified.. so i have heard) I think something like this should occour in an ideal scenario:

first 120 seconds
50% of requests get server A -> keep it for 240 seconds.
50% of requests get server B -> keep it for 120 seconds

second 120 seconds
50% of requests still  have server A cached -> keep it for another 120 seconds.
25% of requests get server A -> keep it for 240 seconds
25% of requests get server B -> keep it for 120 seconds

third 120 seconds
25% will get server A (from the 50% of Server A that now expired) -> cache 240 sec
25% will get server B  (from the 50% of Server A  that now expired) -> cache 120 sec
25% will have server A cached for another 120 seconds
12.5% will get server B (from the 25% of server B that now expired) -> cache 120sec
12.5% will get server A (from the 25% of server B that now expired) -> cache 240 sec

fourth 120 seconds
25% will have server A cached -> cache for another 120 secs
12.5% will get server A (from the 25% of b that now expired) -> cache 240 secs
12.5% will get server B (from the 25% of b that now expired) -> cache 120 secs
12.5% will get server A (from the 25% of a that now expired) -> cache 240 secs
12.5% will get server B (from the 25% of a that now expired) -> cache 120 secs
6.25% will get server A (from the 12.5% of b that now expired) -> cache 240 secs
6.25% will get server B (from the 12.5% of b that now expired) -> cache 120 secs
12.5% will have server A cached -> cache another 120 secs
... i think i lost something at this point but i think you get the idea....

As you can see this gets pretty complicated to predict and it will for sure not work out like this in practice. But it should definitely have an effect on the distribution!

I know that weighted round robin exists and is just controlled by the root server. It just cycles through dns records when responding and returns dns records with a set propability that corresponds to the weighting. My DNS server does not support this, and my requirements are not that precise. If it doesnt weight perfectly its okay, but it should go into the right direction. I think using the TTL field could be a more elegant and easier solution - and it deosnt require a dns server that controls this dynamically, which saves resources - which is in my opinion the whole point of dns load balancing vs hardware load balancers.

My question now is... are there any best prectices / methos / rules of thumb to weight round robin distribution using the TTL attribute of DNS records?

Edit:

The system is a forward proxy server system. The amount of Bandwidth (not requests) exceeds what one single server with ethernet can handle. So I need a balancing solution that distributes the bandwidth to several servers. Are there any alternative methods than using DNS? Of course I can use a load balancer with fibre channel etc, but the costs are rediciulous and it also increases only the width of the bottleneck and does not eliminate it. The only thing i can think of are anycast (is it anycast or multicast?) ip addresses, but I don't have the means to set up such a system.

© Server Fault or respective owner

Related posts about networking

Related posts about dns