We had a little failover problem with one of our HAProxy VMs today. When we dug into it, we found this:
Jan 26 07:41:45 haproxy2 kernel: [226818.070059] __ratelimit: 10 callbacks suppressed
Jan 26 07:41:45 haproxy2 kernel: [226818.070064] Out of socket memory
Jan 26 07:41:47 haproxy2 kernel: [226819.560048] Out of socket memory
Jan 26 07:41:49 haproxy2 kernel: [226822.030044] Out of socket memory
Per this link, this apparently has to do with low default settings for net.ipv4.tcp_mem. So we increased them to 4x their defaults (this is Ubuntu Server; not sure if the Linux flavor matters):
current values are: 45984 61312 91968
new values are: 183936 245248 367872
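For reference, the 4x scaling can be sketched as below. This is only an illustration under assumptions, not what we actually ran: the helper names scale_tcp_mem and sysctl_line are made up here, and the code only builds the setting line rather than applying it. The three tcp_mem numbers are the low, pressure, and high thresholds, counted in memory pages.

```python
def scale_tcp_mem(current: str, factor: int = 4) -> str:
    """Multiply each of the three tcp_mem thresholds by an integer factor."""
    low, pressure, high = (int(v) for v in current.split())
    return " ".join(str(v * factor) for v in (low, pressure, high))


def sysctl_line(values: str) -> str:
    """Format the values as a line in sysctl.conf style."""
    return f"net.ipv4.tcp_mem = {values}"


if __name__ == "__main__":
    # Defaults observed on our box; they depend on the amount of RAM.
    print(sysctl_line(scale_tcp_mem("45984 61312 91968")))
```

The resulting line could go into /etc/sysctl.conf so the change survives a reboot, assuming that is how the box is managed.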
After that, we started seeing a bizarre error message:
Jan 26 08:18:49 haproxy1 kernel: [ 2291.579726] Route hash chain too long!
Jan 26 08:18:49 haproxy1 kernel: [ 2291.579732] Adjust your secret_interval!
Shh.. it's a secret!!
This apparently has to do with /proc/sys/net/ipv4/route/secret_interval, which defaults to 600 and controls periodic flushing of the route cache:
The secret_interval instructs the kernel how often to blow away ALL route
hash entries regardless of how new/old they are. In our environment this is
generally bad. The CPU will be busy rebuilding thousands of entries per
second every time the cache is cleared. However we set this to run once a
day to keep memory leaks at bay (though we've never had one).
While we are happy to reduce this, it seems odd to recommend dropping the entire route cache at regular intervals, rather than simply pushing old values out of the route cache faster.
After some investigation, we found /proc/sys/net/ipv4/route/gc_elasticity, which seems to be a better option for keeping the route table size in check:
gc_elasticity can best be described as the average bucket depth the kernel
will accept before it starts expiring route hash entries. This will help
maintain the upper limit of active routes.
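To make that description concrete, here is a simplified model of it. It is our reading of the quoted text, not the kernel's actual garbage-collection code: "average bucket depth" is taken to be total cached entries divided by the number of hash buckets, and the bucket count of 32768 is a hypothetical figure chosen only for the arithmetic.

```python
def average_depth(entries: int, buckets: int) -> float:
    """Average number of route cache entries per hash bucket."""
    return entries / buckets


def expiry_threshold(buckets: int, elasticity: int) -> int:
    """Entry count above which the average depth exceeds gc_elasticity."""
    return buckets * elasticity


if __name__ == "__main__":
    buckets = 32768  # hypothetical hash table size, not measured
    for elasticity in (8, 4):
        limit = expiry_threshold(buckets, elasticity)
        print(f"gc_elasticity={elasticity}: expiry starts above ~{limit} entries")
```

Under this model, halving gc_elasticity halves the number of cached routes tolerated before entries start being expired.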
We adjusted gc_elasticity from 8 to 4, in the hope that the route cache will prune itself more aggressively. Adjusting secret_interval does not feel correct to us. But there are a bunch of settings, and it's unclear which are really the right ones to change:
/proc/sys/net/ipv4/route/gc_elasticity (8)
/proc/sys/net/ipv4/route/gc_interval (60)
/proc/sys/net/ipv4/route/gc_min_interval (0)
/proc/sys/net/ipv4/route/gc_timeout (300)
/proc/sys/net/ipv4/route/secret_interval (600)
/proc/sys/net/ipv4/route/gc_thresh (?)
rhash_entries (kernel parameter, default unknown?)
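In case it helps anyone compare against their own box, the current values of the /proc entries above can be dumped with something like the following. It is a minimal sketch: the function name read_settings is made up, and it assumes a kernel that still has the IPv4 route cache (it was removed in Linux 3.6, so on newer kernels some of these files will be missing and are reported as not available).

```python
from pathlib import Path
from typing import Dict, List, Optional

ROUTE_DIR = Path("/proc/sys/net/ipv4/route")
NAMES = ["gc_elasticity", "gc_interval", "gc_min_interval",
         "gc_timeout", "secret_interval", "gc_thresh"]


def read_settings(base: Path, names: List[str]) -> Dict[str, Optional[str]]:
    """Return {name: value} for each file under base; None if unreadable."""
    result: Dict[str, Optional[str]] = {}
    for name in names:
        try:
            result[name] = (base / name).read_text().strip()
        except OSError:
            # File missing or not readable on this kernel.
            result[name] = None
    return result


if __name__ == "__main__":
    for name, value in read_settings(ROUTE_DIR, NAMES).items():
        shown = value if value is not None else "(not available)"
        print(f"{name} = {shown}")
```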
We don't want to make the Linux routing worse, so we're kind of afraid to mess with some of these settings.
Can anyone advise which routing parameters are best to tune for a high-traffic HAProxy instance?