Switch flooding when bonding interfaces in Linux
- by John Philips
                                +--------+
                                | Host A |
                                +----+---+
                                     |  eth0 (AA:AA:AA:AA:AA:AA)
                                     |
                                     |
                                +----+-----+
                                | Switch 1 |  (layer2/3)
                                +----+-----+
                                     |
                                +----+-----+
                                | Switch 2 |
                                +----+-----+
                                     |
                          +----------+----------+
+-------------------------+      Switch 3       +-------------------------+
|                         +----+-----------+----+                         |
|                              |           |                              |
|                              |           |                              |
| eth0 (B0:B0:B0:B0:B0:B0)     |           |     eth4 (B4:B4:B4:B4:B4:B4) |
|                         +----+-----------+----+                         |
|                         |       Host B        |                         |
|                         +----+-----------+----+                         |
| eth1 (B1:B1:B1:B1:B1:B1)     |           |     eth5 (B5:B5:B5:B5:B5:B5) |
|                              |           |                              |
|                              |           |                              |
+------------------------------+           +------------------------------+
Topology overview
Host A has a single NIC.
Host B has four NICs which are bonded using the balance-alb mode.
Both hosts run RHEL 6.0, and both are on the same IPv4 subnet.
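For anyone comparing against their own setup, the bonding driver reports the mode and each slave's permanent MAC in /proc/net/bonding/<bond name>. Below is a minimal Python sketch that pulls those two things out of that file's text. The function name parse_bonding_status and the SAMPLE text are my own illustration: the sample is hand-written and abbreviated in the style of that file, not output captured from Host B, and it assumes balance-alb is reported as "adaptive load balancing".

```python
def parse_bonding_status(text):
    """Return (mode, {slave_name: permanent_mac}) from bonding status text."""
    mode = None
    slaves = {}
    current = None
    for line in text.splitlines():
        # Split on the first colon only, so MAC addresses stay intact.
        key, sep, value = line.partition(":")
        if not sep:
            continue
        key, value = key.strip(), value.strip()
        if key == "Bonding Mode":
            mode = value
        elif key == "Slave Interface":
            current = value
            slaves[current] = None
        elif key == "Permanent HW addr" and current is not None:
            slaves[current] = value.upper()
    return mode, slaves


# Hand-written, abbreviated sample (an assumption, not real output from Host B).
SAMPLE = """\
Bonding Mode: adaptive load balancing
MII Status: up

Slave Interface: eth0
Permanent HW addr: b0:b0:b0:b0:b0:b0

Slave Interface: eth1
Permanent HW addr: b1:b1:b1:b1:b1:b1

Slave Interface: eth4
Permanent HW addr: b4:b4:b4:b4:b4:b4

Slave Interface: eth5
Permanent HW addr: b5:b5:b5:b5:b5:b5
"""

mode, slaves = parse_bonding_status(SAMPLE)
print(mode)
for name, mac in slaves.items():
    print(name, mac)
```

On a real host, the text would come from reading the file (for example /proc/net/bonding/bond0, if the bond is named bond0) instead of SAMPLE.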
Traffic analysis
Host A is sending data to Host B using a SQL database application.
Traffic from Host A to Host B: the source interface/MAC is eth0/AA:AA:AA:AA:AA:AA, the destination interface/MAC is eth5/B5:B5:B5:B5:B5:B5.
Traffic from Host B to Host A: the source interface/MAC is eth0/B0:B0:B0:B0:B0:B0, the destination interface/MAC is eth0/AA:AA:AA:AA:AA:AA.
Once the TCP connection has been established, Host B sends no further frames out eth5.
The MAC address of eth5 expires from the bridge tables of both Switch 1 and Switch 2.
Switch 1 continues to receive frames from Host A which are destined for B5:B5:B5:B5:B5:B5.
Because Switch 1 and Switch 2 no longer have bridge table entries for B5:B5:B5:B5:B5:B5, they flood these frames out of every port on the same VLAN, except the port each frame arrived on.
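To make the sequence above concrete, here is a toy Python model of a learning switch with an aging bridge table. It is only a sketch of the general learn/age/flood behavior, not of any particular vendor's switch. The class name LearningSwitch, the port names, the timestamps, and the 300-second aging value are all assumptions for illustration, as is the premise that Host B sent at least one frame from eth5 at time 0.

```python
class LearningSwitch:
    """Toy model of a layer-2 bridge table with aging (not a real switch)."""

    def __init__(self, ports, aging_seconds=300):
        self.ports = set(ports)
        self.aging_seconds = aging_seconds
        self.table = {}  # MAC -> (port, time last seen as a source)

    def receive(self, now, in_port, src_mac, dst_mac):
        """Process one frame and return the set of ports it is sent out of."""
        # Learn (or refresh) the source MAC on the ingress port.
        self.table[src_mac] = (in_port, now)
        entry = self.table.get(dst_mac)
        if entry is not None and now - entry[1] < self.aging_seconds:
            return {entry[0]}
        # Unknown or aged-out destination: flood out every port but the ingress.
        self.table.pop(dst_mac, None)
        return self.ports - {in_port}


AA = "AA:AA:AA:AA:AA:AA"
B0 = "B0:B0:B0:B0:B0:B0"
B5 = "B5:B5:B5:B5:B5:B5"
sw = LearningSwitch(ports=["to_host_a", "to_switch_2", "other"])

# t=0: assume Host B sends one frame from eth5, so B5 is learned.
sw.receive(0, "to_switch_2", B5, AA)
# t=10: Host A's frame to B5 is forwarded out of a single port.
early = sw.receive(10, "to_host_a", AA, B5)
# t=20: Host B's replies leave eth0, which refreshes B0 but never B5.
sw.receive(20, "to_switch_2", B0, AA)
# t=400: B5 has aged out, so the same kind of frame is now flooded.
late = sw.receive(400, "to_host_a", AA, B5)

print("forwarded to:", sorted(early))
print("flooded to:", sorted(late))
```

The key point in the model is that only frames with B5 as the source refresh the B5 entry. Frames addressed to B5 never do, no matter how many of them pass through.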
Reproduce
If you ping Host B from a workstation which is connected to either Switch 1 or 2, B5:B5:B5:B5:B5:B5 re-enters the bridge tables and the flooding stops.
After five minutes (the default bridge table timeout), flooding resumes.
Question
It is clear that on Host B, frames arrive on eth5 and exit out eth0. This seems OK, since that is what the Linux bonding algorithm is designed to do: balance incoming and outgoing traffic. But because the switches stop receiving frames with eth5's MAC as the source, that MAC times out of their bridge tables, resulting in flooding.
Is this normal? Why aren't any more frames originating from eth5? Is it because there is simply no other traffic going on (the only connection is a single large data transfer from Host A)?
I've researched this for a long time and haven't found an answer. Documentation states that no switch changes are necessary when using mode 6 (balance-alb) of Linux interface bonding. Is this behavior occurring because Host B doesn't send any further packets out of eth5, whereas in normal circumstances it would be expected to? One solution is to set up a cron job which pings Host B to keep the bridge table entries from timing out, but that seems like a dirty hack.
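For completeness, here is a sketch of a variant of that hack which runs on Host B itself: periodically send one small frame out of the idle slave so the switches see its source MAC again. This is just as much a workaround as the ping, and I have not verified it on a balance-alb bond. Everything named here is my own assumption: the helper names, the use of the IEEE local experimental EtherType 0x88B5, the broadcast destination, and Python 3 with a Linux AF_PACKET raw socket (which needs root).

```python
import socket
import struct

# IEEE 802 local experimental EtherType, chosen so no real protocol reacts to it.
LOCAL_EXPERIMENTAL_ETHERTYPE = 0x88B5


def mac_to_bytes(mac):
    """Convert 'B5:B5:B5:B5:B5:B5' into 6 raw bytes."""
    return bytes(int(part, 16) for part in mac.split(":"))


def build_keepalive_frame(src_mac, dst_mac="FF:FF:FF:FF:FF:FF"):
    """Build a minimum-size Ethernet frame (without FCS) from src_mac."""
    header = (
        mac_to_bytes(dst_mac)
        + mac_to_bytes(src_mac)
        + struct.pack("!H", LOCAL_EXPERIMENTAL_ETHERTYPE)
    )
    payload = b"\x00" * 46  # pad to the minimum Ethernet payload size
    return header + payload


def send_keepalive(ifname, src_mac):
    """Send one keepalive frame out of ifname (Linux only, requires root)."""
    with socket.socket(socket.AF_PACKET, socket.SOCK_RAW) as sock:
        sock.bind((ifname, 0))
        sock.send(build_keepalive_frame(src_mac))


frame = build_keepalive_frame("B5:B5:B5:B5:B5:B5")
print(len(frame), frame[:14].hex())
# As root on Host B, something like this would be run more often than the
# five-minute timeout (untested assumption):
# send_keepalive("eth5", "B5:B5:B5:B5:B5:B5")
```

A broadcast destination is used so that every switch on the VLAN sees the frame, at the cost of one small flooded frame per interval.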