VLAN issues between linux kernels 2.6 / 3.3 in an ESX / Cisco environment
- by David Griffith
I shall attempt to explain an issue I have encountered:
I have a VM running on ESX 4.1 with an interface connected to VLAN800 via an access port on a Cisco 3750. It runs Linux (kernel 2.6.24) and has about 5 to 10 Mbit/s of chatter on 10.10.0.0/16 and various multicast addresses to look after.
I needed to isolate certain devices from certain other devices on the network, with all of them having to talk to that one VM. No, the address space can't be separated, nor can the networks be easily vlan'd apart. The software on the VM listens on one interface only. Private VLANs appear to be the way to go.
So as a test, I built a bridge on the VM that globs together the VLANs as needed. All good, everything works as expected. But occasionally (sigh) there's some latency that trips up a couple of PROFINET devices on the network because, you know, you're not really supposed to trunk real-time protocols around the place willy-nilly.
I shift it to our test/backup server - works nicely, but I don't want it to be running on the test server as we muck around with that a lot.
So I says to myself, "I'll put it on a new VM for testing and tweaking."
I download a small Linux distro with kernel 3.3, and install it as a new VM with the VLANs as separate interfaces for testing.
I power up the testing VM - ok.
I bring up all the separate interfaces - ok. I can ping the production VM, see all sorts of traffic going past with tshark, etc.
I build a bridge and put the primary VLAN on it - the production VM running 2.6 immediately loses its multicast traffic. Unicast is fine. (?)
I shut down the bridge - still no multicast traffic (!?)
I power-cycle the production VM(!?!?) - multicast traffic returns.
I trunk everything into the testing VM and create VLAN interfaces under Linux instead - same result: as soon as I start the bridge, no multicast on the production VM.
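For anyone wanting to reproduce this, the trunk-plus-bridge setup amounts to roughly the following. This is a minimal sketch, not a record of the exact commands used on the testing VM: it assumes a reasonably recent iproute2 (older systems would use vconfig and brctl instead), and the names eth0 and br0 are placeholders. Only VLAN 800 comes from the setup described above. The function just builds the command lists so they can be reviewed before anything is run as root.

```python
def vlan_bridge_commands(parent="eth0", vlan_ids=(800,), bridge="br0"):
    """Return the iproute2 commands that create VLAN sub-interfaces on a
    trunk interface and enslave them to a Linux bridge.

    parent and bridge are placeholder names (assumptions), not the real
    interface names from the testing VM.
    """
    # Create the bridge first so the sub-interfaces can be attached to it.
    cmds = [["ip", "link", "add", "name", bridge, "type", "bridge"]]
    for vid in vlan_ids:
        if not 1 <= vid <= 4094:
            raise ValueError("VLAN ID must be in 1..4094, got %r" % (vid,))
        sub = "%s.%d" % (parent, vid)
        # 802.1Q sub-interface on top of the trunk port.
        cmds.append(["ip", "link", "add", "link", parent, "name", sub,
                     "type", "vlan", "id", str(vid)])
        # Attach the sub-interface to the bridge, then bring it up.
        cmds.append(["ip", "link", "set", "dev", sub, "master", bridge])
        cmds.append(["ip", "link", "set", "dev", sub, "up"])
    # Bringing the bridge up is the step after which the trouble started.
    cmds.append(["ip", "link", "set", "dev", bridge, "up"])
    return cmds


# Dry run: print the commands instead of executing them.
for cmd in vlan_bridge_commands():
    print(" ".join(cmd))
```

Printing rather than executing is deliberate: given what happens when the bridge comes up, it seems wise to read the list first.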
Ok, so I take a break and leave things alone. I decide to play with a couple of Ubiquiti Bullet radios - I'm testing various firmware as a side project. I flash a radio with OpenWrt 12.09. I enable a trunk on a port on a Cisco on our network so I can muck around with multiple VLANs and SSIDs.
I power up the radio and connect - ok.
I create a vlan interface from the trunk.... the same vlan as the production VM wayyyyy over there, three cisco routers away. Ok.
I bridge the VLAN interface to the wifi interface and immediately get a phone call. The production VM has (surprise!) lost its multicast traffic. Again, nothing comes back until I power-cycle the VM.
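One way to put a number on "lost its multicast traffic" from the production VM's side is a small listener that counts datagrams for a group over a few seconds. The sketch below assumes Python 3 is available on the VM and that the traffic of interest is UDP to a known group and port. The group and port are not given above, so any values passed in are placeholders. Note that joining a group makes the host send an IGMP membership report, so this is not a purely passive observation.

```python
import ipaddress
import socket
import struct
import time


def build_mreq(group, iface_ip="0.0.0.0"):
    """Pack the 8-byte ip_mreq structure used by IP_ADD_MEMBERSHIP."""
    addr = ipaddress.ip_address(group)
    if addr.version != 4 or not addr.is_multicast:
        raise ValueError("%s is not an IPv4 multicast address" % group)
    # Group address followed by the local interface address
    # (0.0.0.0 lets the kernel pick the interface).
    return struct.pack("4s4s", socket.inet_aton(group),
                       socket.inet_aton(iface_ip))


def count_multicast(group, port, seconds=5.0, iface_ip="0.0.0.0"):
    """Join `group` and count UDP datagrams seen on `port` for `seconds`.

    The socket is bound to all addresses, so unicast datagrams sent to
    the same port are counted as well.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM,
                         socket.IPPROTO_UDP)
    try:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        sock.bind(("", port))
        sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP,
                        build_mreq(group, iface_ip))
        sock.settimeout(0.5)
        deadline = time.monotonic() + seconds
        count = 0
        while time.monotonic() < deadline:
            try:
                sock.recvfrom(65535)
                count += 1
            except socket.timeout:
                pass  # nothing arrived in this interval; keep waiting
        return count
    finally:
        sock.close()
```

Something like count_multicast("239.255.0.1", 5000), with the real group and port substituted, run before and after the bridge comes up would show whether the count drops to zero.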
What the hell is going on?