I'm trying to use a multi-VM vagrant environment as a testbed for deploying OpenStack, and I've run into a networking problem with trying to communicate from one VM, to a VM-inside-of-a-VM.
I have two Vagrant nodes, a cloud controller node and a compute node. I'm using host-only networking. My Vagrantfile looks like this:
Vagrant::Config.run do |config|
config.vm.box = "precise64"
config.vm.define :controller do |controller_config|
controller_config.vm.network :hostonly, "192.168.206.130" # eth1
controller_config.vm.network :hostonly, "192.168.100.130" # eth2
controller_config.vm.host_name = "controller"
end
config.vm.define :compute1 do |compute1_config|
compute1_config.vm.network :hostonly, "192.168.206.131" # eth1
compute1_config.vm.network :hostonly, "192.168.100.131" # eth2
compute1_config.vm.host_name = "compute1"
compute1_config.vm.customize ["modifyvm", :id, "--memory", 1024]
end
end
When I try to start up a (QEMU-based) VM, it boots successfully on compute1, and its virtual nic (vnet0) is connected via a bridge, br100:
root@compute1:~# brctl show 100
bridge name bridge id STP enabled interfaces
br100 8000.08002798c6ef no eth2
vnet0
When the QEMU VM makes a request to the DHCP server (dnsmasq) running on controller, I can see the request reaches the controller because of the output on the syslog on the controller:
Aug 6 02:34:56 precise64 dnsmasq-dhcp[12042]: DHCPDISCOVER(br100) fa:16:3e:07:98:11
Aug 6 02:34:56 precise64 dnsmasq-dhcp[12042]: DHCPOFFER(br100) 192.168.100.2 fa:16:3e:07:98:11
However, the DHCPOFFER never makes it back to the VM running on compute1. If I watch the requests using tcpdump on the vboxnet3 interface on my host machine that runs Vagrant (Mac OS X), I can see both the requests and the replies
$ sudo tcpdump -i vboxnet3 -n port 67 or port 68
tcpdump: WARNING: vboxnet3: That device doesn't support promiscuous mode
(BIOCPROMISC: Operation not supported on socket)
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on vboxnet3, link-type EN10MB (Ethernet), capture size 65535 bytes
22:51:20.694040 IP 0.0.0.0.68 > 255.255.255.255.67: BOOTP/DHCP, Request from fa:16:3e:07:98:11, length 280
22:51:20.694057 IP 0.0.0.0.68 > 255.255.255.255.67: BOOTP/DHCP, Request from fa:16:3e:07:98:11, length 280
22:51:20.696047 IP 192.168.100.1.67 > 192.168.100.2.68: BOOTP/DHCP, Reply, length 311
22:51:23.700845 IP 0.0.0.0.68 > 255.255.255.255.67: BOOTP/DHCP, Request from fa:16:3e:07:98:11, length 280
22:51:23.700876 IP 0.0.0.0.68 > 255.255.255.255.67: BOOTP/DHCP, Request from fa:16:3e:07:98:11, length 280
22:51:23.701591 IP 192.168.100.1.67 > 192.168.100.2.68: BOOTP/DHCP, Reply, length 311
22:51:26.705978 IP 0.0.0.0.68 > 255.255.255.255.67: BOOTP/DHCP, Request from fa:16:3e:07:98:11, length 280
22:51:26.705995 IP 0.0.0.0.68 > 255.255.255.255.67: BOOTP/DHCP, Request from fa:16:3e:07:98:11, length 280
22:51:26.706527 IP 192.168.100.1.67 > 192.168.100.2.68: BOOTP/DHCP, Reply, length 311
But, if I tcpdump on eth2 on compute, I only see the requests, not the replies:
root@compute1:~# tcpdump -i eth2 -n port 67 or port 68
tcpdump: WARNING: eth2: no IPv4 address assigned
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth2, link-type EN10MB (Ethernet), capture size 65535 bytes
02:51:20.240672 IP 0.0.0.0.68 > 255.255.255.255.67: BOOTP/DHCP, Request from fa:16:3e:07:98:11, length 280
02:51:23.249758 IP 0.0.0.0.68 > 255.255.255.255.67: BOOTP/DHCP, Request from fa:16:3e:07:98:11, length 280
02:51:26.258281 IP 0.0.0.0.68 > 255.255.255.255.67: BOOTP/DHCP, Request from fa:16:3e:07:98:11, length 280
At this point, I'm stuck. I'm not sure why the DHCP replies aren't making it to the compute node. Perhaps it has something to do with the configuration of the VirtualBox virtual switch/router?
Note that eth2 interfaces on both nodes have been set to promiscuous mode.