Ubuntu 12.04 server crashed and took down the network
Posted by Leopd on Server Fault
Published on 2012-07-09T04:03:24Z
We recently set up a new Ubuntu 12.04 LTS server on our network. It isn't fully configured yet, so it isn't doing much beyond running sshd and a default apache2 install. This evening, however, it appears to have crashed: it stopped responding to both the network and the keyboard. Worst of all, it took down the entire network.
My knowledge of the network stack below OSI layer 3 is very limited, so the rest confuses me. While this machine was physically connected to the network, no other machine could reach the outside internet. While things were broken, running arp showed our gateway's IP address (10.0.1.1) listed as "invalid." Unplugging the server from the network fixed the problem, and plugging it back in broke it again. So was the crashed server advertising itself as the owner of the gateway's IP address?
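On an Ethernet segment a host cannot send a frame "to an IP address"; it needs the MAC address behind it. ARP fills that gap: the host broadcasts "who has 10.0.1.1?", caches the answer, and addresses its outbound internet traffic to that MAC. The Python sketch below is a toy model of such a cache, not any real kernel's implementation. It assumes the simplest policy, in which the latest reply overwrites the entry, and `SERVER_MAC` is a hypothetical address (only the `cd:80` suffix is known). It illustrates two ways a host can lose its path to the gateway: no reply at all, or a reply for 10.0.1.1 from the wrong machine.

```python
# Toy model of a host's ARP cache: IPv4 address -> MAC address.
# Simplifying assumption: the most recent reply for an IP simply overwrites
# the old entry. Real stacks add timeouts, entry states and sanity checks.

GATEWAY_IP = "10.0.1.1"
GATEWAY_MAC = "c0:c1:c0:77:25:8e"  # the MAC that answers arping when things work
SERVER_MAC = "02:00:00:00:cd:80"   # hypothetical: only the "cd:80" suffix is known


class ArpCache:
    def __init__(self):
        self.entries = {}

    def learn(self, ip, mac):
        """Record an ARP reply claiming that `ip` is reachable at `mac`."""
        self.entries[ip] = mac

    def resolve(self, ip):
        """Return the MAC to put in the Ethernet header, or None if unresolved."""
        return self.entries.get(ip)


cache = ArpCache()
print(cache.resolve(GATEWAY_IP))  # None: nobody has answered, so no frame can be addressed
cache.learn(GATEWAY_IP, GATEWAY_MAC)
print(cache.resolve(GATEWAY_IP))  # c0:c1:c0:77:25:8e: internet traffic goes to the gateway
cache.learn(GATEWAY_IP, SERVER_MAC)
print(cache.resolve(GATEWAY_IP))  # 02:00:00:00:cd:80: the same traffic now goes to the server
```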
There is nothing at all in syslog for the period when it was causing problems. Any ideas on how to figure out what went wrong, or what we can do to prevent it from happening again? I'm hesitant to even put the machine back on the network right now.
Update
It crashed again, and I ran tcpdump -penn arp (thanks bahamat!) for several minutes and got this (timestamps and duplicate lines removed):
00:1e:65:f8:dc:24 > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 60: Request who-has 10.0.1.1 tell 10.0.2.191, length 46
00:1e:65:f8:dc:24 > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 60: Request who-has 10.0.1.44 tell 10.0.2.191, length 46
60:d8:19:d4:71:d6 > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 60: Request who-has 10.0.1.1 tell 10.0.2.125, length 46
d4:9a:20:04:e9:78 > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 42: Request who-has 192.168.1.1 tell 192.168.1.100, length 28
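A capture like the one above can be summarized mechanically. This Python sketch assumes exactly the line format shown (tcpdump with link-level headers and numeric addresses) and handles only ARP requests; it groups them by the address being asked for and counts who is asking. The four sample lines are copied verbatim from the dump, and `summarize` is a hypothetical helper name.

```python
import re
from collections import Counter, defaultdict

# ARP request lines in the format shown above (tcpdump with -e and -n).
ARP_REQUEST = re.compile(
    r"^(?P<src_mac>[0-9a-f:]{17}) > (?P<dst_mac>[0-9a-f:]{17}), "
    r"ethertype ARP \(0x0806\), length \d+: "
    r"Request who-has (?P<target>[\d.]+) tell (?P<sender>[\d.]+)"
)


def summarize(lines):
    """Map each requested IP to a Counter of the sender IPs asking for it."""
    askers = defaultdict(Counter)
    for line in lines:
        m = ARP_REQUEST.match(line.strip())
        if m:  # anything that is not an ARP request (e.g. a reply) is skipped
            askers[m["target"]][m["sender"]] += 1
    return askers


capture = """\
00:1e:65:f8:dc:24 > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 60: Request who-has 10.0.1.1 tell 10.0.2.191, length 46
00:1e:65:f8:dc:24 > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 60: Request who-has 10.0.1.44 tell 10.0.2.191, length 46
60:d8:19:d4:71:d6 > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 60: Request who-has 10.0.1.1 tell 10.0.2.125, length 46
d4:9a:20:04:e9:78 > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 42: Request who-has 192.168.1.1 tell 192.168.1.100, length 28
"""

for target, who in sorted(summarize(capture.splitlines()).items()):
    print(f"who-has {target}: {dict(who)}")
```

Run on a raw capture, before duplicate lines are stripped, large counts of `who-has 10.0.1.1` from many senders would be consistent with hosts failing to learn the gateway's MAC, as the "invalid" arp entry also suggests.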
Update 2
When the network is functioning properly, arping -c4 10.0.1.1 returns this:
ARPING 10.0.1.1
60 bytes from c0:c1:c0:77:25:8e (10.0.1.1): index=0 time=267.982 usec
60 bytes from c0:c1:c0:77:25:8e (10.0.1.1): index=1 time=422.955 usec
60 bytes from c0:c1:c0:77:25:8e (10.0.1.1): index=2 time=299.215 usec
60 bytes from c0:c1:c0:77:25:8e (10.0.1.1): index=3 time=366.926 usec
--- 10.0.1.1 statistics ---
4 packets transmitted, 4 packets received, 0% unanswered (0 extra)
When the bad server is plugged in, arping -c4 10.0.1.1 returns:
ARPING 10.0.1.1
--- 10.0.1.1 statistics ---
4 packets transmitted, 0 packets received, 100% unanswered (0 extra)
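The two arping runs can be told apart by a script. The sketch below is a hypothetical helper, `diagnose`, that assumes the exact output format shown above (other arping implementations format their replies differently) and reduces a run to a verdict: one MAC answering, several MACs answering for the same IP (an address conflict), or nobody answering. It makes one point explicit: a machine actively answering ARP for 10.0.1.1 would be expected to appear among the replies, whereas the broken run shows no replies at all.

```python
import re

# Line formats as printed by the arping shown above.
REPLY = re.compile(r"^\d+ bytes from (?P<mac>[0-9a-f:]{17}) \((?P<ip>[\d.]+)\)")
SUMMARY = re.compile(r"(?P<sent>\d+) packets transmitted, (?P<received>\d+) packets received")


def diagnose(output):
    """Return (verdict, replying MACs, sent, received) for one arping run."""
    macs = set()
    sent = received = 0
    for line in output.splitlines():
        reply = REPLY.match(line)
        summary = SUMMARY.search(line)
        if reply:
            macs.add(reply["mac"])
        elif summary:
            sent, received = int(summary["sent"]), int(summary["received"])
    if not macs:
        verdict = "no answer"
    elif len(macs) == 1:
        verdict = "single owner"
    else:
        verdict = "conflict"  # more than one MAC claims the same IP
    return verdict, sorted(macs), sent, received


healthy = """ARPING 10.0.1.1
60 bytes from c0:c1:c0:77:25:8e (10.0.1.1): index=0 time=267.982 usec
60 bytes from c0:c1:c0:77:25:8e (10.0.1.1): index=1 time=422.955 usec
60 bytes from c0:c1:c0:77:25:8e (10.0.1.1): index=2 time=299.215 usec
60 bytes from c0:c1:c0:77:25:8e (10.0.1.1): index=3 time=366.926 usec
--- 10.0.1.1 statistics ---
4 packets transmitted, 4 packets received, 0% unanswered (0 extra)"""

broken = """ARPING 10.0.1.1
--- 10.0.1.1 statistics ---
4 packets transmitted, 0 packets received, 100% unanswered (0 extra)"""

print(diagnose(healthy))  # ('single owner', ['c0:c1:c0:77:25:8e'], 4, 4)
print(diagnose(broken))   # ('no answer', [], 4, 0)
```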
Context
- 10.0.x.x is the main subnet.
- 10.0.1.1 is the main internet gateway.
- 10.0.1.44 is a printer.
- 10.0.2.* devices are all laptops/workstations.
- I have no idea what's using the 192.168.x.x subnet; your guesses are at least as good as mine. A VM on a workstation? A misconfigured WAP? Somebody re-sharing Wi-Fi? A machine that failed to get a DHCP lease?
- The offending Ubuntu server's MAC address ends in cd:80, so it isn't listed in the dump. It should receive 10.0.3.3 via DHCP.
Thanks for any help. This ARP stuff is all voodoo to me. Packets just go to IP addresses, right? ;)
© Server Fault or respective owner