arp -n responds with (incomplete) on the wrong subnet, can't remove it
- by Hannes
Context
There are 2 servers:
server1 - eth0   10.129.76.16
          eth0.2 192.168.0.103
server2 - eth0   10.129.79.1
          eth0.2 192.168.62.101
The 192.x.x.x addresses are on the same VLAN (vlan2) and can see each other.
The 10.x.x.x addresses are on different VLANs, which cannot see each other.
At the request of David Swartz:
The routing table on server 1 is:
~$ sudo route -n
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
10.129.76.0     0.0.0.0         255.255.255.0   U     0      0        0 eth0
192.168.0.0     0.0.0.0         255.255.192.0   U     0      0        0 eth0.2
0.0.0.0         192.168.61.254  0.0.0.0         UG    100    0        0 eth0.2
The routing table on server 2 is:
~$ sudo route -n
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
0.0.0.0         <public IP gw>  0.0.0.0         UG    100    0        0 eth0.11
10.129.79.0     0.0.0.0         255.255.255.0   U     0      0        0 eth0
<public IP>     0.0.0.0         255.255.255.128 U     0      0        0 eth0.11
192.168.0.0     0.0.0.0         255.255.192.0   U     0      0        0 eth0.2
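As a sanity check on the tables above, the sketch below tests whether an address falls inside a route's network by masking both with the Genmask. The helper names ip_to_int and in_subnet are made up for this illustration; the addresses and the mask are the ones from the tables. With the 255.255.192.0 mask, both eth0.2 addresses and the 192.168.61.254 gateway should come out as covered by the 192.168.0.0 route.

```shell
# Convert a dotted-quad IPv4 address to a single integer.
# The function body is a subshell, so the IFS change does not leak out.
ip_to_int() (
  IFS=.
  set -- $1
  echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
)

# Succeed if address $1 lies inside network $2 with netmask $3.
in_subnet() {
  addr=$(ip_to_int "$1")
  net=$(ip_to_int "$2")
  mask=$(ip_to_int "$3")
  [ $(( addr & mask )) -eq $(( net & mask )) ]
}

# Check the eth0.2 addresses and the default gateway against the eth0.2 route.
for a in 192.168.0.103 192.168.62.101 192.168.61.254; do
  if in_subnet "$a" 192.168.0.0 255.255.192.0; then
    echo "$a is inside 192.168.0.0/255.255.192.0"
  else
    echo "$a is outside 192.168.0.0/255.255.192.0"
  fi
done
```

If all three are inside, the connected route on eth0.2 already covers the peer, so an extra route should not be needed for the 192.x.x.x traffic.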
Problem:
When I ping from server 1 to server 2, no packets seem to arrive, and the same happens in the other direction.
When I check the routes (route -n), I see the default gw uses eth0.2 on both servers. But when I use arping, I get a response one way (from server 2 to server 1) and no response the other way (from server 1 to server 2):
arping 192.168.62.101
ARPING 192.168.62.101 from 10.129.76.16 eth0
^CSent 2 probes (2 broadcast(s))
Received 0 response(s)
As you can see, it uses the 10.x.x.x address (on eth0) instead of the 192.x.x.x one. As I said before, the 10.x.x.x address is unreachable from the other server.
When I force arping to use eth0.2, it does work.
I don't have any problems pinging other servers from either of these 2 servers.
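To compare what arping used with what the kernel would pick for ordinary IP traffic, something like the sketch below could be run on server 1. The route_dev helper is a made-up name that pulls the interface out of `ip route get` output. The -I and -c options are the iputils arping spellings, which is an assumption based on the output format shown above; other arping implementations may use different flags.

```shell
# Print the interface that follows "dev" in `ip route get` output read from stdin.
route_dev() {
  awk '{ for (i = 1; i < NF; i++) if ($i == "dev") { print $(i + 1); exit } }'
}

# On server 1 (left commented out, since these need the real network):
#   ip route get 192.168.62.101 | route_dev    # interface the kernel would choose
#   arping -I eth0.2 -c 2 192.168.62.101       # ARP probe forced onto eth0.2
```

If route_dev prints eth0.2 while plain arping still goes out on eth0, the interface arping picks by default is not necessarily the one the routing table selects.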
I did see this in the arp tables:
~# arp -n | grep 192.168.0.103
192.168.0.103 (incomplete) eth0
and
~# arp -n | grep 192.168.62.101
Question
Quite obvious: how can I make these servers see each other again?
Things I've tried
I cleared the appropriate entries in the ARP table and tried to get rid of the (incomplete) entry. But I think the biggest problem is that eth0 is used instead of eth0.2 for the packets from server 1 to server 2.
Because of David Swartz's remark about the routing tables, I also added a host route for the other server on each machine.
I added
192.168.0.103 0.0.0.0 255.255.255.255 UH 0 0 0 eth0.2
and
192.168.62.101 0.0.0.0 255.255.255.255 UH 0 0 0 eth0.2
to the appropriate servers, but this didn't solve the problem, so I presume the problem is not in the routing.
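A small check that such a host route is really in place could look like the sketch below. The has_host_route helper is a made-up name; it reads `route -n` output on stdin and looks for a 255.255.255.255 entry carrying the H flag on the given interface. The `ip route add` line in the comment is only one possible way to create the route, since the exact command used is not stated here.

```shell
# Succeed if `route -n` output on stdin contains a host route (flag H)
# for address $1 on interface $2.
has_host_route() {
  awk -v ip="$1" -v dev="$2" '
    $1 == ip && $3 == "255.255.255.255" && $4 ~ /H/ && $NF == dev { found = 1 }
    END { exit !found }
  '
}

# On server 1 (commented out, needs the real routing table):
#   route -n | has_host_route 192.168.62.101 eth0.2 && echo "host route present"
# One possible way to add it (assumption, requires root):
#   ip route add 192.168.62.101/32 dev eth0.2
```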
My guess
I suspect the problem lies in the following:
~$ arp -n | grep 192.168.0.103
192.168.0.103 (incomplete) eth0
But I'm unable to remove this entry (arp -d 192.168.0.103 has no effect).
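Since the stale entry is tied to eth0, one thing worth trying is a delete that names the interface explicitly. The sketch below wraps that in hypothetical run and del_neigh helpers that only print the command unless DRY_RUN=0 is set, so nothing changes by accident. The net-tools equivalent in the comment uses the -i option of arp.

```shell
# Run a command, or only print it while DRY_RUN is 1 (the default here).
run() {
  if [ "${DRY_RUN:-1}" = 1 ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Remove the neighbour (ARP) entry for address $1 on interface $2.
del_neigh() {
  run ip neigh del "$1" dev "$2"
}

# Dry run for the entry shown above; set DRY_RUN=0 (as root) to really delete it.
del_neigh 192.168.0.103 eth0

# net-tools equivalent, naming the interface explicitly:
#   arp -i eth0 -d 192.168.0.103
```

If something keeps trying to reach 192.168.0.103 through eth0, an (incomplete) entry may simply be re-created after the delete, so it is worth re-checking with arp -n afterwards.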
Thanks for reading and even more thanks for answering!