Finding cause of TCP retransmission within a LAN
- by Surreal
Hello denizens of Server Fault
I have an irritating problem with a LAN of about 100 computers, 2 Windows domain servers, and 12 VoIP phones. Since their installation around a year ago, every week or so, we notice a VoIP phone resetting itself - occasionally in the middle of a call. Simultaneously there are often signs of temporary loss of connection on computers: freezes in explorer while accessing network shares, errors in our administration software due to loss of connection to the database server.
I have been doing some Wireshark monitoring on the connection between the VoIP PBX and the rest of the network. Wireshark picks up a clump of retransmitted TCP packets at the times when we record phone restarts. The Wireshark log shows about 2 clusters of retransmissions a day ranging from 5 packets to hundreds. Those in each cluster are mainly between the PBX and some set of the VoIP phones, but not always the same set. Often retransmissions at the same time are to phones connected to the same switch, but sometimes retransmissions occur together to phones at opposite ends of the network. There are usually some coincident retransmissions in passing TCP traffic, for example between client machines and the file servers.
The spikes in retransmissions and phone resets do not correlate well with when the network is heavily loaded. They seem to occur slightly more during the day, but most in the evening, when traffic should be decreasing. They occur reasonably often late at night when most computers are turned off and traffic should be lowest.
Do you have any ideas that might help diagnose the cause of problems like this?
One thing I have not yet tried, but should have, is updating the firmware of all the switches.