We have two AIX boxes, one for production system and another for testing.
both systems are running ATM machine switches, where the ATM device is connected via TCP socket.
we had an issue on production system where the machine would power off or get disconnected but the netstat -na | grep <IP of machine > would still mention that the socket is up
when simulated that case on the UAT environment, the problem did not happen, where the socket would terminate in 3 to 5 minutes.
when sniffed on the traffic between the machine and ATM we found that no traffic takes place on production while there is some sort of heartbeat on UAT. but it is not initiated by the application.
$>tcpdump | grep -v "10.2.2.71" | grep -v "HSRP" | grep "10.3.1.30"
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on en6, link-type 1, capture size 96 bytes
09:08:13.323421 IP server073.afs3-callback > 10.3.1.30.impera: . 278204201:278204202(1) ack 3307884029 win 164
09:08:13.335334 IP 10.3.1.30.impera > server073.afs3-callback: . ack 1 win 64180
09:08:23.425771 IP 10.3.1.30.impera > server073.afs3-callback: . 1:2(1) ack 1 win 64180
09:08:23.425789 IP server073.afs3-callback > 10.3.1.30.impera: . ack 2 win 65535
09:09:13.628985 IP server073.afs3-callback > 10.3.1.30.impera: . 0:1(1) ack 1 win 164
09:09:13.633900 IP 10.3.1.30.impera > server073.afs3-callback: . ack 1 win 64180
09:09:23.373634 IP 10.3.1.30.impera > server073.afs3-callback: . 1:2(1) ack 1 win 64180
09:09:23.373647 IP server073.afs3-callback > 10.3.1.30.impera: . ack 2 win 65535
while on production, that traffic is not there.
we want to know where this traffic is initiated from to implement on production to sense disconnection
our comms parameters are:
tcp_keepcnt = 2
tcp_keepidle = 100
tcp_keepinit = 150
tcp_keepintvl = 150
tcp_finwait2 = 1200
can anyone help?
Editing Question:
One point I missed because I was rushing to a meeting.
the difference between the Production and UAT in setup is that in Production we have an application called F5 working as load balancer between the ATMs and the AIX box, while it is a direct connection through MPLS in case of UAT.
note: we had one MPLS and one GPRS connected ATMs on UAT, and both connections terminated when unplugged in about 4 minutes
Edit 2
the no -o tcp_timewait command returns 1 in both Production and UAT