We are seeing random latency spikes in a high-throughput transaction processing system that uses sockets for IPC.
Below is the setup used for the run:
The client opens and closes a new connection for every transaction, and there are 4 exchanges between the server and the client.
We have disabled TIME_WAIT by setting the socket linger (SO_LINGER) option via setsockopt, because we thought the spikes were caused by sockets waiting in TIME_WAIT.
There is no processing done for the transaction. Only messages are passed.
OS used: CentOS 5.4
The average round-trip time is around 3 milliseconds, but sometimes the round-trip time ranges from 100 milliseconds to a couple of seconds.
Steps used for execution and measurement, and the output
Starting the server
$ python sockServerLinger.py > /dev/null &
Starting the client to post 1 million transactions to the server. The client prints the time taken by each transaction, and the output is redirected to the client.log file.
$ python sockClient.py 1000000 > client.log
Once the execution finishes, the following command shows the execution times greater than 100 milliseconds in the format <line_number>:<execution_time>.
$ awk '$1+0 > 0.1 {print NR ":" $1}' client.log | less
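The filtering step above can also be done in Python, which makes it easy to report the mean and the worst case alongside the outliers. This is only a minimal sketch: summarize_latencies is a hypothetical helper name, it assumes one elapsed time in seconds per line (as the client prints), and it is written for a current Python 3 interpreter rather than the Python 2 used in the listings.

```python
def summarize_latencies(lines, threshold=0.1):
    """Summarize per-transaction timings (one value in seconds per line).

    Returns (count, mean, worst, outliers), where outliers is a list of
    (line_number, seconds) pairs for timings above threshold.
    Lines that do not parse as a number (e.g. error output) are skipped.
    """
    total = 0.0
    count = 0
    worst = 0.0
    outliers = []
    for line_number, line in enumerate(lines, 1):
        try:
            seconds = float(line.strip())
        except ValueError:
            continue  # not a timing line
        count += 1
        total += seconds
        worst = max(worst, seconds)
        if seconds > threshold:
            outliers.append((line_number, seconds))
    mean = total / count if count else 0.0
    return count, mean, worst, outliers
```

Passing an open file works because a file object iterates line by line, for example summarize_latencies(open("client.log")).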
Below is the example code for the server and the client.
Server
# File: sockServerLinger.py
import socket, traceback
import struct

host = ''
port = 9999

# SO_LINGER with l_onoff=1 and l_linger=0: close() aborts the
# connection (RST) instead of lingering.
l_onoff = 1
l_linger = 0
lingeropt = struct.pack('ii', l_onoff, l_linger)

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
s.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER, lingeropt)
s.bind((host, port))
s.listen(1)

while 1:
    try:
        clientsock, clientaddr = s.accept()
        print "Got connection from", clientsock.getpeername()
        # First exchange: echo the request back.
        data = clientsock.recv(1024*1024*10)
        numsent = clientsock.send(data)
        # Second exchange.
        data1 = clientsock.recv(1024*1024*10)
        numsent = clientsock.send(data1)
        # Drain until the client closes (recv returns an empty string).
        ret = 1
        while ret > 0:
            data1 = clientsock.recv(1024*1024*10)
            ret = len(data1)
        clientsock.close()
    except KeyboardInterrupt:
        raise
    except:
        traceback.print_exc()
        continue
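To check that the linger setting is really in effect on a given socket, the option can be read back with getsockopt. The following is a minimal sketch, not part of the test run: set_abortive_close and get_linger are hypothetical helper names, the 'ii' layout assumes the Linux struct linger (two C ints), and the code is written for a current Python 3 interpreter.

```python
import socket
import struct

LINGER_FORMAT = 'ii'  # struct linger { int l_onoff; int l_linger; } on Linux


def set_abortive_close(sock):
    """Enable SO_LINGER with a zero timeout, as the server listing does."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER,
                    struct.pack(LINGER_FORMAT, 1, 0))


def get_linger(sock):
    """Read SO_LINGER back as an (l_onoff, l_linger) tuple."""
    raw = sock.getsockopt(socket.SOL_SOCKET, socket.SO_LINGER,
                          struct.calcsize(LINGER_FORMAT))
    return struct.unpack(LINGER_FORMAT, raw)
```

get_linger can be called on any socket, for example on clientsock right after accept() or on the client's own socket, to see which sockets actually carry the setting.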
Client
# File: sockClient.py
import socket, traceback, sys
import time

i = 0
while 1:
    try:
        st = time.time()
        s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        # Retry until the connect succeeds.
        while (s.connect_ex(('127.0.0.1', 9999)) != 0):
            continue
        numsent = s.send("asd"*1000)
        response = s.recv(6000)
        numsent = s.send("asd"*1000)
        response = s.recv(6000)
        i += 1
        if i == int(sys.argv[1]):
            break
    except KeyboardInterrupt:
        raise
    except:
        print "in exec:::::::::::::"
        traceback.print_exc()
        continue
    # Elapsed time for this transaction, in seconds.
    print time.time() - st
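The client above records a single time per transaction, so a spike cannot be attributed to the connect or to the message exchange. One way to separate the two is sketched below. It is only an illustration under stated assumptions: timed_transaction, recv_exact and echo_server are hypothetical names, echo_server is a stand-in so the sketch runs on its own, and the code targets a current Python 3 interpreter rather than the Python 2 used in the listings.

```python
import socket
import threading
import time

PAYLOAD = b"asd" * 1000  # same 3000-byte message the client sends


def recv_exact(sock, size):
    """Read exactly size bytes, or fewer if the peer closes early."""
    chunks = []
    remaining = size
    while remaining > 0:
        chunk = sock.recv(remaining)
        if not chunk:
            break
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


def timed_transaction(address):
    """Run one transaction; return (connect_seconds, exchange_seconds)."""
    start = time.time()
    sock = socket.create_connection(address)
    connected = time.time()
    try:
        for _round in range(2):  # two request/response pairs = 4 messages
            sock.sendall(PAYLOAD)
            recv_exact(sock, len(PAYLOAD))
        done = time.time()
    finally:
        sock.close()
    return connected - start, done - connected


def echo_server(listener, transactions):
    """Stand-in server: echo two messages per connection, then close."""
    for _transaction in range(transactions):
        conn, _addr = listener.accept()
        for _round in range(2):
            conn.sendall(recv_exact(conn, len(PAYLOAD)))
        conn.close()
```

Logging both values per transaction would show whether the slow transactions spend their time in connect or in the send/recv exchange.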