Python2.7: How can I speed up this bit of code (loop/lists/tuple optimization)?
Posted by user89 on Stack Overflow
Published on 2014-05-31T02:59:09Z
I repeat the following idiom again and again. I read from a large file (sometimes up to 1.2 million records!) and store the output in an SQLite database. Putting stuff into the SQLite DB seems to be fairly fast.
    import os
    import struct
    import time

    def readerFunction(recordSize, recordFormat, connection, outputDirectory, outputFile, numObjects):
        insertString = "insert into NODE_DISP_INFO(node, analysis, timeStep, H1_translation, H2_translation, V_translation, H1_rotation, H2_rotation, V_rotation) values (?, ?, ?, ?, ?, ?, ?, ?, ?)"
        # NOTE: outputPath is not defined anywhere in this snippet; it is presumably set elsewhere.
        analysisNumber = int(outputPath[-3:])
        outputFileObject = open(os.path.join(outputDirectory, outputFile), "rb")
        # Helper defined elsewhere; returns the file object and the number of records in it.
        outputFileObject, numberOfRecordsInFileObject = determineNumberOfRecordsInFileObjectGivenRecordSize(recordSize, outputFileObject)
        # The parameter is named numObjects, so use that name consistently below.
        numberOfRecordsPerObject = numberOfRecordsInFileObject // numObjects

        loop1StartTime = time.time()
        for i in range(numberOfRecordsPerObject):  # loop 1: one pass per time step
            processedRecords = []

            loop2StartTime = time.time()
            for j in range(numObjects):  # loop 2: one record per object
                fout = outputFileObject.read(recordSize)
                processedRecords.append(tuple([j+1, analysisNumber, i] + [x for x in list(struct.unpack(recordFormat, fout))]))
            loop2EndTime = time.time()
            print "Time taken to finish loop2: {}".format(loop2EndTime-loop2StartTime)

            # Insert this time step's rows before reading the next one.
            dbInsertStartTime = time.time()
            connection.executemany(insertString, processedRecords)
            dbInsertEndTime = time.time()
        loop1EndTime = time.time()
        print "Time taken to finish loop1: {}".format(loop1EndTime-loop1StartTime)

        outputFileObject.close()
        print "Finished reading output file for analysis {}...".format(analysisNumber)
When I run the code, it seems that "loop 2" and inserting into the database are where most of the execution time is spent. The average "loop 2" time is 0.003s, but in some analyses it runs up to 50,000 times. The time spent putting stuff into the database is about the same: 0.004s. Currently, I insert into the database each time loop 2 finishes so that I don't have to deal with running out of RAM.
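For the insert side, here is a minimal sketch of one way to keep memory bounded while still inserting batch by batch. The snippet above doesn't show where commits happen, so this assumes it is acceptable to wrap all batches in a single transaction and commit once at the end. The table `DEMO`, the helper names `insert_rows` and `demo_batches`, and the row contents are hypothetical, made up only for illustration. Because the batches come from a generator, only one batch is held in memory at a time.

```python
import sqlite3


def insert_rows(connection, insert_sql, row_batches):
    """Insert every batch inside one transaction and commit once at the end."""
    with connection:  # commits on success, rolls back if an exception is raised
        for batch in row_batches:
            connection.executemany(insert_sql, batch)


def demo_batches(num_steps, num_objects):
    # Hypothetical stand-in for the per-time-step lists built by loop 2.
    # Being a generator, it produces one batch at a time.
    for step in range(num_steps):
        yield [(obj + 1, 1, step, 0.0) for obj in range(num_objects)]


# In-memory database with a made-up table, just to exercise the sketch.
conn = sqlite3.connect(":memory:")
conn.execute("create table DEMO(node, analysis, timeStep, value)")
insert_rows(conn, "insert into DEMO values (?, ?, ?, ?)", demo_batches(4, 3))
row_count = conn.execute("select count(*) from DEMO").fetchone()[0]
```

Whether this helps depends on how the real connection is configured; timing it against the current per-batch approach on a real file would be the way to check.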
What could I do to speed up "loop 2"?
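At roughly 0.003s per pass and up to 50,000 passes, loop 2 adds up to about 150s, so small per-record savings matter. Below is a minimal sketch of one possible approach, not a measured result: read a whole time step with a single `read()`, precompile the format with `struct.Struct`, and concatenate tuples directly instead of building intermediate lists. The function name `read_time_step`, its parameter names, and the `"<6d"` record format (six doubles, matching the six translation/rotation columns) are assumptions for illustration; the real `recordFormat` is not shown in the question.

```python
import io
import struct


def read_time_step(file_obj, record_struct, num_objects, analysis_number, time_step):
    """Read one time step (num_objects records) with a single read() call."""
    record_size = record_struct.size
    block = file_obj.read(record_size * num_objects)
    if len(block) != record_size * num_objects:
        raise ValueError("file ended in the middle of a time step")
    unpack_from = record_struct.unpack_from  # bind the method once, outside the loop
    # unpack_from returns a tuple, so plain tuple concatenation replaces the
    # list building and tuple() conversion done in the original loop 2.
    return [
        (j + 1, analysis_number, time_step) + unpack_from(block, j * record_size)
        for j in range(num_objects)
    ]


# Small self-check with made-up data: 2 time steps x 3 objects, 6 doubles each.
record_struct = struct.Struct("<6d")  # hypothetical record format
raw = b"".join(record_struct.pack(*([float(k)] * 6)) for k in range(6))
demo_file = io.BytesIO(raw)
step0 = read_time_step(demo_file, record_struct, 3, 7, 0)
step1 = read_time_step(demo_file, record_struct, 3, 7, 1)
```

Each returned list has the same row layout as `processedRecords`, so it could be passed to `connection.executemany(insertString, ...)` unchanged. Profiling on a real output file is needed to see how much this actually saves.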
© Stack Overflow or respective owner