Python2.7: How can I speed up this bit of code (loop/lists/tuple optimization)?

Posted by user89 on Stack Overflow See other posts from Stack Overflow or by user89
Published on 2014-05-31T02:59:09Z Indexed on 2014/05/31 3:26 UTC
Read the original article Hit count: 126

Filed under:
|
|

I repeat the following idiom again and again. I read from a large file (sometimes, up to 1.2 million records!) and store the output into an SQLite databse. Putting stuff into the SQLite DB seems to be fairly fast.

def readerFunction(recordSize, recordFormat, connection, outputDirectory, outputFile, numObjects):

insertString = "insert into NODE_DISP_INFO(node, analysis, timeStep, H1_translation, H2_translation, V_translation, H1_rotation, H2_rotation, V_rotation) values (?, ?, ?, ?, ?, ?, ?, ?, ?)" 

analysisNumber = int(outputPath[-3:])

outputFileObject = open(os.path.join(outputDirectory, outputFile), "rb")
outputFileObject, numberOfRecordsInFileObject = determineNumberOfRecordsInFileObjectGivenRecordSize(recordSize, outputFileObject)

numberOfRecordsPerObject = (numberOfRecordsInFileObject//numberOfObjects)

loop1StartTime = time.time()
for i in range(numberOfRecordsPerObject ):  
    processedRecords = []

    loop2StartTime = time.time()

    for j in range(numberOfObjects):
        fout = outputFileObject .read(recordSize)

        processedRecords.append(tuple([j+1, analysisNumber, i] + [x for x in list(struct.unpack(recordFormat, fout))]))

    loop2EndTime = time.time()
    print "Time taken to finish loop2: {}".format(loop2EndTime-loop2StartTime)  

    dbInsertStartTime = time.time()
    connection.executemany(insertString, processedRecords)
    dbInsertEndTime = time.time()

loop1EndTime = time.time()
print "Time taken to finish loop1: {}".format(loop1EndTime-loop1StartTime)

outputFileObject.close()
print "Finished reading output file for analysis {}...".format(analysisNumber)

When I run the code, it seems that "loop 2" and "inserting into the database" is where most execution time is spent. Average "loop 2" time is 0.003s, but it is run up to 50,000 times, in some analyses. The time spent putting stuff into the database is about the same: 0.004s. Currently, I am inserting into the database every time after loop2 finishes so that I don't have to deal with running out RAM.

What could I do to speed up "loop 2"?

© Stack Overflow or respective owner

Related posts about loops

Related posts about python-2.7