Python2.7: How can I speed up this bit of code (loop/lists/tuple optimization)?

Posted by user89 on Stack Overflow
Published on 2014-05-31T02:59:09Z

I repeat the following idiom again and again: I read from a large file (sometimes up to 1.2 million records!) and store the output in an SQLite database. Inserting into the SQLite DB itself seems to be fairly fast.

import os
import struct
import time

def readerFunction(recordSize, recordFormat, connection, outputDirectory, outputFile, numberOfObjects):

    insertString = ("insert into NODE_DISP_INFO"
                    "(node, analysis, timeStep, H1_translation, H2_translation, V_translation,"
                    " H1_rotation, H2_rotation, V_rotation)"
                    " values (?, ?, ?, ?, ?, ?, ?, ?, ?)")

    # The last three characters of the output file name encode the analysis number
    analysisNumber = int(outputFile[-3:])

    outputFileObject = open(os.path.join(outputDirectory, outputFile), "rb")
    outputFileObject, numberOfRecordsInFileObject = determineNumberOfRecordsInFileObjectGivenRecordSize(recordSize, outputFileObject)

    numberOfRecordsPerObject = numberOfRecordsInFileObject // numberOfObjects

    loop1StartTime = time.time()
    for i in range(numberOfRecordsPerObject):
        processedRecords = []

        loop2StartTime = time.time()
        for j in range(numberOfObjects):
            fout = outputFileObject.read(recordSize)
            # struct.unpack already returns a tuple, so just concatenate
            processedRecords.append((j + 1, analysisNumber, i) + struct.unpack(recordFormat, fout))
        loop2EndTime = time.time()
        print "Time taken to finish loop2: {}".format(loop2EndTime - loop2StartTime)

        dbInsertStartTime = time.time()
        connection.executemany(insertString, processedRecords)
        dbInsertEndTime = time.time()
        print "Time taken to insert into DB: {}".format(dbInsertEndTime - dbInsertStartTime)

    loop1EndTime = time.time()
    print "Time taken to finish loop1: {}".format(loop1EndTime - loop1StartTime)

    outputFileObject.close()
    print "Finished reading output file for analysis {}...".format(analysisNumber)

When I run the code, it seems that "loop 2" and inserting into the database are where most of the execution time is spent. The average "loop 2" time is 0.003 s, but it runs up to 50,000 times in some analyses. The time spent inserting into the database is about the same: 0.004 s. Currently, I insert into the database every time "loop 2" finishes, so that I don't have to worry about running out of RAM.

What could I do to speed up "loop 2"?
