Faster Insertion of Records into a Table with SQLAlchemy

Posted by Kyle Brandt on Stack Overflow
Published on 2010-05-21T12:12:52Z

I am parsing a log file and inserting the records into either MySQL or SQLite using SQLAlchemy and Python. Right now I open a connection to the database, and as I loop over each line, I insert it as soon as it is parsed (it's all going into one big table right now; I'm not very experienced with SQL). I then close the connection when the loop is done. The summarized code is:

from sqlalchemy import create_engine, schema, types

metadata = schema.MetaData()
log_table = schema.Table('log_table', metadata,
                         schema.Column('id', types.Integer, primary_key=True),
                         schema.Column('time', types.DateTime),
                         schema.Column('ip', types.String(length=15))
....
engine = create_engine(...)
metadata.bind = engine
connection = engine.connect()
....
for line in file_to_parse:
    m = line_regex.match(line)
    if m:
        fields = m.groupdict()
        pythonified = pythoninfy_log(fields)  # turn them into ints, datetimes, etc.
        if use_sql:
            ins = log_table.insert(values=pythonified)
            connection.execute(ins)
            parsed += 1

My two questions are:

  • Is there a way to speed up the inserts within this basic framework? Maybe use a queue of parsed rows and a few insertion threads, some sort of bulk insert, etc.? (A sketch of the bulk-insert idea follows this list.)
  • When I used MySQL, the insert time for about 1.2 million records was 15 minutes. With SQLite, it was a little over an hour. Does that time difference between the two database engines seem about right, or does it mean I am doing something very wrong?
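
For the first question, one common approach (not from the original post) is to accumulate the parsed rows and pass a list of dictionaries to a single execute() call, which lets SQLAlchemy hand the whole batch to the DBAPI's executemany(), and to wrap the load in one explicit transaction so the database doesn't commit after every statement. A minimal sketch, reusing the table, engine, and helper functions from the code above; BATCH_SIZE is a hypothetical tuning knob, not something from the question:

BATCH_SIZE = 1000  # hypothetical batch size; tune for your data
batch = []

trans = connection.begin()  # one transaction for the whole load
for line in file_to_parse:
    m = line_regex.match(line)
    if m:
        batch.append(pythoninfy_log(m.groupdict()))
        if len(batch) >= BATCH_SIZE:
            # Passing a list of dicts makes SQLAlchemy use the DBAPI's
            # executemany(): one round trip per batch instead of per row.
            connection.execute(log_table.insert(), batch)
            parsed += len(batch)
            batch = []
if batch:  # flush any leftover rows
    connection.execute(log_table.insert(), batch)
    parsed += len(batch)
trans.commit()

On the second question: SQLite wraps each INSERT in its own implicit transaction by default, so committing once at the end rather than per statement may account for much of the gap between the two engines.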

© Stack Overflow or respective owner
