Save memory in Python: how do I iterate over the lines of a 2-million-line file and save them efficiently?

Posted by skyl on Stack Overflow, 2010-03-13

I have a tab-separated data file with a little over 2 million lines and 19 columns. You can find it in US.zip at http://download.geonames.org/export/dump/.

I started to run the following, but with for l in f.readlines(). I understand that just iterating over the file is supposed to be more efficient, so I'm posting that version below. Still, even with this small optimization, the process is using 10% of my memory and has only gotten through about 3% of the records; at this pace, it looks like it will run out of memory like it did before. The function I have is also very slow. Is there anything obvious I can do to speed it up? Would it help to del the objects on each pass of the for loop?
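As I understand it, the difference is that f.readlines() materializes the whole file as a list of strings, while iterating over the file object yields one line at a time:

# Reads all ~2 million lines into a list up front:
for l in f.readlines():
    process(l)  # process() is just a stand-in for the loop body

# Lazily yields one line at a time, holding only the current line:
for l in f:
    process(l)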

def run():
    from geonames.models import POI
    f = open('data/US.txt')
    for l in f:
        # strip the trailing newline so the last column doesn't keep a '\n'
        li = l.rstrip('\n').split('\t')
        try:
            p = POI()
            p.geonameid = li[0]
            p.name = li[1]
            p.asciiname = li[2]
            p.alternatenames = li[3]
            # WKT puts longitude before latitude
            p.point = "POINT(%s %s)" % (li[5], li[4])
            p.feature_class = li[6]
            p.feature_code = li[7]
            p.country_code = li[8]
            p.ccs2 = li[9]
            p.admin1_code = li[10]
            p.admin2_code = li[11]
            p.admin3_code = li[12]
            p.admin4_code = li[13]
            p.population = li[14]
            p.elevation = li[15]
            p.gtopo30 = li[16]
            p.timezone = li[17]
            p.modification_date = li[18]
            p.save()
        except IndexError:
            # skip malformed rows with fewer than 19 columns
            pass
    f.close()

if __name__ == "__main__":
    run()
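For what it's worth, here is the kind of change I'm considering; it's only a sketch, untested, and save_row is a hypothetical helper standing in for the POI() block above. My understanding is that with DEBUG = True, Django appends every executed query to django.db.connection.queries, which would grow without bound in a loop like this, and that django.db.reset_queries() empties that list:

from django.db import reset_queries

def run():
    f = open('data/US.txt')
    for i, l in enumerate(f):
        save_row(l)          # hypothetical helper wrapping the POI() code above
        del l                # probably redundant: l is rebound on every pass anyway
        if i % 10000 == 0:
            reset_queries()  # clear Django's per-query DEBUG log, in case that is the leak
    f.close()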
