Python: slow read & write for millions of small files

Posted by Jami on Stack Overflow See other posts from Stack Overflow or by Jami
Published on 2010-06-13T08:31:16Z Indexed on 2010/06/13 8:32 UTC
Read the original article Hit count: 315

Filed under:
|
|
|
|

I am building directory tree which has tons of subdirectories and files. The total directory count is somewhere along 256^32 subdirectories with 256 files in each end which are only a few bytes long. I did this so I would have fast access to these files (since i'm not searching and i'm just directly accessing then via a known file path)

I have a python script that builds this filesystem and reads & writes those files. The problem is that when I reach more than 1Gb of total filesize, the read and write methods become extremely slow.

Here's the function I have that reads the contents of a file (the file contains an integer string), adds a certain number to it, then writes it back to the original file.

def addInFile(path, scoreToAdd):
    num = scoreToAdd
    try:
        shutil.copyfile(path, '/tmp/tmp.txt')
        fp = open('/tmp/tmp.txt', 'r')
        num += int(fp.readlines()[0])
        fp.close()
    except:
        pass
    fp = open('/tmp/tmp.txt', 'w')
    fp.write(str(num))
    fp.close()
    shutil.copyfile('/tmp/tmp.txt', path)

I previously tried performing linux console commands but it was slower. I copy the file to a temporary file first then access/modify it then copy it back because i found this was faster than directly accessing the file.

I think the cause of the slowdown is because there're tons of files. performing this function 1000 times sometimes reach 1 minute now, but before (when there were only a few files, 1000 calls was performed for only less than 1 second)

How do you suggest I fix this?

© Stack Overflow or respective owner

Related posts about python

Related posts about file