Python: slow read & write for millions of small files
- by Jami
I am building directory tree which has tons of subdirectories and files. The total directory count is somewhere along 256^32 subdirectories with 256 files in each end which are only a few bytes long. I did this so I would have fast access to these files (since i'm not searching and i'm just directly accessing then via a known file path)
I have a python script that builds this filesystem and reads & writes those files. The problem is that when I reach more than 1Gb of total filesize, the read and write methods become extremely slow.
Here's the function I have that reads the contents of a file (the file contains an integer string), adds a certain number to it, then writes it back to the original file.
def addInFile(path, scoreToAdd):
num = scoreToAdd
try:
shutil.copyfile(path, '/tmp/tmp.txt')
fp = open('/tmp/tmp.txt', 'r')
num += int(fp.readlines()[0])
fp.close()
except:
pass
fp = open('/tmp/tmp.txt', 'w')
fp.write(str(num))
fp.close()
shutil.copyfile('/tmp/tmp.txt', path)
I previously tried performing linux console commands but it was slower. I copy the file to a temporary file first then access/modify it then copy it back because i found this was faster than directly accessing the file.
I think the cause of the slowdown is because there're tons of files. performing this function 1000 times sometimes reach 1 minute now, but before (when there were only a few files, 1000 calls was performed for only less than 1 second)
How do you suggest I fix this?