Reasonably faster way to traverse a directory tree in Python?
- by Sridhar Ratnakumar
Assuming that the given directory tree is of reasonable size: say an open source project like Twisted or Python, what is the fastest way to traverse and iterate over the absolute path of all files/directories inside that directory?
I want to do this from within Python (subprocess is allowed). os.path.walk is slow. So I tried ls -lR and tree -fi. For a project with about 8337 files (including tmp, pyc, test, .svn files):
$ time tree -fi > /dev/null
real 0m0.170s
user 0m0.044s
sys 0m0.123s
$ time ls -lR > /dev/null
real 0m0.292s
user 0m0.138s
sys 0m0.152s
$ time find . > /dev/null
real 0m0.074s
user 0m0.017s
sys 0m0.056s
$
tree appears to be faster than ls -lR (though ls -R is faster than tree, but it does not give full paths). find is the fastest.
Can anyone think of a faster and/or better approach? On Windows, I may simply ship a 32-bit binary tree.exe or ls.exe if necessary.
Update 1: Added find