Reasonably faster way to traverse a directory tree in Python?

Posted by Sridhar Ratnakumar on Stack Overflow See other posts from Stack Overflow or by Sridhar Ratnakumar
Published on 2010-04-22T23:04:59Z Indexed on 2010/04/22 23:43 UTC
Read the original article Hit count: 247

Assuming that the given directory tree is of reasonable size: say an open source project like Twisted or Python, what is the fastest way to traverse and iterate over the absolute path of all files/directories inside that directory?

I want to do this from within Python (subprocess is allowed). os.path.walk is slow. So I tried ls -lR and tree -fi. For a project with about 8337 files (including tmp, pyc, test, .svn files):

$ time tree -fi > /dev/null 

real    0m0.170s
user    0m0.044s
sys     0m0.123s

$ time ls -lR > /dev/null 

real    0m0.292s
user    0m0.138s
sys     0m0.152s

$ time find . > /dev/null 

real    0m0.074s
user    0m0.017s
sys     0m0.056s
$

tree appears to be faster than ls -lR (though ls -R is faster than tree, but it does not give full paths). find is the fastest.

Can anyone think of a faster and/or better approach? On Windows, I may simply ship a 32-bit binary tree.exe or ls.exe if necessary.

Update 1: Added find

© Stack Overflow or respective owner

Related posts about python

Related posts about file-traversal