How does Git find the commits associated with a file?

Posted by liadan on Stack Overflow
Published on 2010-05-15T21:40:01Z

I'm writing a simple parser for .git/* files. I've covered almost everything: objects, refs, pack files, etc. But I have a problem. Say I have a large 300 MB repository (in a pack file) and I want to find all the commits that changed the file /some/deep/inside/file. What I'm doing now is:

  • fetching the latest commit
  • finding the file in it by:
    • fetching the commit's root tree
    • finding the subtree for the next path component inside it
    • repeating recursively until I reach the file
    • additionally, checking the hash of each subdirectory on the way to the file. If one of them is the same as in the previously examined commit, I assume the file was not changed (because its parent directory didn't change)
  • then storing the file's hash and fetching the parent commit
  • finding the file again and checking whether its hash changed
    • if it did, the original commit (i.e. the one examined before the parent, which is its child) changed the file
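A minimal Python sketch of the per-commit path walk above, including the subdirectory short-circuit. It assumes a hypothetical read_tree callback that returns the decompressed body of a tree object with its header already stripped; parse_tree, lookup and path_changed are illustrative names, not part of any library. The entry layout it parses (mode, a space, the name, a NUL byte, then a 20-byte binary SHA-1) is Git's tree object format. It also assumes every path component except the last is a directory.

```python
from typing import Callable, Iterator, Optional, Tuple

# Hypothetical callback: hex SHA-1 -> raw tree body (decompressed, header
# stripped). Whether it reads loose objects, a pack file or a cache is up to
# the parser.
ReadTree = Callable[[str], bytes]


def parse_tree(body: bytes) -> Iterator[Tuple[str, str, str]]:
    """Yield (mode, name, hex_sha) for each entry of a raw tree body.

    Each entry is "<mode> <name>", a NUL byte, then a 20-byte binary SHA-1.
    """
    i = 0
    while i < len(body):
        space = body.index(b" ", i)       # mode never contains a space
        nul = body.index(b"\0", space)    # names never contain NUL
        mode = body[i:space].decode("ascii")
        name = body[space + 1:nul].decode("utf-8", "surrogateescape")
        sha = body[nul + 1:nul + 21].hex()
        yield mode, name, sha
        i = nul + 21                      # skip past the binary SHA-1


def lookup(read_tree: ReadTree, tree_sha: str, name: str) -> Optional[str]:
    """Return the SHA of entry `name` in the given tree, or None if absent."""
    for _mode, entry_name, sha in parse_tree(read_tree(tree_sha)):
        if entry_name == name:
            return sha
    return None


def path_changed(read_tree: ReadTree, root_a: str, root_b: str, path: str) -> bool:
    """True if the object at `path` differs between two root trees.

    Walks both trees one component at a time and stops as soon as the two
    subtrees have the same SHA, since nothing below them can differ.
    """
    sha_a: Optional[str] = root_a
    sha_b: Optional[str] = root_b
    for part in path.strip("/").split("/"):
        if sha_a == sha_b:
            return False  # identical subtree, or missing on both sides
        if sha_a is None or sha_b is None:
            return True   # the path exists on one side only
        sha_a = lookup(read_tree, sha_a, part)
        sha_b = lookup(read_tree, sha_b, part)
    return sha_a != sha_b
```

The short-circuit is safe because a tree's SHA-1 is computed over its entries, which include the SHA-1s of its subtrees and blobs, so equal subtree hashes mean everything beneath them is identical.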

I repeat this over and over until I reach the very first commit.
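The outer loop can be sketched the same way. This version follows only the first parent of each commit (an assumption; the question doesn't say how merges should be treated) and takes two hypothetical callbacks: read_object, returning a decompressed object body by SHA, and file_sha_at, returning the blob SHA at a path in a given tree, or None. It parses the tree and parent lines from the commit header, and each commit object is read and parsed only once.

```python
from typing import Callable, Iterator, List, Optional, Tuple

# Hypothetical callbacks (not a real library API):
ReadObject = Callable[[str], bytes]              # SHA -> decompressed object body
FileShaAt = Callable[[str, str], Optional[str]]  # (tree SHA, path) -> blob SHA or None


def parse_commit(body: bytes) -> Tuple[str, List[str]]:
    """Return (tree_sha, parent_shas) from a raw commit body."""
    tree = ""
    parents: List[str] = []
    for line in body.split(b"\n"):
        if not line:
            break  # a blank line ends the header; the commit message follows
        key, _, value = line.partition(b" ")
        if key == b"tree":
            tree = value.decode("ascii")
        elif key == b"parent":
            parents.append(value.decode("ascii"))
    return tree, parents


def commits_touching(read_object: ReadObject, file_sha_at: FileShaAt,
                     head: str, path: str) -> Iterator[str]:
    """Yield, newest first, commits whose blob at `path` differs from their first parent's."""
    sha = head
    tree, parents = parse_commit(read_object(sha))
    current = file_sha_at(tree, path)
    while True:
        if not parents:
            if current is not None:
                yield sha  # root commit that already contains the file
            return
        parent = parents[0]  # first parent only: merges are not handled
        parent_tree, parent_parents = parse_commit(read_object(parent))
        previous = file_sha_at(parent_tree, path)
        if previous != current:
            yield sha  # the file's hash changed between the parent and this commit
        sha, parents, current = parent, parent_parents, previous
```

A deletion counts as a change here (the blob SHA goes from a value to None). Merge commits would need extra handling, for example comparing against every parent, which is left out of this sketch.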

This solution works, but it's slow. In the worst case, the first search can take as long as 3 minutes (for a 300 MB pack).

Is there any way to speed it up? I've tried to avoid loading such large objects into memory, but right now I don't see any other way. And even then, the initial load into memory will take forever :(

Greetings, and thanks for any help!

© Stack Overflow or respective owner