Python MD5 Hash Faster Calculation

Posted by balgan on Stack Overflow See other posts from Stack Overflow or by balgan
Published on 2010-05-11T19:00:25Z Indexed on 2010/05/11 19:04 UTC
Read the original article Hit count: 380

Hi everyone. I will try my best to explain my problem and my line of thought on how I think I can solve it.

I use this code

    for root, dirs, files in os.walk(downloaddir):
for infile in files:
    f = open(os.path.join(root,infile),'rb')
    filehash = hashlib.md5()
    while True:
        data = f.read(10240)
        if len(data) == 0:
            break
        filehash.update(data)
    print "FILENAME: " , infile
    print "FILE HASH: " , filehash.hexdigest()

and using start = time.time() elapsed = time.time() - start I measure how long it takes to calculate an hash. Pointing my code to a file with 653megs this is the result:

root@Mars:/home/tiago# python algorithm-timer.py FILENAME: freebsd.iso FILE HASH: ace0afedfa7c6e0ad12c77b6652b02ab 12.624 root@Mars:/home/tiago# python algorithm-timer.py FILENAME: freebsd.iso FILE HASH: ace0afedfa7c6e0ad12c77b6652b02ab 12.373 root@Mars:/home/tiago# python algorithm-timer.py FILENAME: freebsd.iso FILE HASH: ace0afedfa7c6e0ad12c77b6652b02ab 12.540

Ok now 12 seconds +- on a 653mb file, my problem is I intend to use this code on a program that will run through multiple files, some of them might be 4/5/6Gb and it will take wayy longer to calculate. What am wondering is if there is a faster way for me to calculate the hash of the file? Maybe by doing some multithreading? I used a another script to check the use of the CPU second by second and I see that my code is only using 1 out of my 2 CPUs and only at 25% max, any way I can change this?

Thank you all in advance for the given help.

© Stack Overflow or respective owner

Related posts about python

Related posts about md5