Python MD5 Hash Faster Calculation

Posted by balgan on Stack Overflow
Published on 2010-05-11T19:00:25Z Indexed on 2010/05/11 19:04 UTC

Filed under: python, md5, threading, multithreading, multicore

Hi everyone. I will try my best to explain my problem and my line of thought on how I think I can solve it.

I use this code:

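    import hashlib
    import os

    # downloaddir is the directory to scan; it is defined elsewhere in the script.
    for root, dirs, files in os.walk(downloaddir):
        for infile in files:
            filehash = hashlib.md5()
            # Read in 10240-byte chunks so large files never sit fully in memory.
            with open(os.path.join(root, infile), 'rb') as f:
                while True:
                    data = f.read(10240)
                    if len(data) == 0:
                        break
                    filehash.update(data)
            print("FILENAME: " + infile)
            print("FILE HASH: " + filehash.hexdigest())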

Using start = time.time() before the hashing and elapsed = time.time() - start after it, I measure how long it takes to calculate a hash. Pointing my code at a 653 MB file, this is the result:

    root@Mars:/home/tiago# python algorithm-timer.py
    FILENAME: freebsd.iso
    FILE HASH: ace0afedfa7c6e0ad12c77b6652b02ab
    12.624
    root@Mars:/home/tiago# python algorithm-timer.py
    FILENAME: freebsd.iso
    FILE HASH: ace0afedfa7c6e0ad12c77b6652b02ab
    12.373
    root@Mars:/home/tiago# python algorithm-timer.py
    FILENAME: freebsd.iso
    FILE HASH: ace0afedfa7c6e0ad12c77b6652b02ab
    12.540
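For reference, the timing described above could be wrapped around the hashing loop roughly like this. It is only a sketch: the function name timed_md5 and its chunk_size parameter are made up for illustration and are not part of the original script.

```python
import hashlib
import time


def timed_md5(path, chunk_size=10240):
    """Return (hex digest, elapsed seconds) for a single file.

    timed_md5 and chunk_size are illustrative names, not from the
    original script; 10240 matches the read size used in the question.
    """
    filehash = hashlib.md5()
    start = time.time()
    with open(path, 'rb') as f:
        while True:
            data = f.read(chunk_size)
            if not data:  # empty bytes object means end of file
                break
            filehash.update(data)
    elapsed = time.time() - start
    return filehash.hexdigest(), elapsed
```

Calling timed_md5('freebsd.iso') would return the digest and the elapsed time together. Note that the measured time covers both reading from disk and hashing, so it does not say which of the two dominates.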

OK, so that is roughly 12 seconds for a 653 MB file. My problem is that I intend to use this code in a program that will run through multiple files, some of which might be 4, 5 or 6 GB, and those will take much longer to hash. What I am wondering is whether there is a faster way to calculate the hash of a file. Maybe by doing some multithreading? I used another script to check CPU usage second by second, and I see that my code is only using 1 of my 2 CPUs, and only at 25% at most. Is there any way I can change this?

Thank you all in advance for the given help.
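A rough sketch of the multithreading idea from the question, assuming Python 3 (concurrent.futures is not available in the Python 2 used above): hash several files at once with a small thread pool, reading in larger chunks. The names md5_of_file and md5_of_tree, the 1 MiB chunk size and max_workers=2 (one per CPU mentioned above) are illustrative assumptions, not measured choices. According to the hashlib documentation, CPython releases the GIL while hashing buffers larger than 2047 bytes, so threads can hash in parallel. This only parallelises across files, though; a single file's MD5 is still computed sequentially.

```python
import hashlib
import os
from concurrent.futures import ThreadPoolExecutor


def md5_of_file(path, chunk_size=1024 * 1024):
    """Return (path, hex digest); chunk_size is an assumed, untuned value."""
    filehash = hashlib.md5()
    with open(path, 'rb') as f:
        # iter() with a sentinel keeps calling f.read until it returns b''.
        for data in iter(lambda: f.read(chunk_size), b''):
            filehash.update(data)
    return path, filehash.hexdigest()


def md5_of_tree(top, max_workers=2):
    """Hash every file under top, a few files at a time.

    Returns a dict mapping each full path to its MD5 hex digest.
    max_workers=2 is an assumption based on the two CPUs in the question.
    """
    paths = [os.path.join(root, name)
             for root, dirs, files in os.walk(top)
             for name in files]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(pool.map(md5_of_file, paths))
```

Whether this helps depends on where the time goes. The numbers above work out to about 653 MB / 12.5 s ≈ 52 MB/s with the CPU at 25%, which may mean the disk is the real limit, in which case extra threads would not gain much.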
