What hash algorithms are parallelizable? Optimizing the hashing of large files utilizing on mult-co

Posted by DanO on Stack Overflow
Published on 2010-04-26T21:56:48Z Indexed on 2010/04/26 22:03 UTC

I'm interested in optimizing the hashing of some large files (specifically, reducing wall-clock time). The I/O has been optimized well enough already: the I/O device (a local SSD) is only tapped at about 25% of capacity, while one of the CPU cores is completely maxed out.

I have more cores available, and in the future will likely have even more. So far I've only been able to tap into more cores when I happen to need multiple hashes of the same file, say an MD5 and a SHA256 at the same time. I can use the same I/O stream to feed two or more hash algorithms, and I get the faster algorithms done for free (as far as wall-clock time is concerned). As I understand most hash algorithms, each new bit of input changes the entire result, so computing a single hash in parallel seems inherently challenging, if not impossible.
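To make the "one I/O stream, several hashes" setup concrete, here is a minimal Python sketch of what I mean. The function name `multi_hash` and the 1 MiB chunk size are my own placeholders, not taken from any particular tool. It assumes CPython, where `hashlib` releases the GIL while updating on buffers larger than 2047 bytes, so the per-algorithm updates can run on separate cores.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor


def multi_hash(path, algorithms=("md5", "sha256"), chunk_size=1 << 20):
    """Read the file once and feed every chunk to all hash objects.

    Hypothetical helper for illustration only; names and defaults are
    assumptions, not part of any existing API.
    """
    hashers = {name: hashlib.new(name) for name in algorithms}
    with ThreadPoolExecutor(max_workers=len(hashers)) as pool, \
            open(path, "rb") as stream:
        while True:
            chunk = stream.read(chunk_size)
            if not chunk:
                break
            # One update per algorithm, each in its own thread.
            # list() waits for all of them before the next read, so
            # every hasher still sees the chunks in file order.
            list(pool.map(lambda h: h.update(chunk), hashers.values()))
    return {name: h.hexdigest() for name, h in hashers.items()}
```

With this arrangement the wall-clock time is roughly that of the slowest algorithm, which matches what I observe, but it does nothing to speed up any single hash.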

Are any of the mainstream hash algorithms parallelizable?
Are there any non-mainstream hashes that are parallelizable (and that have at least a sample implementation available)?

As future CPUs trend toward more cores and clock speeds level off, is there any way to improve the performance of file hashing (other than liquid-nitrogen-cooled overclocking), or is it inherently non-parallelizable?
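For illustration, the only way I can see to spread a single hash across cores is to change what is being computed: hash fixed-size chunks independently, then hash the concatenation of the chunk digests. The sketch below is an assumption-laden example, not a vetted construction. The name `chunked_tree_hash`, the 8 MiB chunk size, and the two-level layout are all my own choices, and the result is deliberately not equal to the plain SHA-256 of the file.

```python
import hashlib
import os
from concurrent.futures import ThreadPoolExecutor


def _hash_chunk(job):
    """Hash one byte range of the file (hypothetical helper)."""
    path, offset, length = job
    with open(path, "rb") as stream:
        stream.seek(offset)
        return hashlib.sha256(stream.read(length)).digest()


def chunked_tree_hash(path, chunk_size=8 << 20, workers=4):
    """Two-level hash: SHA-256 over the SHA-256 digests of each chunk.

    NOTE: this is NOT the standard SHA-256 of the file, and the digest
    depends on chunk_size. It only shows how the work could be split.
    """
    size = os.path.getsize(path)
    jobs = [(path, offset, chunk_size) for offset in range(0, size, chunk_size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves job order, so digests stay in file order.
        leaf_digests = list(pool.map(_hash_chunk, jobs))
    root = hashlib.sha256()
    for digest in leaf_digests:
        root.update(digest)
    return root.hexdigest()
```

The obvious drawback is that both sides must agree on the chunk size and the construction, so this is not a drop-in replacement for an existing MD5 or SHA-256 checksum.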

© Stack Overflow or respective owner
