Searching through large data set
Posted by calccrypto on Stack Overflow
Published on 2010-05-17T21:23:01Z
How would I search through a list of about 5 million 128-bit (or 256-bit, depending on how you look at it) strings quickly and find the duplicates in Python? I can turn the strings into numbers, but I don't think that will help much. Since I haven't learned much information theory, is there anything in information theory that addresses this?
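A minimal sketch of the usual approach, assuming the strings are already loaded into a Python list: one pass with a `set` finds every duplicate in expected linear time, so no information theory is needed. Turning the strings into numbers (for example `int(h, 16)`) or raw bytes does not change the algorithm; it mainly saves memory. The helper `to_bytes` is hypothetical and assumes each entry is a 32-character hex string (128 bits).

```python
def find_duplicates(hashes):
    """Return the set of values that occur more than once in `hashes`.

    Single pass with a set: expected O(n) time, O(n) extra memory.
    """
    seen = set()
    duplicates = set()
    for h in hashes:
        if h in seen:
            duplicates.add(h)
        else:
            seen.add(h)
    return duplicates


def to_bytes(hex_string):
    # Hypothetical helper: assumes a hex-encoded hash such as a 32-char MD5 digest.
    # 16 raw bytes use less memory per entry than a 32-char str in CPython.
    return bytes.fromhex(hex_string)


if __name__ == "__main__":
    sample = [
        "d41d8cd98f00b204e9800998ecf8427e",
        "9e107d9d372bb6826bd81d3542a419d6",
        "d41d8cd98f00b204e9800998ecf8427e",
    ]
    print(find_duplicates(sample))
    print(find_duplicates(map(to_bytes, sample)))
```

With about 5 million entries this stays in memory comfortably on most machines; `collections.Counter` would work too if the number of occurrences per hash is also wanted.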
And since these are already hashes, there's no point in hashing them again.
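A Python `set` does compute its own hash of each element internally. If the goal is to avoid any further hashing, one alternative (a sketch, not the only option) is to sort the list and compare neighbours: equal strings end up adjacent, so each run longer than one item is a duplicate. This costs O(n log n) time instead of expected O(n).

```python
from itertools import groupby


def find_duplicates_sorted(hashes):
    """Return each duplicated value once, in sorted order, without hashing.

    Sorting places equal values next to each other; groupby then yields
    one run per distinct value, and a run longer than 1 means a duplicate.
    """
    duplicates = []
    for value, run in groupby(sorted(hashes)):
        if sum(1 for _ in run) > 1:
            duplicates.append(value)
    return duplicates


if __name__ == "__main__":
    sample = ["9e10", "d41d", "d41d", "0abc", "9e10", "d41d"]
    print(find_duplicates_sorted(sample))
```

`sorted` copies the input so the caller's list is left unchanged; calling `hashes.sort()` in place instead avoids that extra copy when memory is tight.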
© Stack Overflow or respective owner