How can I test that my hash function is good in terms of max-load?
Posted
by philcolbourn
on Stack Overflow
See other posts from Stack Overflow
or by philcolbourn
Published on 2010-04-10T00:37:45Z
Indexed on
2010/04/10
0:43 UTC
Read the original article
Hit count: 438
I have read through various papers on the 'Balls and Bins' problem and it seems that if a hash function is working right (ie. it is effectively a random distribution) then the following should/must be true if I hash n values into a hash table with n slots (or bins):
- Probability that a bin is empty, for large
nis1/e. - Expected number of empty bins is
n/e. - Probability that a bin has
kcollisions is<= 1/k!. - Probability that a bin has at least k collisions is
<= (e/k)**k.
These look easy to check. But the max-load test (the maximum number of collisions with high probability) is usually stated vaguely.
Most texts state that the maximum number of collisions in any bin is O( ln(n) / ln(ln(n)) ).
Some say it is 3*ln(n) / ln(ln(n)). Other papers mix ln and log - usually without defining them, or state that log is log base e and then use ln elsewhere.
Is ln the log to base e or 2 and is this max-load formula right and how big should n be to run a test?
This lecture seems to cover it best, but I am no mathematician.
http://pages.cs.wisc.edu/~shuchi/courses/787-F07/scribe-notes/lecture07.pdf
BTW, with high probability seems to mean 1 - 1/n.
© Stack Overflow or respective owner