How can I test that my hash function is good in terms of max-load?
Posted
by philcolbourn
on Stack Overflow
See other posts from Stack Overflow
or by philcolbourn
Published on 2010-04-10T00:37:45Z
Indexed on
2010/04/10
0:43 UTC
Read the original article
Hit count: 363
I have read through various papers on the 'Balls and Bins' problem and it seems that if a hash function is working right (ie. it is effectively a random distribution) then the following should/must be true if I hash n
values into a hash table with n
slots (or bins):
- Probability that a bin is empty, for large
n
is1/e
. - Expected number of empty bins is
n/e
. - Probability that a bin has
k
collisions is<= 1/k!
. - Probability that a bin has at least k collisions is
<= (e/k)**k
.
These look easy to check. But the max-load
test (the maximum number of collisions with high probability) is usually stated vaguely.
Most texts state that the maximum number of collisions in any bin is O( ln(n) / ln(ln(n)) )
.
Some say it is 3*ln(n) / ln(ln(n))
. Other papers mix ln
and log
- usually without defining them, or state that log
is log base e and then use ln
elsewhere.
Is ln
the log to base e
or 2
and is this max-load
formula right and how big should n
be to run a test?
This lecture seems to cover it best, but I am no mathematician.
http://pages.cs.wisc.edu/~shuchi/courses/787-F07/scribe-notes/lecture07.pdf
BTW, with high probability
seems to mean 1 - 1/n
.
© Stack Overflow or respective owner