How can I test that my hash function is good in terms of max-load?

Posted by philcolbourn on Stack Overflow See other posts from Stack Overflow or by philcolbourn
Published on 2010-04-10T00:37:45Z Indexed on 2010/04/10 0:43 UTC
Read the original article Hit count: 355

Filed under:
|
|

I have read through various papers on the 'Balls and Bins' problem and it seems that if a hash function is working right (ie. it is effectively a random distribution) then the following should/must be true if I hash n values into a hash table with n slots (or bins):

  1. Probability that a bin is empty, for large n is 1/e.
  2. Expected number of empty bins is n/e.
  3. Probability that a bin has k collisions is <= 1/k!.
  4. Probability that a bin has at least k collisions is <= (e/k)**k.

These look easy to check. But the max-load test (the maximum number of collisions with high probability) is usually stated vaguely.

Most texts state that the maximum number of collisions in any bin is O( ln(n) / ln(ln(n)) ). Some say it is 3*ln(n) / ln(ln(n)). Other papers mix ln and log - usually without defining them, or state that log is log base e and then use ln elsewhere.

Is ln the log to base e or 2 and is this max-load formula right and how big should n be to run a test?

This lecture seems to cover it best, but I am no mathematician.

http://pages.cs.wisc.edu/~shuchi/courses/787-F07/scribe-notes/lecture07.pdf

BTW, with high probability seems to mean 1 - 1/n.

© Stack Overflow or respective owner

Related posts about hash

Related posts about hashtable