Handling close-to-impossible collisions on should-be-unique values
Posted
by balpha
on Stack Overflow
See other posts from Stack Overflow
or by balpha
Published on 2009-07-08T12:52:01Z
Indexed on
2010/04/07
5:23 UTC
Read the original article
Hit count: 249
There are many systems that depend on the uniqueness of some particular value. Anything that uses GUIDs comes to mind (eg. the Windows registry or other databases), but also things that create a hash from an object to identify it and thus need this hash to be unique.
A hash table usually doesn't mind if two objects have the same hash because the hashing is just used to break down the objects into categories, so that on lookup, not all objects in the table, but only those objects in the same category (bucket) have to be compared for identity to the searched object.
Other implementations however (seem to) depend on the uniqueness. My example (that's what lead me to asking this) is Mercurial's revision IDs. An entry on the Mercurial mailing list correctly states
The odds of the changeset hash colliding by accident in your first billion commits is basically zero. But we will notice if it happens. And you'll get to be famous as the guy who broke SHA1 by accident.
But even the tiniest probability doesn't mean impossible. Now, I don't want an explanation of why it's totally okay to rely on the uniqueness (this has been discussed here for example). This is very clear to me.
Rather, I'd like to know (maybe by means of examples from your own work):
Are there any best practices as to covering these improbable cases anyway?
Should they be ignored, because it's more likely that particularly strong solar winds lead to faulty hard disk reads?
Should they at least be tested for, if only to fail with a "I give up, you have done the impossible" message to the user?
Or should even these cases get handled gracefully?
For me, especially the following are interesting, although they are somewhat touchy-feely:
If you don't handle these cases, what do you do against gut feelings that don't listen to probabilities?
If you do handle them, how do you justify this work (to yourself and others), considering there are more probable cases you don't handle, like a supernonva?
© Stack Overflow or respective owner