Sorting versus hashing
Posted
by
Paul Siegel
on Programmers
See other posts from Programmers
or by Paul Siegel
Published on 2013-08-02T15:03:50Z
Indexed on
2013/08/02
16:02 UTC
Read the original article
Hit count: 363
My problem is as follows. I have an array of n
strings with m < n
of them distinct. I want to create a one-to-one function which assigns each of the m
distinct strings to the numbers 0 ... m-1
. For example, if my strings are:
Bob, Amy, Bob, Charlie, Amy
then the function:
Bob -> 0, Amy -> 1, Charlie -> 2
would meet my needs. I have thought of three possible approaches:
Sort the list of strings, remove duplicates, and construct the function using a search algorithm.
Create a hash table and check each string to see if it is already in the table before inserting it.
Sort the list of strings, remove duplicates, and put the resulting list into a hash table.
My code will be written in Java, and I will likely use standard Java algorithms: merge sort for sorting, binary search for searching, and whatever the standard Java hash table algorithm is.
Question: Assume that after creating the function I will have to evaluate it on each of the n original strings. Which of the three approaches is fastest? Is there a better way?
Part of the problem is that I don't really know what's going on "under the hood" in standard hashing algorithms. Any help would be appreciated.
© Programmers or respective owner