removing duplicate strings from a massive array in java efficiently?
Posted
by
Preator Darmatheon
on Stack Overflow
See other posts from Stack Overflow
or by Preator Darmatheon
Published on 2012-04-06T15:29:24Z
Indexed on
2012/04/06
23:29 UTC
Read the original article
Hit count: 354
java
|interview-questions
I'm considering the best possible way to remove duplicates from an (Unsorted) array of strings - the array contains millions or tens of millions of stringz..The array is already prepopulated so the optimization goal is only on removing dups and not preventing dups from initially populating!!
I was thinking along the lines of doing a sort and then binary search to get a log(n) search instead of n (linear) search. This would give me nlogn + n searches which althout is better than an unsorted (n^2) search = but this still seems slow. (Was also considering along the lines of hashing but not sure about the throughput)
Please help! Looking for an efficient solution that addresses both speed and memory since there are millions of strings involved without using Collections API!
© Stack Overflow or respective owner