removing duplicate strings from a massive array in java efficiently?

Posted by Preator Darmatheon on Stack Overflow See other posts from Stack Overflow or by Preator Darmatheon
Published on 2012-04-06T15:29:24Z Indexed on 2012/04/06 23:29 UTC
Read the original article Hit count: 349

Filed under:
|

I'm considering the best possible way to remove duplicates from an (Unsorted) array of strings - the array contains millions or tens of millions of stringz..The array is already prepopulated so the optimization goal is only on removing dups and not preventing dups from initially populating!!

I was thinking along the lines of doing a sort and then binary search to get a log(n) search instead of n (linear) search. This would give me nlogn + n searches which althout is better than an unsorted (n^2) search = but this still seems slow. (Was also considering along the lines of hashing but not sure about the throughput)

Please help! Looking for an efficient solution that addresses both speed and memory since there are millions of strings involved without using Collections API!

© Stack Overflow or respective owner

Related posts about java

Related posts about interview-questions