Compact data structure for storing a large set of integral values

I'm working on an application that needs to pass around large sets of Int32 values. The sets are expected to contain ~1,000,000-50,000,000 items, where each item is a database key in the range 0-50,000,000. I expect the distribution of ids in any given set to be effectively random over this range. The operations I need on these sets are dirt simple:

  • Add a new value.
  • Iterate over all of the values.

There is a serious concern about the memory usage of these sets, so I'm looking for a data structure that can store the ids more efficiently than a simple List<int> or HashSet<int>. I've looked at BitArray, but that can be wasteful depending on how sparse the ids are. I've also considered a bitwise trie, but I'm unsure how to calculate the space efficiency of that solution for the expected data. A Bloom filter would be great, if only I could tolerate the false positives.
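
To make the BitArray trade-off concrete: a bitmap spanning the whole 0-50,000,000 key range costs a fixed ~6.25 MB per set (50M bits / 8), while a List<int> runs from ~4 MB at 1M items up to ~200 MB at 50M, so the bitmap only loses at the sparse end of my range. Something like this minimal sketch is what I have in mind (IdSet and the hard-coded MaxId are placeholder names of mine, not existing types):

    using System.Collections;
    using System.Collections.Generic;

    // Minimal sketch of a bitmap-backed id set (IdSet/MaxId are placeholders).
    // Fixed cost: MaxId bits, ~6.25 MB per set, no matter how many ids it holds.
    public sealed class IdSet
    {
        private const int MaxId = 50000000;             // upper bound on database keys
        private readonly BitArray bits = new BitArray(MaxId);

        public void Add(int id)                         // O(1) insert; duplicates are harmless
        {
            bits[id] = true;
        }

        public IEnumerable<int> Values()                // iteration scans all MaxId bits
        {
            for (int id = 0; id < MaxId; id++)
            {
                if (bits[id])
                    yield return id;
            }
        }
    }

Note that iteration cost here is proportional to the full key range rather than the set size, which is fine at the dense end but possibly not at 1M items.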

I would appreciate any suggestions for data structures suited to this purpose. I'm interested in both out-of-the-box and custom solutions.

EDIT: To answer your questions:

  • No, the items don't need to be sorted.
  • By "pass around" I mean both pass between methods and serialize and send over the wire. I clearly should have mentioned this.
  • There could be a decent number of these sets in memory at once (~100).
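
On the serialization point: if I end up with the bitmap representation, the wire format seems straightforward, since the raw bits copy directly into a byte[]. A sketch of what I'd expect (ToBytes is my own helper name, not a framework method):

    using System.Collections;

    // Sketch: pack the bitmap into a byte[] for sending over the wire.
    // A 50,000,000-bit set is a fixed ~6.25 MB payload regardless of item count.
    static byte[] ToBytes(BitArray bits)
    {
        byte[] buffer = new byte[(bits.Length + 7) / 8];
        bits.CopyTo(buffer, 0);    // BitArray supports copying into a byte[]
        return buffer;
    }

With ~100 sets live at once, that's roughly 625 MB if they're all bitmaps, which is part of why a sparser structure might still win at the low end of my range.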
