Compact data structure for storing a large set of integral values
Posted by Odrade on Stack Overflow, published 2011-03-08.
I'm working on an application that needs to pass around large sets of Int32 values. Each set is expected to contain ~1,000,000-50,000,000 items, where each item is a database key in the range 0-50,000,000. I expect the distribution of ids in any given set to be effectively random over this range. The operations I need on the set are dirt simple:
- Add a new value
- Iterate over all of the values.
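For a sense of scale, here's the back-of-envelope arithmetic I've been working from (my own rough estimates, not measurements; the ~20 bytes/entry figure for HashSet<int> in particular is just a guess at its per-entry overhead):

```csharp
using System;

class MemoryEstimates
{
    static void Main()
    {
        // Worst case: a set of 50,000,000 ids drawn from the range 0-50,000,000.
        long count = 50000000;   // items in one set
        long range = 50000001;   // distinct possible ids

        long listPayload = count * sizeof(int); // List<int> payload: 4 bytes per id
        long bitMap      = (range + 7) / 8;     // one bit per possible id
        long hashSetEst  = count * 20;          // guess: ~20 bytes per HashSet<int> entry

        Console.WriteLine("List<int> payload : {0:N1} MB", listPayload / (1024.0 * 1024));
        Console.WriteLine("Bit map           : {0:N1} MB", bitMap / (1024.0 * 1024));
        Console.WriteLine("HashSet<int> est. : {0:N1} MB", hashSetEst / (1024.0 * 1024));
    }
}
```

If those estimates are off, corrections are welcome; they're only meant to show why List<int> and HashSet<int> worry me.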
There is a serious concern about the memory usage of these sets, so I'm looking for a data structure that can store the ids more efficiently than a simple List<int> or HashSet<int>. I've looked at BitArray, but that can be wasteful depending on how sparse the ids are. I've also considered a bitwise trie, but I'm unsure how to calculate the space efficiency of that solution for the expected data. A Bloom filter would be great, if only I could tolerate the false positives.
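For reference, here's roughly the shape I had in mind for the BitArray option (just a sketch; the class and member names are my own):

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

// Sketch of a BitArray-backed set for ids in a known range [0, maxId].
// "DenseIdSet" and "maxId" are illustrative names, not from any library.
class DenseIdSet : IEnumerable<int>
{
    private readonly BitArray _bits;

    public DenseIdSet(int maxId)
    {
        // One bit per possible id: 50,000,001 bits is ~6.25 MB,
        // regardless of how many ids the set actually contains.
        _bits = new BitArray(maxId + 1);
    }

    public void Add(int id)
    {
        _bits[id] = true;
    }

    public IEnumerator<int> GetEnumerator()
    {
        // Iteration cost is proportional to the range, not to the number of ids present.
        for (int i = 0; i < _bits.Length; i++)
        {
            if (_bits[i])
                yield return i;
        }
    }

    IEnumerator IEnumerable.GetEnumerator()
    {
        return GetEnumerator();
    }
}
```

This is exactly what makes me hesitate: ~6.25 MB is fine for the big sets, but it's the same ~6.25 MB for a set holding only a million ids, which is where something like the trie might win.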
I would appreciate any suggestions for data structures suited to this purpose. I'm interested in both out-of-the-box and custom solutions.
EDIT: To answer your questions:
- No, the items don't need to be sorted
- By "pass around" I mean both pass between methods and serialize and send over the wire. I clearly should have mentioned this.
- There could be a decent number of these sets in memory at once (~100).
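To illustrate the wire part of "pass around", here's a sketch of how one of the candidates (BitArray) could be flattened to bytes and rebuilt on the other side; again, purely illustrative:

```csharp
using System.Collections;

// Illustrative only: flattening a BitArray to bytes for the wire and rebuilding it.
static class BitArrayWire
{
    public static byte[] ToBytes(BitArray bits)
    {
        // BitArray.CopyTo packs 8 bits per byte into a byte[].
        var buffer = new byte[(bits.Length + 7) / 8];
        bits.CopyTo(buffer, 0);
        return buffer;
    }

    public static BitArray FromBytes(byte[] buffer)
    {
        // Note: the rebuilt BitArray's length rounds up to a whole number of bytes.
        return new BitArray(buffer);
    }
}
```

Serializing a List<int> instead would cost 4 bytes per id, so which representation is smaller on the wire depends on how full the set is.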
© Stack Overflow or respective owner