What is a data structure for quickly finding non-empty intersections of a list of sets?
Posted
by Andrey Fedorov
on Stack Overflow
See other posts from Stack Overflow
or by Andrey Fedorov
Published on 2010-04-06T22:36:25Z
Indexed on
2010/04/06
23:13 UTC
Read the original article
Hit count: 321
I have a set of N
items, which are sets of integers, let's assume it's ordered and call it I[1..N]
. Given a candidate
set, I need to find the subset of I
which have non-empty intersections with the candidate
.
So, for example, if:
I = [{1,2}, {2,3}, {4,5}]
I'm looking to define valid_items(items, candidate)
, such that:
valid_items(I, {1}) == {1}
valid_items(I, {2}) == {1, 2}
valid_items(I, {3,4}) == {2, 3}
I'm trying to optimize for one given set I
and a variable candidate
sets. Currently I am doing this by caching items_containing[n] = {the sets which contain n}
. In the above example, that would be:
items_containing = [{}, {1}, {1,2}, {2}, {3}, {3}]
That is, 0 is contained in no items, 1 is contained in item 1, 2 is contained in itmes 1 and 2, 2 is contained in item 2, 3 is contained in item 2, and 4 and 5 are contained in item 3.
That way, I can define valid_items(I, candidate) = union(items_containing[n] for n in candidate)
.
Is there any more efficient data structure (of a reasonable size) for caching the result of this union? The obvious example of space 2^N
is not acceptable, but N
or N*log(N)
would be.
© Stack Overflow or respective owner