Fastest way to perform subset test operation on a large collection of sets with same domain

Posted by niktech on Stack Overflow
Published on 2010-12-28T00:44:35Z Indexed on 2010/12/28 0:54 UTC

Filed under: database | algorithm | database-design | sets | set-theory

Assume we have trillions of sets stored somewhere. All of these sets share the same domain, which is finite and discrete, so each set can be stored as a bit field (e.g. 0000100111...) of relatively short length (e.g. 1024 bits). That is, bit X in the bit field indicates whether item X (of 1024 possible items) is included in the given set.
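To make the representation concrete, here is a minimal sketch in Python. It assumes each set is held as an arbitrary-precision integer in which bit X is `1 << X`; the names `DOMAIN_SIZE`, `to_bitfield` and `contains` are made up for illustration and are not part of any library.

```python
DOMAIN_SIZE = 1024  # assumed number of possible items in the shared domain


def to_bitfield(items):
    """Encode an iterable of item indices as an integer bit field."""
    bits = 0
    for x in items:
        if not 0 <= x < DOMAIN_SIZE:
            raise ValueError(f"item {x} is outside the domain")
        bits |= 1 << x  # set bit X to mark item X as present
    return bits


def contains(bits, x):
    """Return True if item x is in the set encoded by bits."""
    return (bits >> x) & 1 == 1


example = to_bitfield([0, 3])
print(bin(example), contains(example, 3), contains(example, 1))
```

A fixed-width byte array would serve equally well; the integer form is used here only because it keeps the bitwise operations short.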

I want to devise a storage structure and an algorithm that efficiently answer the query: which sets in the data store have set Y as a subset? Set Y itself is not present in the data store and is specified at run time.

The simplest way to solve this is to AND the bit field for set Y with the bit field of every set in the data store, one by one, and pick the sets whose AND result equals Y's bit field.
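The brute-force scan just described could look roughly like the sketch below. It assumes the sets are Python integers used as bit fields and that the whole store fits in an in-memory list, which is certainly not true for trillions of sets; `supersets_of` is a hypothetical name chosen for this example.

```python
def supersets_of(store, y):
    """Return the indices of all stored sets that have y as a subset.

    store: list of integers, each an encoded set (bit X = item X).
    y:     the query set, encoded the same way.
    """
    matches = []
    for index, stored in enumerate(store):
        # y is a subset of stored exactly when ANDing keeps every bit of y.
        if (stored & y) == y:
            matches.append(index)
    return matches


store = [0b1011, 0b0110, 0b1111, 0b0001]
print(supersets_of(store, 0b0011))
```

Every query touches every stored set, so the cost grows linearly with the size of the data store. That linear cost is what I would like to avoid.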

How can I speed this up? Is there a tree structure (index) or some smart algorithm that would allow me to perform this query without having to AND every stored set's bitfield?

Are there databases that already support such operations on large collections of sets?

© Stack Overflow or respective owner
