Oracle data warehouse design - fact table acting as a dimension?
- by Elizabeth
THANKS: Both answers here are very helpful, but I could only pick one. I really appreciate the advice!
our datawarehouse will be used more for workflow reports than traditional analytical reports. Our users care about "current picture" far more than history. (though history matters, too.) We are a government entity that does not have costs or related calculations. Mostly just counts of people within given locations and with related history.
We are using Oracle, and I have found distinct advantage in using the star join whenever possible and would like to rearchitect everything to as closely resemble the star schema as is reasonable for our business uses. Speed in this DW is vital, and a number of tests have already proven the star schema approach to me.
Our "person" table is key - it contains over 4 million records and will be the most frequently used source in queries. It can be seen at the center of a star with multiple dimensions (like age, gender, affiliation, location, etc.). It is a very LONG table, particularly when I join it to the address and contact information.
However, it is more like a dimension table when we start looking at history. For example, there are two different history tables that have a person key pointing to the person table. One has over 20 million records and the other has almost 50 million and grows daily.
Is this table a fact table or a dimension table? Can one work as both? If so, is that going to be a big performance problem? Is it common to query more off of a dimension than a fact? What happens if a DIFFERENT fact table that uses the person table as a dimension is actually only 60,000 records (much smaller.).
I think my problem is that our data and use of it does not fit with the commonly use examples of star schemas.
CLARIFICATION: Some good thoughts have been added below, but perhaps I left too much out to really explain well. Here's some more info:
We handle a voter database. We don't have any measures except voter counts by various groups: voter counts by party, by age, by location; voter counts by ballot type and election, by ballot status and election, etc. We do have a "voting history" log as well as an activity audit log (change of address, party, etc.). We have information on which voters are election workers and all that related information. I figure I'll get to the peripheral stuff later.
For now I'm focusing on our two major "business processes": voter registration(which IS a voter.) and election turnout. In the first, voter is a fact. In the second, voter is a dimension, along with party, election, and type of ballot. (and in case anyone is worried - no we don't know HOW people vote. Just that they do. LOL )
I hope that clarifies things a bit.