Loading Dimension Tables - Methodologies
- by Nev_Rahd
Hello,
Recently I been working on project, where need to populated Dim Tables from EDW Tables.
EDW Tables are of type II which does maintain historical data. When comes to load Dim Table, for which source may be multiple EDW Tables or would be single table with multi level pivoting (on attributes).
Mean: There would be 10 records - one for each attribute which need to be pivoted on domain_code to make a single row in Dim. Out of these 10 records there would be some attributes with same domain_code but with different sub_domain_code, which needs further pivoting on subdomain code.
Ex:
if i got domain code: 01,02, 03 = which are straight pivot on domain code
I would also have domain code: 10 with subdomain code / version as 2006,2007,2008,2009
That means I need to split my source table with above attributes into two = one for domain code and other for domain_code + version.
so far so good.
When it comes to load Dim Table:
As per design specs for Dimensions (originally written by third party), what they want is:
for every single change in EDW (attribute), it should assemble all the related records (for that NK) mean new one with other attribute values which are current = process them to create a new dim record and insert it.
That mean if a single extract contains 100 records updated (one for each NK), it should assemble 100 + (100*9) records to insert / update dim table. How good is this approach.
Other way I tried to do is just do a lookup into dim table for that NK get the value's of recent records (attributes which not changed) and insert it and update the current one.
What would be the better approach assembling records at source side for one attribute change or looking into dim table's recent record and process it.
If this doesn't make sense, would like to elaborate it further.
Thanks