OWB 11gR2 - Early Arriving Facts
Posted by Dawei Sun on Oracle Blogs
Published on Thu, 17 Jun 2010 22:03:29 -0800
A common challenge when building ETL components for a data warehouse is how to handle early arriving facts. OWB 11gR2 introduced a feature for dimensional objects, called Orphan Management, to address this. An orphan record is one that has no corresponding parent record. Orphan management automates the handling of source rows that do not meet the requirements for forming a valid dimension or cube record.
In this article, a simple example shows how to use Orphan Management in OWB. We first import a sample MDL file that contains all the objects we need and take some time to examine them. We then deploy the target tables and the dimension and cube loading mappings, and prepare the source data. Finally, we run the loading mappings and check the data in the target dimension and cube tables. OK, let’s start…
1. Import MDL file and examine sample project
First, download the zip file from here, which includes an MDL file and three source data files. Then open the OWB Design Center and import orphan_management.mdl using the menu File->Import->Warehouse Builder Metadata. The BI_DEMO project now contains the following objects:
Mapping LOAD_CHANNELS_OM: The mapping for dimension loading.
Mapping LOAD_SALES_OM: The mapping for cube loading.
Dimension CHANNELS_OM: The dimension that contains channels data.
Cube SALES_OM: The cube that contains sales data.
Table CHANNELS_OM: The star implementation table of dimension CHANNELS_OM.
Table SALES_OM: The star implementation table of cube SALES_OM.
Table SRC_CHANNELS: The source table of channels data, which will be loaded into dimension CHANNELS_OM.
Table SRC_ORDERS and SRC_ORDER_ITEMS: The source tables of sales data that will be loaded into cube SALES_OM.
Sequence CLASS_OM_DIM_SEQ: The sequence used for loading dimension CHANNELS_OM.
Dimension CHANNELS_OM
This dimension has a hierarchy with three levels: TOTAL, CLASS and CHANNEL. Each level has three attributes: ID (surrogate key), NAME and SOURCE_ID (business key). It has a standard star implementation. The orphan management policy and the default parent setting are shown in the following screenshots:
The orphan management policy options that you can set for loading are:
- Reject Orphan: The record is not inserted.
- Default Parent: You can specify a default parent record. This default record is used as the parent record for any record that does not have an existing parent record. If the default parent record does not exist, Warehouse Builder creates the default parent record. You specify the attribute values of the default parent record at the time of defining the dimensional object. If any ancestor of the default parent does not exist, Warehouse Builder also creates this record.
- No Maintenance: This is the default behavior. Warehouse Builder does not actively detect, reject, or fix orphan records.
While removing data from a dimension, you can select one of the following orphan management policies:
- Reject Removal: Warehouse Builder does not allow you to delete the record if it has existing child records.
- No Maintenance: This is the default behavior. Warehouse Builder does not actively detect, reject, or fix orphan records.
(More details are at http://download.oracle.com/docs/cd/E11882_01/owb.112/e10935/dim_objects.htm#insertedID1)
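The loading policies above can be pictured with a small model. The following Python sketch is only an illustration under stated assumptions, not code that OWB generates: the function name `load_channels`, the policy constants, and the sample channel and class names are hypothetical, while the default class name and the two error reasons are the ones that appear later in this example. It applies one policy to null parents and another to invalid parents, which matches how the sample data is expected to behave in section 3.

```python
# Hypothetical model of the dimension loading policies; not OWB-generated code.
REJECT_ORPHAN = "Reject Orphan"
DEFAULT_PARENT = "Default Parent"
NO_MAINTENANCE = "No Maintenance"


def load_channels(rows, known_classes, null_policy, invalid_policy,
                  default_class="DEF_CLASS_NAME"):
    """Apply a load policy to each (channel_name, class_name) source row.

    Returns (loaded, errors). `errors` models the error table that is
    filled when Record Error Rows is YES.
    """
    classes = set(known_classes)  # copy so the caller's set is untouched
    loaded, errors = [], []
    for channel, parent in rows:
        if parent in classes:
            loaded.append((channel, parent))  # parent exists: normal load
            continue
        policy = null_policy if parent is None else invalid_policy
        if policy == NO_MAINTENANCE:
            loaded.append((channel, parent))  # orphan is loaded undetected
        elif policy == DEFAULT_PARENT:
            classes.add(default_class)  # default parent is created if missing
            loaded.append((channel, default_class))
            errors.append((channel, "Default parent used for record"))
        elif policy == REJECT_ORPHAN:
            errors.append((channel, "No parent found for record"))
        else:
            raise ValueError(f"unknown policy: {policy}")
    return loaded, errors


# Hypothetical source rows: one valid, one null parent, one invalid parent.
rows = [("Internet", "Direct"), ("Unknown_1", None), ("Special_4", "Bogus")]
loaded, errors = load_channels(rows, {"Direct", "Indirect"},
                               null_policy=DEFAULT_PARENT,
                               invalid_policy=REJECT_ORPHAN)
print(loaded)
print(errors)
```

With these settings the row with a null parent is loaded under the default class and also logged, and the row with an invalid parent is only logged. Switching both policies to `NO_MAINTENANCE` loads every row unchanged and logs nothing.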
Cube SALES_OM
This cube references dimension CHANNELS_OM. It has three measures: AMOUNT, QUANTITY and COST. The orphan management policy settings are shown in the following screenshot:
The orphan management policy options that you can set for loading are:
- No Maintenance: Warehouse Builder does not actively detect, reject, or fix orphan rows.
- Default Dimension Record: Warehouse Builder assigns a default dimension record for any row that has an invalid or null dimension key value. Use the Settings button to define the default parent row.
- Reject Orphan: Warehouse Builder does not insert the row if it does not have an existing dimension record.
(More details are at http://download.oracle.com/docs/cd/E11882_01/owb.112/e10935/dim_objects.htm#BABEACDG)
Mapping LOAD_CHANNELS_OM
This mapping loads source data from table SRC_CHANNELS to dimension CHANNELS_OM.
The operator CHANNELS_IN is bound to table SRC_CHANNELS; CHANNELS_OUT is bound to dimension CHANNELS_OM. The TOTALS operator generates a constant value for the top level of the dimension. The CLASS_FILTER operator filters out the “invalid” class names, so that we can see what happens when channel records with an “invalid” parent are loaded into the dimension.
Some properties of the dimension operator in this mapping are important to orphan management. See the screenshot below:
Create Default Level Records: If YES, then default level records will be created. This property must be set to YES for dimensions and cubes if one of their orphan management policies is “Default Parent” or “Default Dimension Record”. This property is set to NO by default, so the user may need to set this to YES manually.
LOAD policy for INVALID keys/ LOAD policy for NULL keys: These two properties have the same meaning as in the dimension editor. Their values are copied from the dimension when the user drops the dimension into the mapping, so the user does not need to modify them.
Record Error Rows: If YES, error rows will be inserted into the error table when loading the dimension.
REMOVE Orphan Policy: This property is used when removing data from a dimension. Since the dimension loading type is set to LOAD in this example, this property is disabled.
Mapping LOAD_SALES_OM
This mapping loads source data from tables SRC_ORDERS and SRC_ORDER_ITEMS to cube SALES_OM.
This mapping looks a little complicated, but the operators in the red rectangle are there to filter out and generate the records with “invalid” or “null” dimension keys.
Some properties of the cube operator in a mapping are important to orphan management. See the screenshot below:
Enable Source Aggregation: Should be checked in this example. If the default dimension record orphan policy is set for the cube operator, then it is recommended that source aggregation also be enabled. Otherwise, the orphan management processing may produce multiple fact rows with the same default dimension references, which will cause an “unstable rowset” execution error in the database, since the dimension refs are used as update match attributes for updating the fact table.
LOAD policy for INVALID keys/ LOAD policy for NULL keys: These two properties have the same meaning as in the cube editor. The values are set to the same as in the cube editor when the user drops the cube into the mapping. The user does not need to modify these properties.
Record Error Rows: If YES, error rows will be inserted into the error table when loading the cube.
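To see why source aggregation matters with the default dimension record policy, consider the following Python sketch. It is a simplified model under assumptions, not what OWB executes: the function names, the default key value 0 and the sample rows are hypothetical, and summing the measures is assumed as the aggregation. Two source rows with a null channel both receive the default key, so without aggregation the same match key would appear twice in one load.

```python
from collections import Counter, defaultdict


def apply_default_key(facts, default_key):
    """Replace a null channel key with the default dimension record's key."""
    return [(default_key if key is None else key, amount, quantity, cost)
            for key, amount, quantity, cost in facts]


def duplicate_keys(facts):
    """Return keys that occur more than once (the 'unstable rowset' risk)."""
    counts = Counter(key for key, *_ in facts)
    return sorted(key for key, n in counts.items() if n > 1)


def aggregate(facts):
    """Sum AMOUNT, QUANTITY and COST per channel key (assumed aggregation)."""
    totals = defaultdict(lambda: [0, 0, 0])
    for key, amount, quantity, cost in facts:
        totals[key][0] += amount
        totals[key][1] += quantity
        totals[key][2] += cost
    return {key: tuple(values) for key, values in totals.items()}


# Hypothetical rows: (channel key, amount, quantity, cost)
source = [(None, 10, 1, 4), (None, 5, 2, 3), (7, 20, 1, 8)]
resolved = apply_default_key(source, default_key=0)
print(duplicate_keys(resolved))  # keys that would be matched more than once
print(aggregate(resolved))       # one row per key after aggregation
```

After aggregation each dimension key appears exactly once, which is the condition the update match on the fact table needs.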
2. Deploy objects and mappings
We can now deploy the objects. First, make sure location SALES_WH_LOCAL has been correctly configured. Then open the Control Center Manager using the menu Tools->Control Center Manager. Expand BI_DEMO->SALES_WH_LOCAL and click the SALES_WH node in the project tree. We can see the following objects:
Deploy all the objects in the following order:
Sequence CLASS_OM_DIM_SEQ
Table CHANNELS_OM, SALES_OM, SRC_CHANNELS, SRC_ORDERS, SRC_ORDER_ITEMS
Dimension CHANNELS_OM
Cube SALES_OM
Mapping LOAD_CHANNELS_OM, LOAD_SALES_OM
Note that we deployed the source tables as well. Normally, we would import source tables from the database instead of deploying them to the target schema. However, in this example, we designed the source tables in OWB and deployed them to the database for the purpose of this demonstration.
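One way to read the deployment order above is that each object is deployed only after the objects it relies on. The Python sketch below checks that property. The dependency map is an assumption inferred from the object descriptions in section 1 (it is not exported from OWB), and the helper name `respects_dependencies` is hypothetical.

```python
# Assumed dependencies, inferred from the object descriptions; not OWB metadata.
DEPENDS_ON = {
    "CLASS_OM_DIM_SEQ": set(),
    "CHANNELS_OM (table)": set(),
    "SALES_OM (table)": set(),
    "SRC_CHANNELS": set(),
    "SRC_ORDERS": set(),
    "SRC_ORDER_ITEMS": set(),
    "CHANNELS_OM (dimension)": {"CLASS_OM_DIM_SEQ", "CHANNELS_OM (table)"},
    "SALES_OM (cube)": {"SALES_OM (table)", "CHANNELS_OM (dimension)"},
    "LOAD_CHANNELS_OM": {"SRC_CHANNELS", "CHANNELS_OM (dimension)"},
    "LOAD_SALES_OM": {"SRC_ORDERS", "SRC_ORDER_ITEMS", "SALES_OM (cube)"},
}

# The order used in this article: sequence, tables, dimension, cube, mappings.
DEPLOY_ORDER = [
    "CLASS_OM_DIM_SEQ",
    "CHANNELS_OM (table)", "SALES_OM (table)",
    "SRC_CHANNELS", "SRC_ORDERS", "SRC_ORDER_ITEMS",
    "CHANNELS_OM (dimension)",
    "SALES_OM (cube)",
    "LOAD_CHANNELS_OM", "LOAD_SALES_OM",
]


def respects_dependencies(order, depends_on):
    """Return True if every object comes after all objects it depends on."""
    deployed = set()
    for name in order:
        if not depends_on[name] <= deployed:
            return False
        deployed.add(name)
    return True


print(respects_dependencies(DEPLOY_ORDER, DEPENDS_ON))
```

Under these assumed dependencies the article's order passes the check, while an order that starts with a mapping would not.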
3. Prepare and examine source data
Before running the mappings, we need to populate and examine the source data. Run SRC_CHANNELS.sql, SRC_ORDERS.sql and SRC_ORDER_ITEMS.sql as the target user. Then check the data in these three tables.
Table SRC_CHANNELS
SQL> select rownum, id, class, name from src_channels;
Records 1 to 5 are correct; they should be loaded into the dimension without error.
Records 6, 7 and 8 have null parents; they should be loaded into the dimension with a default parent value, and should also be inserted into the error table.
Records 9, 10 and 11 have “invalid” parents; they should be rejected by the dimension and inserted into the error table.
Table SRC_ORDERS and SRC_ORDER_ITEMS
SQL> select rownum, a.id, a.channel, b.amount, b.quantity, b.cost from src_orders a, src_order_items b where a.id = b.order_id;
Record 178 has a null dimension reference; it should be loaded into the cube with a default dimension reference, and should also be inserted into the error table.
Record 179 has an “invalid” dimension reference; it should be rejected by the cube and inserted into the error table.
The other records should be aggregated and loaded into the cube correctly.
4. Run the mappings and examine the target data
In the Control Center Manager, expand BI_DEMO-> SALES_WH_LOCAL-> SALES_WH-> Mappings, right-click the LOAD_CHANNELS_OM node, and click Start. Run mapping LOAD_SALES_OM the same way.
When both mappings have finished successfully, we can check the data in the target tables.
Table CHANNELS_OM
SQL> select rownum, total_id, total_name, total_source_id, class_id,class_name, class_source_id, channel_id, channel_name,channel_source_id from channels_om order by abs(dimension_key);
Records 1, 2 and 3 are the default dimension records for the three levels.
Records 8, 10 and 15 are the loaded records that originally had null parents. We see that their parent's name (class_name) is set to DEF_CLASS_NAME.
The records whose CHANNEL_NAME is Special_4, Special_5 or Special_6 are not loaded into this table because of their invalid parents.
Error Table CHANNELS_OM_ERR
SQL> select rownum, class_source_id, channel_id, channel_name,channel_source_id, err$$$_error_reason from channels_om_err order by channel_name;
We can see that all the records with a null or invalid parent are inserted into this error table. The error reason is “Default parent used for record” for the first three records, and “No parent found for record” for the last three.
Table SALES_OM
SQL> select a.*, b.channel_name from sales_om a, channels_om b where a.channels=b.channel_id;
We can see that the order record with a null channel_name has been loaded into the target table with a default channel_name. The one with an “invalid” channel_name is not loaded.
Error Table SALES_OM_ERR
SQL> select a.amount, a.cost, a.quantity, a.channels, b.channel_name, a.err$$$_error_reason from sales_om_err a, channels_om b where a.channels=b.channel_id(+);
We can see that the order records with a null or invalid channel_name are inserted into the error table. If the dimension reference column is null, the error reason is “Default dimension record used for fact”. If it is invalid, the error reason is “Dimension record not found for fact”.
Summary
In summary, this article illustrated the Orphan Management feature in OWB 11gR2. Automated orphan management policies improve ETL developer and administrator productivity by addressing an important cause of cube and dimension load failures, without requiring developers to explicitly build logic to handle these orphan rows.