SSIS - Range lookups
- by Repieter
When developing an ETL solution in SSIS we sometimes need to do range lookups in SSIS. Several solutions for this can be found on the internet, but now we have built another solution which I would like to share, since it's pretty easy to implement and the performance is fast.
You can download the sample package to see how it works. Make sure you have the AdventureWorks2008R2 and AdventureWorksDW2008R2 databases installed. (Apologies for the layout of this blog, I don't do this too often :))
To give a little bit more information about the example, this is basically what is does: we load a facttable and do an SCD type 2 lookup operation of the Product dimension. This is done with a script component.
First we query the Data warehouse to create the lookup dataset. The query that is used for that is:
SELECT
[ProductKey]
,[ProductAlternateKey]
,[StartDate]
,ISNULL([EndDate], '9999-01-01') AS EndDate
FROM [DimProduct]
The output of this query is stored in a DataTable:
string lookupQuery = @"
SELECT
[ProductKey]
,[ProductAlternateKey]
,[StartDate]
,ISNULL([EndDate], '9999-01-01') AS EndDate
FROM [DimProduct]";
OleDbCommand oleDbCommand = new OleDbCommand(lookupQuery, _oleDbConnection);
OleDbDataAdapter adapter = new OleDbDataAdapter(oleDbCommand);
_dataTable = new DataTable();
adapter.Fill(_dataTable);
Now that the dimension data is stored in the DataTable we use the following method to do the actual lookup:
public int RangeLookup(string businessKey, DateTime lookupDate)
{
// set default return value (Unknown)
int result = -1;
DataRow[] filteredRows;
filteredRows = _dataTable.Select(string.Format("ProductAlternateKey = '{0}'", businessKey));
for (int i = 0; i < filteredRows.Length; i++)
{
// check if the lookupdate is found between the startdate and enddate of any of the records
if (lookupDate >= (DateTime)filteredRows[i][2] && lookupDate < (DateTime)filteredRows[i][3])
{
result = (filteredRows[i][0] == null) ? -1 : (int)filteredRows[i][0];
break;
}
}
filteredRows = null;
return result;
}
This method is executed for every row that passes the script component. This is implemented in the ProcessInputRow method
public override void Input0_ProcessInputRow(Input0Buffer Row)
{
// Perform the lookup operation on the current row and put the value in the Surrogate Key Attribute
Row.ProductKey = RangeLookup(Row.ProductNumber, Row.OrderDate);
}
Now what actually happens?!
1. Every record passes the business key and the orderdate to the RangeLookup method.
2. The DataTable is then filtered on the business key of the current record. The output is stored in a DataRow [] object.
3. We loop over the DataRow[] object to see where the orderdate meets the following expression:
(lookupDate >= (DateTime)filteredRows[i][2] && lookupDate < (DateTime)filteredRows[i][3])
4. When the expression returns true (so where the data is between the Startdate and the EndDate), the surrogate key of the dimension record is returned
We have done some testing with this solution and it works great for us. Hope others can use this example to do their range lookups.