Handling primary key duplicates in a data warehouse load

Posted by Meff on Stack Overflow See other posts from Stack Overflow or by Meff
Published on 2010-04-30T15:55:16Z Indexed on 2010/04/30 15:57 UTC
Read the original article Hit count: 195

Filed under:
|
|

I'm currently building an ETL system to load a data warehouse from a transactional system. The grain of my fact table is the transaction level. In order to ensure I don't load duplicate rows I've put a primary key on the fact table, which is the transaction ID.

I've encountered a problem with transactions being reversed - In the transactional database this is done via a status, which I pick up and I can work out if the transaction is being done, or rolled back so I can load a reversal row in the warehouse. However, the reversal row will have the same transaction ID and so I get a primary key violation.

I've solved this for now by negating the primary key, so transaction ID 1 would be a payment, and transaction ID -1 (In the warehouse only) would be the reversal.

I have considered an alternative of generating a BIT column, where 0 is normal and 1 is reversal, then making the PK the transaction ID and the BIT column.

My question is, is this a good practice, and has anyone else encountered anything like this? For reference, this is a payment processing system, so values will not be modified, so there will only ever be transactions and reversals.

© Stack Overflow or respective owner

Related posts about etl

Related posts about ssis