Field specific errors for ETL

Posted by AaronLS on Stack Overflow See other posts from Stack Overflow or by AaronLS
Published on 2010-02-18T20:35:15Z Indexed on 2010/06/02 0:33 UTC
Read the original article Hit count: 385

Filed under:

I am creating a ETL process in MS SQL Server and I would like to have errors specific to a particular column of a particular row. For example, the data is initially loaded from excel files into a table(we'll call the Initial table) where all columns are varchar(2000) and then I stage the data to another table(the DataTypedTable) that contains more specific data types (datetime,int, etc.) or more tightly constrained varchar lengths. I need to be able to create error messages for a specific field such as: "Jan. 13th" is not a valid date format for the submission date. Please use a format of MM/DD/YYYY

These error messages would need to be stored in some way such that later in the process a automated process can create reports with the error messages such that each message references a specific row and field(someone will need to go back and correct the data in the source system and resubmit the excel file). So ideally it would be inserted into a Failures tables of some sort and contain the primary key of the failed row, the column name, and the error message.

Question: So I am wondering if this can be accomplished with SSIS, or some open source tool like Talend, and if so, what would be your general approach? Or what hand coded approach you would take?

Couple approaches I've thought of using SQL(up until no I have done ETL by hand in SQL procs, but I want to consider other approaches. Possible C# even.):

Use a cursor to read through the Initial table, and for each row insert a blank record with only the primary key into the DataTyped table, then use a single update statement for each column, such that if that update fails I can insert a very specific error message specific to that column in the error messages table.

Insert all the data as is into the DataTyped table, but have duplicate columns like SubmissionDate and SubmissionDateOld. After the initial insert the *Old columns have data, the rest are blank, and I have a single update for each column that sets the SubmissionDate based on the SubmissionDateOld.

In addition to suggesting an approach, I'd like to know if you are using that approach or something similar already in the work you do.

Developer IT

Field specific errors for ETL - Developer IT

Field specific errors for ETL

sql

ssis

etl

talend

Related posts about sql

SQL SERVER – Concat Strings in SQL Server using T-SQL – SQL in Sixty Seconds #035 – Video

SQL SERVER – Concat Function in SQL Server – SQL Concatenation

Error with SQL Server Setup 2012 on Windows 2012

How can I detect which version of SQL (eg SQL 2008 or SQL Azure)

Nested SQL Select statement fails on SQL Server 2000, ok on SQL Server 2005

Related posts about ssis

SSIS packages incompatibilities between SSIS 2008 and SSIS 2008 R2

Parsing flat files using SSIS : SSIS Nugget

Introducing SSIS Reporting Pack for SQL Server code-named Denali

New SSIS tool on Codeplex – SSIS Log Analyzer

New SSIS tool on Codeplex – SSIS Log Analyzer

Categories cloud