RegexClean Transformation
Posted on SQLIS
Published on Sat, 11 Oct 2008 05:27:00 +0100
Use the power of regular expressions to cleanse your data right there inside the Data Flow. This transformation includes a full user interface for simple configuration, as well as advanced features such as error output configuration.
Two regular expressions are used: a match expression and a replace expression. The transformation is designed around named capture groups (match groups), and even supports multiple expressions. This allows rich and complex expressions to be built, all through an easy-to-reuse transformation where a bespoke Script Component was previously the only alternative.
Some simple properties are available for each column selected:
Behaviour
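The two behaviour modes offer similar functionality, with one difference. Replace substitutes the matched tokens within the input and leaves the rest of the string in place, whereas Emit overwrites the whole string with the result of the replace expression.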
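The difference between the two modes can be illustrated outside SSIS. The following is a minimal Python sketch of one plausible reading of the description above, not the component's actual implementation. The function names `replace_mode` and `emit_mode`, the sample pattern, and the pass-through of non-matching values are all assumptions made for illustration. Note that Python writes named groups as `(?P<name>...)` and references them as `\g<name>`, whereas the component uses the .NET forms `(?<name>...)` and `${name}`.

```python
import re

# Hypothetical expressions, written in Python syntax rather than .NET syntax.
MATCH = r"(?P<area>\d{3})-(?P<number>\d{4})"
REPLACE = r"(\g<area>) \g<number>"


def replace_mode(value: str) -> str:
    """Assumed 'Replace' behaviour: only the matched text is substituted."""
    return re.sub(MATCH, REPLACE, value)


def emit_mode(value: str) -> str:
    """Assumed 'Emit' behaviour: the output is only the expanded replace expression."""
    match = re.search(MATCH, value)
    # Assumption: a value that does not match is passed through unchanged.
    return match.expand(REPLACE) if match else value


print(replace_mode("Call 555-1234 today"))  # Call (555) 1234 today
print(emit_mode("Call 555-1234 today"))     # (555) 1234
```

Under this reading, Replace keeps the surrounding text ("Call", "today") while Emit discards it.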
Cascade
Cascade allows you to define multiple expressions, each on a new line. The match expression will be processed into one operation per line, which are then processed in order at run-time. Multiple replace expressions can also be specified, again each on a new line. If there is no corresponding replace expression for a match expression line, then the last replace expression will be used instead. It is common to have multiple match expressions, but only a single replace expression.
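As a rough illustration of the cascade rules described above, here is a minimal Python sketch. It assumes that each operation runs on the output of the previous one and that each line behaves like a plain regex substitution; the function name `cascade` and the sample expressions are hypothetical, and the component's real behaviour may differ in detail.

```python
import re


def cascade(value: str, match_expressions: str, replace_expressions: str) -> str:
    """Apply one match expression per line, in order, to the value."""
    match_lines = match_expressions.splitlines()
    # Guard against an empty replace expression (assumed to mean "replace with nothing").
    replace_lines = replace_expressions.splitlines() or [""]
    for index, pattern in enumerate(match_lines):
        # If there is no corresponding replace line, reuse the last one.
        replacement = replace_lines[min(index, len(replace_lines) - 1)]
        value = re.sub(pattern, replacement, value)
    return value


# Two match expressions, one shared replace expression (the common case).
print(cascade("a..b__c---d", "[._]+\n-{2,}", "-"))  # a-b-c-d
```

The first line turns runs of dots and underscores into a hyphen, and the second then collapses repeated hyphens, which shows why the order of the lines matters.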
Match Expression
The expression used to define the named capture groups. This is where you can analyse the data, and tag or name elements within it as found by the match expression.
Replace Expression
The replace expression determines the final output. It references the named groups from the match expression and assembles them into the final output.
If you want to use regular expressions to validate data then try the Regular Expression Transformation.
Quick Start Guide
Quick Sample #1
Parse an email address and extract the user and domain portions. Format as a web address passing the user portion as a URL parameter. This uses two match groups, user and host, which correspond to the text before the @ and after it respectively.
Behaviour is set to Emit and Cascade is set to false, since we only have a single match expression.
Match Expression - ^(?<user>[^@]+)@(?<host>.+)$
Replace Expression - http://www.${host}?user=${user}
Results
Sample Input | Sample Output |
zheng0@adventure-works.com | http://www.adventure-works.com?user=zheng0 |
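The same sample can be reproduced outside SSIS to check the expressions before configuring the component. The sketch below uses Python's `re` module, so the .NET-style syntax is first translated to Python's equivalents. The helper names `to_python_pattern`, `to_python_replacement`, and `emit` are hypothetical, the translation covers only named groups and `${name}` references, and the input address is inferred from the sample output.

```python
import re


def to_python_pattern(dotnet_pattern: str) -> str:
    """Rewrite (?<name>...) as (?P<name>...), leaving lookbehinds untouched."""
    return re.sub(r"\(\?<(?![=!])", "(?P<", dotnet_pattern)


def to_python_replacement(dotnet_replacement: str) -> str:
    """Rewrite ${name} references as Python's \\g<name> form."""
    escaped = dotnet_replacement.replace("\\", "\\\\")
    return re.sub(r"\$\{(\w+)\}", r"\\g<\1>", escaped)


MATCH = to_python_pattern(r"^(?<user>[^@]+)@(?<host>.+)$")
REPLACE = to_python_replacement("http://www.${host}?user=${user}")


def emit(value: str) -> str:
    """Emit-style output: the whole string becomes the expanded replace expression."""
    match = re.search(MATCH, value)
    # Assumption: values that do not match are returned unchanged.
    return match.expand(REPLACE) if match else value


print(emit("zheng0@adventure-works.com"))
# http://www.adventure-works.com?user=zheng0
```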
The component is provided as an MSI file. To complete the installation, however, you will have to add the transformation to the Visual Studio toolbox manually. Right-click the toolbox and select Choose Items.... Select the SSIS Data Flow Items tab, then check RegexClean Transformation in the list.
Downloads
The RegexClean Transformation is available for both SQL Server 2005 and SQL Server 2008. Please choose the version to match your SQL Server version, or you can install both versions and use them side by side if you have both SQL Server 2005 and SQL Server 2008 installed.
RegexClean Transformation for SQL Server 2005
RegexClean Transformation for SQL Server 2008
Version History
SQL Server 2005
Version 1.0.0.105 - Public Release
(28 Jan 2008)
© SQLIS or respective owner