How do I extract a postcode from one column in SSIS using a regular expression?

Posted by Aphillippe on Stack Overflow
Published on 2012-07-05T15:14:22Z

I'm trying to use a custom Regex Clean transformation (information found here) to extract a post code from a mixed address column (Address3) and move it to a new column (Post Code).

Example of incoming data:

Address3: "London W12 9LZ"

Incoming data could be any combination of place names with a post code at the start, middle or end (or not at all).

Desired outcome:

Address3: "London"
Post Code: "W12 9LZ"

Essentially, in plain English: "move (not copy) any post code found from Address3 into Post Code".

My regex skills aren't brilliant but I've managed to get as far as extracting the post code and getting it into its own column using the following regex, matching from Address3 and replacing into Post Code:

Match Expression:

(?<stringOUT>([A-PR-UWYZa-pr-uwyz]([0-9]{1,2}|([A-HK-Ya-hk-y][0-9]|[A-HK-Ya-hk-y][0-9]([0-9]|[ABEHMNPRV-Yabehmnprv-y]))|[0-9][A-HJKS-UWa-hjks-uw])\ {0,1}[0-9][ABD-HJLNP-UW-Zabd-hjlnp-uw-z]{2}|([Gg][Ii][Rr]\ 0[Aa][Aa])|([Ss][Aa][Nn]\ {0,1}[Tt][Aa]1)|([Bb][Ff][Pp][Oo]\ {0,1}([Cc]\/[Oo]\ )?[0-9]{1,4})|(([Aa][Ss][Cc][Nn]|[Bb][Bb][Nn][Dd]|[BFSbfs][Ii][Qq][Qq]|[Pp][Cc][Rr][Nn]|[Ss][Tt][Hh][Ll]|[Tt][Dd][Cc][Uu]|[Tt][Kk][Cc][Aa])\ {0,1}1[Zz][Zz])))

Replace Expression:

${stringOUT}

So this leaves me with:

Address3: "London W12 9LZ"
Post Code: "W12 9LZ"

My next thought is to keep the above match/replace, then add a second one that matches everything in Address3 that isn't matched by the above regex, so that only the place name is left there. I think it might need a negative lookahead, but I can't seem to make it work.

I'm using SSIS 2008 R2, and I think the Regex Clean transformation uses the .NET regex implementation.
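
For illustration only, the "move" described above can be sketched as one match followed by cutting the matched span out of the address, rather than a second pattern built on a negative lookahead. This is a minimal Python sketch, not the SSIS transformation itself. The function name move_postcode and the shortened pattern are hypothetical stand-ins (the shortened pattern only covers common shapes such as "W12 9LZ", not everything the full expression above handles). Note also that Python writes the named group as (?P<stringOUT>...), whereas .NET uses (?<stringOUT>...).

```python
import re

# Simplified, hypothetical stand-in for the full post code pattern in the
# question; it only covers common shapes such as "W12 9LZ" or "SW1A 1AA".
POSTCODE = re.compile(
    r"\b(?P<stringOUT>[A-Za-z]{1,2}[0-9][0-9A-Za-z]?\s?[0-9][A-Za-z]{2})\b"
)


def move_postcode(address3):
    """Return (address3_without_postcode, postcode) for one row."""
    match = POSTCODE.search(address3)
    if match is None:
        # No post code found: leave the address untouched.
        return address3, ""
    postcode = match.group("stringOUT")
    # Cut the matched span out, then tidy leftover whitespace and
    # any leading or trailing commas.
    remainder = address3[:match.start()] + " " + address3[match.end():]
    remainder = re.sub(r"\s+", " ", remainder).strip(" ,")
    return remainder, postcode


if __name__ == "__main__":
    for row in ["London W12 9LZ", "W12 9LZ London", "London"]:
        print(row, "->", move_postcode(row))
```

The same idea should carry over to a second replace step in the data flow, assuming the transformation allows an empty replace expression: match the post code in Address3 and replace it with nothing.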

Thanks.
