regex: trim all strings directly preceded by a digit except if string belongs to predefined set of st

Posted by Geert-Jan on Stack Overflow
Published on 2010-05-07T09:48:25Z Indexed on 2010/05/07 9:58 UTC

I've got addresses I need to clean up for matching purposes. Part of the process is trimming unwanted suffixes from house numbers, e.g.:

mainstreet 4a --> mainstreet 4. 

However, I don't want:

618 5th Ave SW  --> 618 5 Ave SW 

In other words, there are some strings (for now: st, nd, rd, th) which I don't want to strip when they directly follow a number. What would be the best method of doing this (regex or otherwise)?

A working regex without the exceptions would be:

a = a.replaceAll("(^| )([0-9]+)[a-z]+($| )","$1$2$3"); //replace 1a --> 1
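To make the problem concrete, here is a small runnable sketch that applies the regex above to both example addresses. The class name `TrimSuffixDemo` and the method name `trim` are made up for illustration only.

```java
public class TrimSuffixDemo {

    // Applies the regex from the question unchanged: any run of lowercase
    // letters directly following a standalone number is dropped.
    static String trim(String a) {
        return a.replaceAll("(^| )([0-9]+)[a-z]+($| )", "$1$2$3");
    }

    public static void main(String[] args) {
        // Wanted behaviour: the house-number suffix "a" is removed.
        System.out.println(trim("mainstreet 4a"));   // mainstreet 4

        // Unwanted behaviour: the ordinal "th" is removed as well.
        System.out.println(trim("618 5th Ave SW"));  // 618 5 Ave SW
    }
}
```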

I thought about first searching for the special cases and substituting them with special characters while keeping the references in a map, then applying the above regex, and finally reversing the substitution using the reference map, but I'm looking for a simpler solution.
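One possibly simpler direction, instead of the placeholder map, is a negative lookahead placed right after the digits, so that the match is rejected when the letters are exactly one of the protected strings. This is only a sketch under assumptions: the class name `HouseNumberCleaner`, the method `clean`, and the array `KEEP` are hypothetical, and the protected strings are assumed to be plain lowercase letters that need no regex escaping.

```java
import java.util.regex.Pattern;

public class HouseNumberCleaner {

    // Hypothetical exception list: suffixes that must survive after a number.
    private static final String[] KEEP = {"st", "nd", "rd", "th"};

    // Same shape as the regex in the question, plus a negative lookahead:
    // after the digits, refuse to match if what follows is exactly one of
    // the protected strings followed by a space or the end of the input.
    private static final Pattern SUFFIX = Pattern.compile(
            "(^| )([0-9]+)(?!(?:" + String.join("|", KEEP) + ")(?: |$))[a-z]+($| )");

    static String clean(String address) {
        return SUFFIX.matcher(address).replaceAll("$1$2$3");
    }

    public static void main(String[] args) {
        System.out.println(clean("mainstreet 4a"));   // mainstreet 4
        System.out.println(clean("618 5th Ave SW"));  // 618 5th Ave SW
    }
}
```

Because the lookahead requires a space or the end of the input after the protected string, a suffix that merely starts with one of them (for example `4the`) would still be stripped; whether that is desirable depends on the address data.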

Thanks

© Stack Overflow or respective owner
