regex: trim all strings directly preceded by digit except if string belongs to predefined set of st
Posted
by Geert-Jan
on Stack Overflow
Published on 2010-05-07T09:48:25Z
I've got addresses I need to clean up for matching purposes. Part of the process is trimming unwanted suffixes from house numbers, e.g.:
mainstreet 4a --> mainstreet 4.
However I don't want:
618 5th Ave SW --> 618 5 Ave SW
In other words, there are some strings (for now: st, nd, rd, th) which I don't want to strip. What would be the best method of doing this (regex or otherwise)?
A working regex without the exceptions would be:
a = a.replaceAll("(^| )([0-9]+)[a-z]+($| )","$1$2$3"); //replace 1a --> 1
I thought about first searching for the special cases and substituting them with special characters while keeping the references in a map, then applying the above regex, and then doing the reverse substitution using the reference map, but I'm looking for a simpler solution.
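One simpler direction might be a negative lookahead that refuses to match when the letters after the digits are exactly one of the protected suffixes. The following is only a sketch under assumptions: the class name HouseNumberCleaner and the method clean are made up for illustration, the protected set is hard-coded to st, nd, rd, th, and, like the regex above, it only handles lowercase suffixes separated by single spaces. It also uses a lookahead for the trailing space instead of capturing it, so the replacement is "$1$2" rather than "$1$2$3".

```java
import java.util.regex.Pattern;

public class HouseNumberCleaner {
    // Hypothetical helper, not from the original post.
    // (^| )                      start of string or a space (kept as $1)
    // ([0-9]+)                   the house number (kept as $2)
    // (?!(?:st|nd|rd|th)(?:$| )) do not match if the suffix is exactly st/nd/rd/th
    // [a-z]+                     the suffix to strip
    // (?=$| )                    must be followed by end of string or a space
    private static final Pattern SUFFIX = Pattern.compile(
        "(^| )([0-9]+)(?!(?:st|nd|rd|th)(?:$| ))[a-z]+(?=$| )");

    static String clean(String address) {
        return SUFFIX.matcher(address).replaceAll("$1$2");
    }

    public static void main(String[] args) {
        System.out.println(clean("mainstreet 4a"));   // suffix "a" is stripped
        System.out.println(clean("618 5th Ave SW"));  // "5th" is left alone
    }
}
```

Because the protected suffixes sit in a single alternation, extending the set should only mean adding entries to (?:st|nd|rd|th). Note that this keeps any digit followed by a protected suffix, so something like "4st" would also be left untouched.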
Thanks
© Stack Overflow or respective owner