regex: trim all strings directly preceded by digit except if string belongs to predefined set of st
- by Geert-Jan
I've got addresses I need to clean up for matching purposes.
Part of the process is trimming unwanted suffixes from house numbers, e.g.:
mainstreet 4a --> mainstreet 4.
However I don't want:
618 5th Ave SW --> 618 5 Ave SW
In other words, there are some strings (for now: st, nd, rd, th) that I don't want to strip.
What would be the best way of doing this (regex or otherwise)?
A working regex without the exceptions would be:
a = a.replaceAll("(^| )([0-9]+)[a-z]+($| )","$1$2$3"); //replace 1a --> 1
I thought about first searching for the special cases and substituting them with special characters while keeping the references in a map, then applying the above regex, and then reversing the substitution using the reference map, but I'm looking for a simpler solution.
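One direction that might avoid the map round-trip is a negative lookahead placed right after the digits, so the match is rejected when the letters are exactly one of the protected suffixes. The following is only a sketch of that idea: the class and method names (HouseNumberCleaner, clean) are made up for illustration, and it keeps the same assumptions as the regex above (lowercase letters only, a space or string boundary on either side).

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of any existing library.
final class HouseNumberCleaner {

    // Same shape as the original regex, plus a negative lookahead:
    // (?!(?:st|nd|rd|th)(?:$| )) rejects the match when the letters after
    // the digits are exactly st, nd, rd or th followed by a space or the end.
    private static final Pattern SUFFIX = Pattern.compile(
            "(^| )([0-9]+)(?!(?:st|nd|rd|th)(?:$| ))[a-z]+($| )");

    static String clean(String address) {
        // Keep the leading boundary, the digits and the trailing boundary;
        // drop only the letters in between.
        return SUFFIX.matcher(address).replaceAll("$1$2$3");
    }

    public static void main(String[] args) {
        System.out.println(clean("mainstreet 4a"));   // mainstreet 4
        System.out.println(clean("618 5th Ave SW"));  // 618 5th Ave SW
    }
}
```

The boundary inside the lookahead is what limits the exception to the whole suffix: a token such as 4the still gets trimmed to 4, while 5th is left alone. Adding more protected strings would mean extending the alternation in the lookahead.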
Thanks