Need to clean malformed tags using a regular expression
- by Brian
Looking to find the appropriate regular expression for the following conditions:
I need to clean certain tags within free-flowing text. For example, within the text I have two important tags: <2004:04:12> and <John Doe>. Unfortunately, some of the tags are missing the "<" or ">" delimiter.
For example, some are as follows:
1) <2004:04:12 , I need this to be <2004:04:12>
2) 2004:04:12>, I need this to be <2004:04:12>
3) <John Doe , I need this to be <John Doe>
I attempted to use the following for situation 1:
String regex = "<\\d{4}-\\d{2}-\\d{2}\\w*{2}[^>]";
String output = content.replaceAll(regex,"$0>");
This did find all instances of "<2004:04:12", and the result was "<2004:04:12 >".
However, I need to eliminate the space prior to the ending tag.
I'm not sure this is the best way. Any suggestions?
Thanks
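
One possible approach, offered as a rough sketch rather than a definitive answer: capture only the tag body in a group and rebuild the delimiters around it, so any whitespace sitting next to a delimiter is dropped instead of being copied back by "$0". The class name TagCleaner and the method clean are made up for illustration. The name pattern is an assumption too: it treats a name tag as "<" followed by two capitalized words, which may not hold for the real data. Note also that the date pattern will wrap a bare 2004:04:12 that has no delimiter at all.

```java
import java.util.regex.Pattern;

// Hypothetical helper class; the names are illustrative and not from the original post.
class TagCleaner {
    // Date body with an optional "<" before it and an optional ">" after it.
    // Whitespace is consumed only when it sits between the body and a delimiter.
    private static final Pattern DATE_TAG =
            Pattern.compile("(?:<\\s*)?(\\d{4}:\\d{2}:\\d{2})(?:\\s*>)?");

    // Assumption: a name tag is "<" followed by exactly two capitalized words.
    // Only the missing ">" case (situation 3) is handled here.
    private static final Pattern NAME_TAG =
            Pattern.compile("<\\s*([A-Z][a-z]+ [A-Z][a-z]+)(?:\\s*>)?");

    static String clean(String content) {
        // Rebuild each tag from the captured body so both delimiters are always present.
        String result = DATE_TAG.matcher(content).replaceAll("<$1>");
        return NAME_TAG.matcher(result).replaceAll("<$1>");
    }

    public static void main(String[] args) {
        System.out.println(clean("<2004:04:12 > and 2004:04:12> by <John Doe ."));
    }
}
```

Tags that are already well formed pass through unchanged, because the replacement rebuilds the same text. A name with a missing "<" is deliberately not handled: without the opening delimiter, any two capitalized words followed by ">" would match, so that case probably needs a stricter rule based on the actual data.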