Need to clean malformed tags using a regular expression
Posted by Brian on Stack Overflow
Published on 2010-06-13T22:31:20Z
Indexed on 2010/06/13 22:42 UTC
Looking to find the appropriate regular expression for the following conditions:
I need to clean certain tags within free-flowing text. For example, within the text I have two important tags: <2004:04:12> and <John Doe>. Unfortunately, some of the tags have a missing "<" or ">" delimiter.
For example, some are as follows:
1) <2004:04:12 , I need this to be <2004:04:12>
2) 2004:04:12>, I need this to be <2004:04:12>
3) <John Doe , I need this to be <John Doe>
I attempted to use the following for situation 1:
String regex = "<\\d{4}-\\d{2}-\\d{2}\\w*{2}[^>]";
String output = content.replaceAll(regex,"$0>");
This did find all instances of "<2004:04:12" and the result was "<2004:04:12 >". However, I need to eliminate the space prior to the ending tag.
I'm not sure this is the best way. Any suggestions?
Thanks
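One possible direction, offered as a sketch rather than a definitive answer: the extra space most likely appears because the trailing [^>] consumes the character after the date (the space), and "$0>" then appends ">" after that space. Capturing only the date in a group and rebuilding the tag around it avoids this. The class name TagCleaner, the method clean, and both patterns below are hypothetical. The sketch assumes dates use colons as in the samples (the posted pattern uses hyphens), that every date-shaped token should become a tag, and that names consist of capitalized words separated by single spaces. A name tag followed directly by another capitalized word would be over-matched, so that case would need a stricter rule.

```java
import java.util.regex.Pattern;

public class TagCleaner {
    // Date tag with either delimiter optional. Whitespace after the date is
    // consumed only when a '>' follows it, so ordinary spacing is preserved.
    private static final Pattern DATE_TAG =
        Pattern.compile("<?(\\d{4}:\\d{2}:\\d{2})(?:\\s*>)?");

    // Name tag that has its '<' but may lack '>'. Assumes capitalized words
    // separated by single spaces (an assumption, not stated in the question).
    private static final Pattern NAME_TAG =
        Pattern.compile("<([A-Z][a-z]+(?: [A-Z][a-z]+)*)(?:\\s*>)?");

    public static String clean(String content) {
        // Rebuild each tag from the captured group so both delimiters are present.
        String result = DATE_TAG.matcher(content).replaceAll("<$1>");
        return NAME_TAG.matcher(result).replaceAll("<$1>");
    }

    public static void main(String[] args) {
        System.out.println(clean("<2004:04:12 , 2004:04:12>, <John Doe , <2004:04:12 >"));
    }
}
```

Tags that are already well formed pass through unchanged, so the method can be run over the whole text without first separating good tags from bad ones.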
© Stack Overflow or respective owner