Need to clean malformed tags using a regular expression

Posted by Brian on Stack Overflow
Published on 2010-06-13T22:31:20Z Indexed on 2010/06/13 22:42 UTC

I'm looking for a regular expression that handles the following conditions:

I need to clean certain tags within free-flowing text. For example, within the text I have two important tags: <2004:04:12> and <John Doe>. Unfortunately, some of the tags are missing the "<" or ">" delimiter.

For example, some are as follows:

1) <2004:04:12 , I need this to be <2004:04:12>
2) 2004:04:12>, I need this to be <2004:04:12>
3) <John Doe , I need this to be <John Doe>

I attempted to use the following for situation 1:

String regex = "<\\d{4}:\\d{2}:\\d{2}[^>]"; // [^>] also consumes the character after the date
String output = content.replaceAll(regex, "$0>");

This did find all instances of "<2004:04:12" and the result was "<2004:04:12 >". However, I need to eliminate the space prior to the ending tag.
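
One possible way around the stray space, as a rough sketch only: capture just "<" plus the date in a group and insert ">" directly after that group, so the space ends up outside the tag instead of inside it. This assumes every date has the yyyy:MM:dd shape shown above. The class and method names are made up for illustration.

```java
class CloseDateTag {
    // Hypothetical helper, not part of the original attempt.
    // Group 1 captures "<" plus the date only. The negative lookahead
    // (?!\s*>) skips tags that are already closed, with or without
    // whitespace before the ">".
    static String closeDateTags(String content) {
        return content.replaceAll("(<\\d{4}:\\d{2}:\\d{2})(?!\\s*>)", "$1>");
    }

    public static void main(String[] args) {
        System.out.println(closeDateTags("on <2004:04:12 we met"));
        // prints: on <2004:04:12> we met
    }
}
```

Because the replacement uses $1 rather than $0, nothing after the date is copied into the tag. The whitespace that followed the date is left where it was, now after the closing ">".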

I'm not sure this is the best way. Any suggestions?
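
For reference, here is a minimal sketch of how all three situations might be handled in one pass, one pattern per situation. It rests on assumptions that may not hold for the real data: dates always look like 2004:04:12, and a name is one or more capitalized words separated by single spaces (so a capitalized word directly after an unclosed name would be pulled into the tag). TagCleaner and clean are hypothetical names.

```java
import java.util.regex.Pattern;

class TagCleaner {
    private static final String DATE = "\\d{4}:\\d{2}:\\d{2}";

    // Situation 1: "<date" that is not followed by optional whitespace and ">".
    private static final Pattern DATE_NO_CLOSE =
            Pattern.compile("(<" + DATE + ")(?!\\s*>)");

    // Situation 2: "date>" that is not preceded by "<" (or by another digit).
    private static final Pattern DATE_NO_OPEN =
            Pattern.compile("(?<![<\\d])(" + DATE + ">)");

    // Situation 3: "<Name Name" without a closing ">". Assumed name shape only.
    // The possessive quantifiers (++ and *+) stop the engine from backtracking
    // into a shorter name when the tag is in fact already closed.
    private static final Pattern NAME_NO_CLOSE =
            Pattern.compile("(<[A-Z][a-z]++(?: [A-Z][a-z]++)*+)(?!\\s*>)");

    static String clean(String content) {
        String out = DATE_NO_CLOSE.matcher(content).replaceAll("$1>");
        out = DATE_NO_OPEN.matcher(out).replaceAll("<$1");
        out = NAME_NO_CLOSE.matcher(out).replaceAll("$1>");
        return out;
    }

    public static void main(String[] args) {
        System.out.println(clean("2004:04:12>, posted by <John Doe , edited"));
        // prints: <2004:04:12>, posted by <John Doe> , edited
    }
}
```

Tags that are already well formed, such as <2004:04:12> or <John Doe>, pass through unchanged, so running clean twice gives the same result as running it once.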

Thanks
