How can I remove certain characters from inside angle brackets, leaving the characters outside alone?
Posted by Iain Fraser on Stack Overflow
Published on 2010-05-12T08:01:46Z
Edit: To be clear, please understand that I am not using regex to parse the HTML, that's crazy talk! I simply want to clean up a messy string of HTML so that it will parse.
Edit #2: I should also point out that the control character I'm using is a special Unicode character. It's not something that would ever be used in a proper tag under any normal circumstances.
Suppose I have a string of HTML that contains a bunch of control characters, and I want to remove the control characters from inside tags only, leaving the characters outside the tags alone.
For example
Here the control character is the numeral "1".
Input
The quick 1<strong>orange</strong> lemming <sp11a1n 1class1='jumpe111r'11>jumps over</span> 1the idle 1frog
Desired Output
The quick 1<strong>orange</strong> lemming <span class='jumper'>jumps over</span> 1the idle 1frog
So far I can match tags which contain the control character, but I can't remove the control characters from them in one regex. I guess I could run another regex on my matches, but I'd really like to know if there's a better way.
My regex
Bear in mind this one only matches tags which contain the control character (it appears as a backtick in the regex below).
<(([^>])*?`([^>])*?)*?>
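For what it's worth, the "another regex on my matches" idea can be folded into a single call by using a replacement callback: match each tag, then strip the control character from just that match. A minimal sketch in Python, assuming a few things the question doesn't state: the language itself, the hypothetical helper name strip_inside_tags, and the numeral "1" from the example standing in for the real control character.

```python
import re

CONTROL = "1"  # stand-in control character taken from the example (assumption)

# Match one tag at a time: "<", any run of characters that are not ">", then ">".
TAG = re.compile(r"<[^>]*>")

def strip_inside_tags(html, control=CONTROL):
    """Remove `control` from inside <...> only; text outside tags is left alone."""
    # The callback receives each matched tag and returns it with the
    # control character deleted; everything between tags is never touched.
    return TAG.sub(lambda m: m.group(0).replace(control, ""), html)

messy = ("The quick 1<strong>orange</strong> lemming "
         "<sp11a1n 1class1='jumpe111r'11>jumps over</span> 1the idle 1frog")
print(strip_inside_tags(messy))
# The quick 1<strong>orange</strong> lemming <span class='jumper'>jumps over</span> 1the idle 1frog
```

This treats anything between a "<" and the next ">" as a tag, so a stray "<" in the text or a ">" inside an attribute value would throw it off. It is a cleanup pass rather than a parser.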
Thanks very much for your time and consideration.
Iain Fraser
© Stack Overflow or respective owner