Using regex, how can I remove certain characters from inside the tags in a string of HTML?
Posted
by Iain Fraser
on Stack Overflow
Published on 2010-05-12T08:01:46Z
Suppose I have a string of HTML that contains a bunch of control characters, and I want to remove the control characters from inside the tags only, leaving any that appear outside the tags alone.
For example
Here the control character is the numeral "1".
Input
The quick 1<strong>orange</strong> lemming <sp11a1n 1class1='jumpe111r'11>jumps over</span> 1the idle 1frog
Desired Output
The quick 1<strong>orange</strong> lemming <span class='jumper'>jumps over</span> 1the idle 1frog
So far I can match the tags which contain the control character, but I can't remove the control characters from them in a single regex. I guess I could run a second regex over each match, but I'd really like to know if there's a better way.
My regex
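Bear in mind that this one only matches tags which contain the control character; it does not remove anything. Note that the control character is written as a backtick in this pattern, not the "1" used in the example above.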
<(([^>])*?`([^>])*?)*?>
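For illustration, here is a minimal sketch of two ways this could be approached, written in Python only because the question does not name a language. The function names and the `CONTROL` constant are hypothetical, and "1" is used as the stand-in control character from the example. The first function is the "second pass over each match" idea done inside one substitution call through a replacement callback; the second is a single pattern using a lookahead. Both assume that a literal ">" never appears inside an attribute value, and the lookahead version additionally assumes that no stray ">" appears in the text outside tags.

```python
import re

CONTROL = "1"  # stand-in control character from the example (assumption)

# A whole tag: "<", then any run of characters that are not ">", then ">".
TAG = re.compile(r"<[^>]*>")


def strip_control_in_tags(html: str, control: str = CONTROL) -> str:
    """Remove `control` inside tags only, via a replacement callback."""
    # The callback receives each matched tag and returns it with the
    # control character deleted; text between tags is never matched.
    return TAG.sub(lambda m: m.group(0).replace(control, ""), html)


def strip_control_lookahead(html: str, control: str = CONTROL) -> str:
    """Remove `control` inside tags only, using one lookahead pattern."""
    # A control character counts as "inside a tag" when the next angle
    # bracket after it is ">" rather than "<".
    pattern = re.escape(control) + r"(?=[^<>]*>)"
    return re.sub(pattern, "", html)


text = (
    "The quick 1<strong>orange</strong> lemming "
    "<sp11a1n 1class1='jumpe111r'11>jumps over</span> 1the idle 1frog"
)

print(strip_control_in_tags(text))
print(strip_control_lookahead(text))
```

On the example input, both calls should give the desired output shown above. The callback version is probably the safer of the two, since it only ever touches text that was matched as a complete tag. Neither is a substitute for a real HTML parser if the markup is not as regular as the example.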
Thanks very much for your time and consideration.
Iain Fraser
© Stack Overflow or respective owner