What's wrong with my regex?

Posted by Tom Brown on Stack Overflow
Published on 2010-03-17T09:48:05Z

Yes, I know it's usually a bad idea to parse HTML with a regex, but that aside, can someone explain the fault here:

string outputString = Regex.Replace(inputString, @"<?(?i:script|embed|object|frameset|frame|iframe|metalink|style|html|img|layer|ilayer|meta|applet)(.|\n)*?>", "");
if (outputString != inputString)
{
   Console.WriteLine("unwanted tags detected");
}

It certainly detects the intended tags, such as <script> and <html>, but it also rejects strings I want to allow, such as <B>Description</B> and <A href="http://www.mylink.com/index.html">A Link containing 'HTML'</A>.
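
Here is a minimal sketch of what appears to be happening. In the pattern, `<?` means "an optional `<`", so the tag names can match anywhere in the text, not only right after an angle bracket. Because the group is case-insensitive, "script" is found inside "Description" and "html" inside "index.html". The class name `TagFilterDemo`, the helper `FirstMatch`, and `StricterPattern` are made up for this illustration and are not part of the original post. The stricter pattern is only one possible tightening, not a complete HTML filter.

```csharp
using System;
using System.Text.RegularExpressions;

public static class TagFilterDemo
{
    // The pattern from the question, unchanged. Note the leading "<?":
    // the "<" is optional, so a tag name may match in the middle of ordinary text.
    public const string OriginalPattern =
        @"<?(?i:script|embed|object|frameset|frame|iframe|metalink|style|html|img|layer|ilayer|meta|applet)(.|\n)*?>";

    // Hypothetical stricter variant (an assumption, not from the post):
    // "<" is required, "/?" allows closing tags, and \b stops a name
    // from matching as the prefix of a longer word.
    public const string StricterPattern =
        @"</?(?i:script|embed|object|frameset|frame|iframe|metalink|style|html|img|layer|ilayer|meta|applet)\b[^>]*>";

    // Returns the text of the first match, or null when nothing matches.
    public static string FirstMatch(string pattern, string input)
    {
        Match m = Regex.Match(input, pattern);
        return m.Success ? m.Value : null;
    }

    public static void Main()
    {
        string bold = "<B>Description</B>";
        string link = "<A href=\"http://www.mylink.com/index.html\">A Link containing 'HTML'</A>";

        // Original pattern: matches inside allowed markup.
        Console.WriteLine(FirstMatch(OriginalPattern, bold)); // scription</B>
        Console.WriteLine(FirstMatch(OriginalPattern, link)); // html">

        // Stricter pattern: the allowed strings no longer match...
        Console.WriteLine(FirstMatch(StricterPattern, bold) == null); // True
        Console.WriteLine(FirstMatch(StricterPattern, link) == null); // True

        // ...while an unwanted tag is still caught.
        Console.WriteLine(FirstMatch(StricterPattern, "<script>alert(1)</script>")); // <script>
    }
}
```

Even the stricter variant is easy to fool: a `>` inside a quoted attribute value ends the match early, and whitespace after `<` is not handled. If the input is untrusted, an HTML parser with an allow-list of tags is likely the safer route.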
