RegEx strip html tags problem
Posted
by Aleksandar Mirilovic
on Stack Overflow
See other posts from Stack Overflow
or by Aleksandar Mirilovic
Published on 2010-04-28T11:37:20Z
Indexed on
2010/04/28
11:43 UTC
Read the original article
Hit count: 476
Hi,
I've tried to strip html tags using regex replace with pattern "<[^>]*>" from word generated html that looks like this:
<html xmlns:v="urn:schemas-microsoft-com:vml" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:w="urn:schemas-microsoft-com:office:word" xmlns:st1="urn:schemas-microsoft-com:office:smarttags" xmlns="http://www.w3.org/TR/REC-html40">
<head> <meta http-equiv=Content-Type content="text/html; charset=iso-8859-2"> <meta name=Generator content="Microsoft Word 11 (filtered medium)"> <!--[if !mso]> <style>
v\:* {behavior:url(#default#VML);}
o\:* {behavior:url(#default#VML);}
w\:* {behavior:url(#default#VML);}
.shape {behavior:url(#default#VML);}
</style> <![endif]--><o:SmartTagType namespaceuri="urn:schemas-microsoft-com:office:smarttags" name="place" downloadurl="http://www.5iantlavalamp.com/"/> <!--[if !mso]> <style>
st1\:*{behavior:url(#default#ieooui) }
</style> <![endif]--> <style> <!-- /* Font Definitions / @font-face {font-family:Tahoma; panose-1:2 11 6 4 3 5 4 4 2 4;} / Style Definitions */ p.MsoNormal, li.MsoNormal, div.MsoNormal {margin:0in; margin-bottom:.0001pt; font-size:12.0pt; font-family:"Times New Roman";} a:link, span.MsoHyperlink {color:blue; text-decoration:underline;} a:visited, span.MsoHyperlinkFollowed {color:purple; text-decoration:underline;} span.EmailStyle17 {mso-style-type:personal; font-family:Arial; color:windowtext;} span.EmailStyle18 {mso-style-type:personal-reply; font-family:Arial; color:navy;} @page Section1 {size:8.5in 11.0in; margin:1.0in 1.25in 1.0in 1.25in;} div.Section1 {page:Section1;} --> </style>
</head>
Everything works fine except for the bolded lines above, anybody got ideas how to match the them to?
Thanks,
Aleksandar
© Stack Overflow or respective owner