How can I strip non-XHTML tags from a string in C#?

Posted by No Average Geek on Stack Overflow
Published on 2010-06-06T13:58:51Z

I need to remove non-XHTML tags from a string of XHTML that has been stored in a database. The string also contains references to controls (e.g. ) embedded in the XHTML, and I need to end up with clean XHTML in which all standard tags and their contents are unchanged.

These control tags vary (they could be any ASP.NET control), so there are too many to search for and remove one by one. The way they are closed also varies: some have separate closing tags and some are self-closing.

How can I go about doing this? I've found some HTML cleaners online that I could include in my project, but they either remove everything or just HTML-encode the entire string.

Also, I'm dealing with fragments of XHTML documents, not entire documents. I don't know whether that makes a difference.
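
To make the goal concrete, here is a minimal sketch of one possible approach. It assumes that every control tag carries a namespace prefix (such as asp:) and that no standard XHTML tag in the stored fragments does. The class name ControlTagStripper and the method name Strip are hypothetical, chosen only for illustration. The sketch removes the control tags themselves (opening, closing, and self-closing) and leaves any text or standard markup between them in place. It works on fragments because it never parses the string as a whole document.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of any library.
public static class ControlTagStripper
{
    // Matches an opening, closing, or self-closing tag whose name has a
    // namespace prefix, e.g. <asp:Label ... />, <asp:Panel ...>, </asp:Panel>.
    // Quoted attribute values are consumed as a unit, so a '>' inside a
    // quoted value does not end the match early.
    // Assumption: standard XHTML tags in the input never use a prefix.
    private static readonly Regex PrefixedTag = new Regex(
        @"</?[A-Za-z_][\w.-]*:[A-Za-z_][\w.-]*(?:""[^""]*""|'[^']*'|[^'"">])*>",
        RegexOptions.Compiled);

    // Returns the fragment with every prefixed tag removed. Unprefixed tags,
    // and any content that sat between prefixed tags, are left untouched.
    public static string Strip(string fragment)
    {
        if (fragment == null) throw new ArgumentNullException("fragment");
        return PrefixedTag.Replace(fragment, string.Empty);
    }
}
```

Calling ControlTagStripper.Strip on a fragment that mixes paragraphs with asp:-prefixed tags would return the same fragment with only the prefixed tags gone. A regular expression like this is not a real parser: it would also strip legitimately prefixed elements (for example embedded SVG or MathML written with a prefix), and it does not handle server-side expression blocks that sit outside of a tag.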

Any help would be appreciated.

© Stack Overflow or respective owner