How can I strip non-XHTML tags from a string in C#?
- by No Average Geek
I need to be able to remove non-XHTML tags from a string containing XHTML that has been stored in a database. The string also contains references for controls (e.g. ) inside the XHTML, but I need clean XHTML with all standard tag contents unchanged.
These control tags are varied (they could be any ASP.NET control), so there are too many to go looking for each one and remove them. The way they are closed is also varied, so not all of them have closing tags, some are self closing.
How can I go about doing this? I've found some HTML cleaners on-line for including in my project, but they either remove everything or just HTML encode the entire string.
Also, I'm dealing with parts of XHTML documents, not entire documents - don't know if that makes a difference.
Any help would be appreciated.