Replace HTML tags within XML content with WordML formatting tags
- by Josh
I am taking an XML document and creating a Word document from it using XSLT and OpenXML. The problem is that when I create the Word document, none of the HTML inside the CDATA sections is interpreted; the tags show up in the document as literal text (see the Word output further down).
I have tried setting "cdata-section-elements" in my xsl:output; however, I receive an error stating that the p tag doesn't match the w:t tag (the p tag is part of the CDATA HTML).
Here is what one of my XSL templates looks like:
<xsl:template match="SECTION">
  <w:p w:rsidR="00272D24" w:rsidRPr="00272D24" w:rsidRDefault="00272D24">
    <w:pPr>
      <w:rPr>
        <w:rFonts w:ascii="Arial" w:hAnsi="Arial" w:cs="Arial"/>
      </w:rPr>
    </w:pPr>
    <w:r w:rsidRPr="00272D24">
      <w:rPr>
        <w:rFonts w:ascii="Arial" w:hAnsi="Arial" w:cs="Arial"/>
      </w:rPr>
      <w:t>
        <xsl:value-of select="INFORMATION"/>
      </w:t>
    </w:r>
  </w:p>
</xsl:template>
Here is what the XML looks like:
<INFORMATION>
<![CDATA[
<P> line 1 of information
<P> line 2 of information.......]]>
</INFORMATION>
Here is what the Word output looks like (white space and poor formatting):
DIAGNOSIS:
<P> line 1 of information. <P> line 2 of information
I need to be able to somehow render the HTML or strip out the HTML. If I strip out the HTML then I would have to search for every possible HTML element, which is madness! Any help at all would be appreciated...
Thanks.
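On the stripping option: because the HTML sits inside CDATA, the XSLT processor sees it as plain characters, not elements, so it may not be necessary to search for each HTML element by name. A minimal XSLT 1.0 sketch is below. It assumes that every "<" in the text starts a tag and that no ">" occurs inside an attribute value. The template name strip-tags and the variable plain are hypothetical names chosen for illustration, and the w: namespace URI is assumed to be the standard WordprocessingML one, since the original stylesheet header is not shown.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">

  <!-- Hypothetical helper: copies $text but drops every "<...>" run.
       Assumes "<" always opens a tag and the next ">" closes it. -->
  <xsl:template name="strip-tags">
    <xsl:param name="text"/>
    <xsl:choose>
      <!-- A complete tag remains: there is a "<" with a ">" somewhere after it -->
      <xsl:when test="contains($text, '&lt;') and
                      contains(substring-after($text, '&lt;'), '&gt;')">
        <!-- Keep the text before the tag -->
        <xsl:value-of select="substring-before($text, '&lt;')"/>
        <!-- Recurse on whatever follows the tag's closing ">" -->
        <xsl:call-template name="strip-tags">
          <xsl:with-param name="text"
              select="substring-after(substring-after($text, '&lt;'), '&gt;')"/>
        </xsl:call-template>
      </xsl:when>
      <!-- No complete tag left: emit the remainder unchanged -->
      <xsl:otherwise>
        <xsl:value-of select="$text"/>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>

  <xsl:template match="SECTION">
    <!-- Collect the stripped text first so its white space can be collapsed -->
    <xsl:variable name="plain">
      <xsl:call-template name="strip-tags">
        <xsl:with-param name="text" select="string(INFORMATION)"/>
      </xsl:call-template>
    </xsl:variable>
    <w:p>
      <w:r>
        <w:t><xsl:value-of select="normalize-space($plain)"/></w:t>
      </w:r>
    </w:p>
  </xsl:template>

</xsl:stylesheet>
```

The run and paragraph properties from the template in the question are left out here to keep the sketch short; they can be put back around the w:t unchanged. This only removes markup. Turning each <P> into its own w:p would need a similar recursive split on the "<P>" string instead of a blanket strip.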