Tricky issue with using XSLT with badly formed HTML...

Posted by Ryba on Stack Overflow
Published on 2010-05-21T22:30:49Z Indexed on 2010/05/21 22:50 UTC

Hi there, I am fairly new to XSLT (2.0) and am having trouble with a tricky issue. Essentially, I have a badly formatted HTML file like the one below:

<html>
<body>

<p> text 1 </p>
<div> <p> text 2</p> </div>
<p> Here is a list
    <ul>
        <ol> 
            <li> ListItem1 </li>
            <li> ListItem1 </li>
        </ol>
        <dl>
            <li> dl item </li>
            <li> dl item2 </li>
        </dl>
    </ul> 
    <div>
        <p> I was here</p>
    </div>
</p>

I am trying to turn it into a nicely formatted XML file. In my XSLT file I recursively check whether all children of a p or div are other p's or div's; if they are, I promote them, otherwise I keep the element as a stand-alone paragraph. I extended this idea so that a p or div with a list child still shows the list properly, but the list's children are not promoted.
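To make the rule concrete, here is a sketch that models it outside XSLT, in Python with the standard-library `xml.etree.ElementTree` (not XSLT; the names `process`, `add_text`, `CONTAINERS` and `LISTS` are hypothetical). It runs on the sample input, compacted and with `</body></html>` added so it parses. A p/div whose element children are all blocks is dropped and its children are promoted; anything else is copied. The container's own text is passed through bare. That last step is an assumption about the stylesheet, chosen because XSLT's built-in template rule for text nodes copies text without wrapping it, and under that assumption the model reproduces the output shown below.

```python
import copy
import xml.etree.ElementTree as ET

# Hypothetical element classes: p/div are candidates for promotion,
# lists are copied through unchanged.
CONTAINERS = {"p", "div"}
LISTS = {"ul", "ol", "dl"}
BLOCKS = CONTAINERS | LISTS

# The sample input, compacted, with </body></html> added (assumption) so it parses.
SOURCE = """<html><body>
<p> text 1 </p>
<div> <p> text 2</p> </div>
<p> Here is a list
    <ul><ol><li> ListItem1 </li><li> ListItem1 </li></ol>
        <dl><li> dl item </li><li> dl item2 </li></dl></ul>
    <div><p> I was here</p></div>
</p>
</body></html>"""


def add_text(out, text):
    """Append bare text to out; ElementTree stores it in .text or a child's .tail."""
    if len(out):
        out[-1].tail = (out[-1].tail or "") + text
    else:
        out.text = (out.text or "") + text


def detached(elem):
    """Deep-copy elem without its tail text (the tail belongs to the parent)."""
    clone = copy.deepcopy(elem)
    clone.tail = None
    return clone


def process(elem, out):
    """Model of the rule as described: a p/div whose element children are all
    blocks is dropped and its children promoted; anything else is copied."""
    if elem.text:
        add_text(out, elem.text)  # bare text: nothing wraps it in a <p>
    for child in elem:
        if child.tag in CONTAINERS and len(child) and all(g.tag in BLOCKS for g in child):
            process(child, out)  # promote: drop the wrapper, recurse into it
        else:
            out.append(detached(child))  # keep as a stand-alone paragraph or list
        if child.tail:
            add_text(out, child.tail)


result = ET.Element("body")
process(ET.fromstring(SOURCE).find("body"), result)
print(ET.tostring(result, encoding="unicode"))
```

In this model, " Here is a list" ends up as tail text after the second `<p>`, outside any element. That matches the unwrapped text in the output.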

The problem I am having is that the output XML I get is the following:

<?xml version="1.0" encoding="utf-8"?><html>
<body>

<p> text 1 </p>
 <p> text 2</p> 
 Here is a list
<ul>
    <ol> 
        <li> ListItem1 </li>
        <li> ListItem1 </li>
    </ol>
    <dl>
        <li> dl item </li>
        <li> dl item2 </li>
    </dl>
</ul> 

<p> I was here</p>

"Here is a list" needs to be in paragraph tags too! I am going crazy trying to solve this. Any input or links would be greatly appreciated.
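One way to get the missing paragraph is to split each container's child nodes into runs. This is a sketch under assumptions, not a tested stylesheet. Adjacent text and inline nodes form a run that is wrapped in a new `<p>`. p/div children are promoted and lists are copied. In XSLT 2.0 this kind of grouping can be written with `xsl:for-each-group` and `group-adjacent`. The Python version below models the same logic with the standard-library `xml.etree.ElementTree` so it can be run directly. The helper names and the `CONTAINERS`/`LISTS` sets are hypothetical, and the input is the sample with `</body></html>` added.

```python
import copy
import xml.etree.ElementTree as ET

# Hypothetical element classes: p/div are flattened (promoted),
# lists are copied through unchanged, everything else is inline.
CONTAINERS = {"p", "div"}
LISTS = {"ul", "ol", "dl"}

# The sample input, compacted, with </body></html> added (assumption) so it parses.
SOURCE = """<html><body>
<p> text 1 </p>
<div> <p> text 2</p> </div>
<p> Here is a list
    <ul><ol><li> ListItem1 </li><li> ListItem1 </li></ol>
        <dl><li> dl item </li><li> dl item2 </li></dl></ul>
    <div><p> I was here</p></div>
</p>
</body></html>"""


def mixed_content(elem):
    """Yield elem's content in document order: str for text, Element for children."""
    if elem.text:
        yield elem.text
    for child in elem:
        yield child
        if child.tail:
            yield child.tail


def flush(run, out):
    """Wrap a pending run of inline nodes in a new <p>, unless it is only whitespace."""
    if any(isinstance(n, ET.Element) or n.strip() for n in run):
        para = ET.SubElement(out, "p")
        last = None
        for node in run:
            if isinstance(node, str):
                if last is None:
                    para.text = (para.text or "") + node
                else:
                    last.tail = (last.tail or "") + node
            else:
                last = copy.deepcopy(node)
                last.tail = None  # tail text arrives separately in the run
                para.append(last)
    run.clear()


def restructure(elem, out):
    """Promote nested p/div, copy lists, and wrap loose text runs in <p>."""
    run = []
    for node in mixed_content(elem):
        if isinstance(node, ET.Element) and node.tag in CONTAINERS:
            flush(run, out)
            restructure(node, out)  # promote the container's content
        elif isinstance(node, ET.Element) and node.tag in LISTS:
            flush(run, out)
            clone = copy.deepcopy(node)
            clone.tail = None
            out.append(clone)
        else:
            run.append(node)
    flush(run, out)


new_body = ET.Element("body")
restructure(ET.fromstring(SOURCE).find("body"), new_body)
print(ET.tostring(new_body, encoding="unicode"))
```

Whitespace-only runs are discarded, so the indentation between blocks does not produce empty `<p>` elements. " Here is a list" becomes its own paragraph just before the `<ul>`.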

© Stack Overflow or respective owner