XSLT: moving groups of HTML elements into section levels

Posted by Jeff on Stack Overflow
Published on 2010-12-28T15:51:53Z Indexed on 2011/01/12 23:53 UTC

Hello there, I'm trying to write an XSLT stylesheet that organizes an HTML file into nested section levels based on the heading level. Here is my input:

<html>
 <head>
  <title></title>
 </head>
 <body>
  <h1>Header 1 CONTENT</h1>
  <p>Level 1 para</p>
  <p>Level 1 para</p>
  <p>Level 1 para</p>
  <p>Level 1 para</p>

  <h2>Header 2 CONTENT</h2>
  <p>Level 2 para</p>
  <p>Level 2 para</p>
  <p>Level 2 para</p>
  <p>Level 2 para</p>
 </body>
</html>

I'm working with a fairly simple structure at the moment, so this pattern will stay constant for the time being. I need output like this:

<document> 
  <section level="1">
     <header1>Header 1 CONTENT</header1>
     <p>Level 1 para</p>
     <p>Level 1 para</p>
     <p>Level 1 para</p>
     <p>Level 1 para</p>
     <section level="2">
        <header2>Header 2 CONTENT</header2>
        <p>Level 2 para</p>
        <p>Level 2 para</p>
        <p>Level 2 para</p>
        <p>Level 2 para</p>
     </section>
  </section>
</document>
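Since Saxon 9 implements XSLT 2.0, the usual tool for this is `xsl:for-each-group` with `group-starting-with`, applied recursively once per heading level. To make the grouping logic itself concrete, here is a minimal sketch of the same recursion in Python using only `xml.etree.ElementTree`; it is an illustration of the algorithm, not an XSLT solution. It assumes the input is well-formed XHTML, that only `h1` to `h4` start sections, and that heading levels are never skipped. The names `group`, `heading_level` and `to_document` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Map heading tags to section levels; h1-h4 gives up to 4 nested levels.
HEADING_LEVELS = {"h1": 1, "h2": 2, "h3": 3, "h4": 4}


def heading_level(el):
    """Return 1-4 for an h1-h4 element, or None for anything else."""
    return HEADING_LEVELS.get(el.tag)


def group(nodes, level, parent):
    """Wrap each run of nodes that starts with an h<level> heading in a
    <section level="N"> under parent, then recurse into the run at level + 1.
    Nodes before the first h<level> heading are passed through unchanged.
    Assumes heading levels are not skipped (no h3 directly under an h1)."""
    i = 0
    while i < len(nodes):
        node = nodes[i]
        if heading_level(node) != level:
            parent.append(node)  # ordinary content: pass it through
            i += 1
            continue
        # The run ends at the next heading of this level or a higher one.
        j = i + 1
        while j < len(nodes):
            lv = heading_level(nodes[j])
            if lv is not None and lv <= level:
                break
            j += 1
        section = ET.SubElement(parent, "section", level=str(level))
        header = ET.SubElement(section, f"header{level}")
        header.text = node.text
        header.extend(list(node))  # keep any inline markup inside the heading
        group(nodes[i + 1:j], level + 1, section)
        i = j


def to_document(html_text):
    """Parse well-formed (X)HTML and return the nested <document> tree."""
    body = ET.fromstring(html_text).find("body")
    document = ET.Element("document")
    group(list(body), 1, document)
    return document


SAMPLE = """<html><head><title></title></head><body>
<h1>Header 1 CONTENT</h1>
<p>Level 1 para</p>
<p>Level 1 para</p>
<h2>Header 2 CONTENT</h2>
<p>Level 2 para</p>
<p>Level 2 para</p>
</body></html>"""

if __name__ == "__main__":
    doc = to_document(SAMPLE)
    ET.indent(doc)  # pretty-printing; requires Python 3.9+
    print(ET.tostring(doc, encoding="unicode"))
```

Content that appears before the first `h1` is passed straight through under `<document>` without a wrapper, which is exactly the case the follow-up below is about.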

I had been working from this example: Stack Overflow answer

However, I can't get it to do exactly what I need.

I'm using Saxon 9 to run the XSLT within Oxygen during development. In production I'll run it from a cmd/bat file, still with Saxon 9. I'd like to handle up to 4 nested section levels if possible.

Any help is much appreciated!

I need to add to this, as I've run into another requirement. I probably should have thought of this before.

I'm also encountering input like the following:

<html>
<head>
<title></title>
</head>
<body>
<p>Level 1 para</p>
<p>Level 1 para</p>
<p>Level 1 para</p>
<p>Level 1 para</p>

<h1>Header 2 CONTENT</h1>
<p>Level 2 para</p>
<p>Level 2 para</p>
<p>Level 2 para</p>
<p>Level 2 para</p>
</body>
</html>

As you can see, the first <p> elements sit directly under <body> before any heading, while in my first snippet every <p> followed a heading. My desired result is the same as above, except that any <p> elements that appear before the first heading should be wrapped in their own <section level="1">.

<document>
  <section level="1">
     <p>Level 1 para</p>
     <p>Level 1 para</p>
     <p>Level 1 para</p>
     <p>Level 1 para</p>
  </section>
  <section level="1">
     <header1>Header 2 CONTENT</header1>
     <p>Level 2 para</p>
     <p>Level 2 para</p>
     <p>Level 2 para</p>
     <p>Level 2 para</p>
  </section>
</document>
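The XSLT 2.0 rule for `group-starting-with` already covers this case: the first item of the population always starts a group, even when it does not match the pattern, so leading paragraphs end up in a group whose first item is not a heading. The Python sketch below (hypothetical names `group_starting_with` and `wrap_top_level`, standard library only) reproduces that rule for the top level and emits `<header1>` only when a group actually starts with an `h1`. It handles just the single level shown in this sample, assuming well-formed XHTML input; deeper levels would be grouped the same way inside each section.

```python
import xml.etree.ElementTree as ET


def group_starting_with(items, starts_group):
    """Split items into runs; a new run begins at every item for which
    starts_group(item) is true. The first run always begins at items[0],
    even if it does not match (the same rule XSLT 2.0 uses)."""
    groups = []
    for item in items:
        if not groups or starts_group(item):
            groups.append([item])
        else:
            groups[-1].append(item)
    return groups


def wrap_top_level(html_text):
    """Wrap every top-level run of <body> children in <section level="1">."""
    body = ET.fromstring(html_text).find("body")
    document = ET.Element("document")
    for run in group_starting_with(list(body), lambda el: el.tag == "h1"):
        section = ET.SubElement(document, "section", level="1")
        rest = run
        if run[0].tag == "h1":
            # Only a run that starts with a heading gets a <header1>.
            header = ET.SubElement(section, "header1")
            header.text = run[0].text
            rest = run[1:]
        section.extend(rest)  # leading paragraphs land here with no header
    return document


SAMPLE2 = """<html><head><title></title></head><body>
<p>Level 1 para</p>
<p>Level 1 para</p>
<h1>Header 2 CONTENT</h1>
<p>Level 2 para</p>
<p>Level 2 para</p>
</body></html>"""

if __name__ == "__main__":
    doc = wrap_top_level(SAMPLE2)
    ET.indent(doc)  # pretty-printing; requires Python 3.9+
    print(ET.tostring(doc, encoding="unicode"))
```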

© Stack Overflow or respective owner