XSLT: grouping HTML elements into section levels
Posted by Jeff on Stack Overflow
Published on 2010-12-28T15:51:53Z
Hello there, I'm trying to write an XSLT stylesheet that organizes an HTML file into nested section levels based on the heading level. Here is my input:
<html>
<head>
<title></title>
</head>
<body>
<h1>Header 1 CONTENT</h1>
<p>Level 1 para</p>
<p>Level 1 para</p>
<p>Level 1 para</p>
<p>Level 1 para</p>
<h2>Header 2 CONTENT</h2>
<p>Level 2 para</p>
<p>Level 2 para</p>
<p>Level 2 para</p>
<p>Level 2 para</p>
</body>
</html>
I'm working with a fairly simple structure at the moment, so this pattern will stay constant for the time being. I need output like this:
<document>
<section level="1">
<header1>Header 1 CONTENT</header1>
<p>Level 1 para</p>
<p>Level 1 para</p>
<p>Level 1 para</p>
<p>Level 1 para</p>
<section level="2">
<header2>Header 2 CONTENT</header2>
<p>Level 2 para</p>
<p>Level 2 para</p>
<p>Level 2 para</p>
<p>Level 2 para</p>
</section>
</section>
</document>
I had been working from this example: Stack Overflow answer. However, I can't get it to do exactly what I need.
I'm using Saxon 9 to run the XSLT within Oxygen during development, and in production I'll run it from a cmd/bat file, still with Saxon 9. I'd like to handle up to four nested section levels if possible.
Any help is much appreciated!
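For reference, a common XSLT 2.0 approach (Saxon 9 supports XSLT 2.0) is a recursive function built on xsl:for-each-group with group-starting-with: split the elements into groups that each begin with an hN heading, wrap each group in a section, and recurse one level deeper on the rest of the group. This is a sketch under stated assumptions, not a tested solution: the input is assumed to be well-formed XML without the XHTML namespace, headings are assumed not to skip levels, and the function name f:group and its namespace urn:example:functions are illustrative choices.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch only: assumes XSLT 2.0 (Saxon 9), un-namespaced well-formed input,
     and headings that do not skip levels. f:group is an illustrative name. -->
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:f="urn:example:functions"
    exclude-result-prefixes="xs f">

  <xsl:output method="xml" indent="yes"/>

  <!-- Start grouping the children of body at heading level 1 -->
  <xsl:template match="/html">
    <document>
      <xsl:sequence select="f:group(body/*, 1)"/>
    </document>
  </xsl:template>

  <!-- Split $nodes into groups that each start with an h{$level} element,
       then recurse into each group one level deeper. -->
  <xsl:function name="f:group" as="node()*">
    <xsl:param name="nodes" as="element()*"/>
    <xsl:param name="level" as="xs:integer"/>
    <xsl:for-each-group select="$nodes"
        group-starting-with="*[local-name() eq concat('h', $level)]">
      <xsl:choose>
        <xsl:when test="local-name() eq concat('h', $level)">
          <!-- A heading at this level opens a new section -->
          <section level="{$level}">
            <xsl:element name="header{$level}">
              <xsl:copy-of select="node()"/>
            </xsl:element>
            <!-- Everything after the heading may contain deeper headings -->
            <xsl:sequence select="f:group(current-group() except ., $level + 1)"/>
          </section>
        </xsl:when>
        <xsl:otherwise>
          <!-- Content before the first heading of this level: copy it unchanged -->
          <xsl:copy-of select="current-group()"/>
        </xsl:otherwise>
      </xsl:choose>
    </xsl:for-each-group>
  </xsl:function>

</xsl:stylesheet>
```

Because the recursion simply follows the headings, it is not limited to four levels; it goes as deep as h1, h2, h3, h4 (and beyond) actually appear. From a cmd/bat file, recent Saxon 9 releases can be invoked with something like java -cp saxon9he.jar net.sf.saxon.Transform -s:input.html -xsl:sections.xsl -o:output.xml, where the jar and file names are assumptions that depend on the Saxon 9 edition and your own paths.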
Update: I need to add to this, because I've run into another requirement that I probably should have anticipated.
I'm also encountering input like this:
<html>
<head>
<title></title>
</head>
<body>
<p>Level 1 para</p>
<p>Level 1 para</p>
<p>Level 1 para</p>
<p>Level 1 para</p>
<h1>Header 2 CONTENT</h1>
<p>Level 2 para</p>
<p>Level 2 para</p>
<p>Level 2 para</p>
<p>Level 2 para</p>
</body>
</html>
As you can see, the leading <p> elements here are direct children of <body> with no header before them, whereas in my first snippet every <p> came after a header. My desired result is the same as above, except that any <p> directly inside <body> before the first header should be wrapped in its own <section level="1">:
<document>
<section level="1">
<p>Level 1 para</p>
<p>Level 1 para</p>
<p>Level 1 para</p>
<p>Level 1 para</p>
</section>
<section level="1">
<header1>Header 2 CONTENT</header1>
<p>Level 2 para</p>
<p>Level 2 para</p>
<p>Level 2 para</p>
<p>Level 2 para</p>
</section>
</document>
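One way to cover this case in XSLT 2.0 (supported by Saxon 9) is a recursive function built on xsl:for-each-group with group-starting-with, in which a run of elements appearing before the first h1 becomes its own level-1 section, while content before the first heading at deeper levels is still copied through unchanged. This is a hedged sketch, not a tested solution: it assumes well-formed input without the XHTML namespace and headings that do not skip levels, and the function name f:group and namespace urn:example:functions are illustrative.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch only: assumes XSLT 2.0 (Saxon 9), un-namespaced well-formed input,
     and headings that do not skip levels. f:group is an illustrative name. -->
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:f="urn:example:functions"
    exclude-result-prefixes="xs f">

  <xsl:output method="xml" indent="yes"/>

  <xsl:template match="/html">
    <document>
      <xsl:sequence select="f:group(body/*, 1)"/>
    </document>
  </xsl:template>

  <!-- Group $nodes at each h{$level} heading and recurse one level deeper -->
  <xsl:function name="f:group" as="node()*">
    <xsl:param name="nodes" as="element()*"/>
    <xsl:param name="level" as="xs:integer"/>
    <xsl:for-each-group select="$nodes"
        group-starting-with="*[local-name() eq concat('h', $level)]">
      <xsl:choose>
        <xsl:when test="local-name() eq concat('h', $level)">
          <!-- A heading at this level opens a new section -->
          <section level="{$level}">
            <xsl:element name="header{$level}">
              <xsl:copy-of select="node()"/>
            </xsl:element>
            <xsl:sequence select="f:group(current-group() except ., $level + 1)"/>
          </section>
        </xsl:when>
        <xsl:when test="$level eq 1">
          <!-- Elements before the first h1 get their own level-1 section;
               recurse at level 2 in case they contain h2 headings. -->
          <section level="1">
            <xsl:sequence select="f:group(current-group(), 2)"/>
          </section>
        </xsl:when>
        <xsl:otherwise>
          <!-- Content before the first heading at deeper levels: copy unchanged -->
          <xsl:copy-of select="current-group()"/>
        </xsl:otherwise>
      </xsl:choose>
    </xsl:for-each-group>
  </xsl:function>

</xsl:stylesheet>
```

The extra xsl:when branch only fires at level 1, so the first sample input, which has no content before its h1, should still produce the nested output shown for it.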
© Stack Overflow or respective owner