Processing XML comments in order using SAX & CyberNeko

Posted by Joel on Stack Overflow
Published on 2011-01-15T13:28:51Z Indexed on 2011/01/15 13:54 UTC

I'm using CyberNeko to clean and process HTML documents.

I need to be able to process all the comments that occur in the original HTML documents.

I've configured the CyberNeko SAX parser to report comments like so:

parser.setProperty("http://xml.org/sax/properties/lexical-handler", consumer);

...using the same consumer that I use for the DOM events.

I get a callback for each of the comments:

 @Override
 public void comment(char[] ch, int start, int length) throws SAXException {
  // The comment text is the slice ch[start .. start + length)
  System.out.println("COMMENT::: " + new String(ch, start, length));
 }

The problem is that all the comments are processed first, out of context of the DOM: I get a callback for every comment before the callbacks for the document head, body, and so on.

What I'd like is for the comment callbacks to arrive in the order in which the comments occur in the DOM.
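
To make the desired ordering concrete, here is a minimal sketch of the interleaving I'm after. It assumes the JDK's built-in SAX parser and a small well-formed input as a stand-in for CyberNeko, so it only illustrates the expected event order and says nothing about CyberNeko's own behaviour. The names `CommentOrderDemo`, `RecordingHandler` and `events` are made up for the illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.ext.DefaultHandler2;

public class CommentOrderDemo {

    /** Hypothetical handler: records element starts and comments as they arrive. */
    static class RecordingHandler extends DefaultHandler2 {
        final List<String> events = new ArrayList<>();

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            events.add("START " + qName);
        }

        @Override
        public void comment(char[] ch, int start, int length) {
            events.add("COMMENT " + new String(ch, start, length));
        }
    }

    /** Parses the input with one handler registered for both content and lexical events. */
    static List<String> events(String xml) throws Exception {
        // Assumption: the JDK's default parser, not CyberNeko.
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        RecordingHandler handler = new RecordingHandler();
        reader.setContentHandler(handler);
        reader.setProperty("http://xml.org/sax/properties/lexical-handler", handler);
        reader.parse(new InputSource(new StringReader(xml)));
        return handler.events;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<html><!--before head--><head/><body><!--in body--></body></html>";
        for (String event : events(xml)) {
            System.out.println(event);
        }
    }
}
```

With this input the comment events land between the element events (the first comment after `html` and before `head`, the second after `body`), which is the ordering I would like to get from CyberNeko as well.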

Edit: what I'm actually trying to do is process the conditional comments for IE in the original HTML, such as:

 <!--[if lte IE 6]><body class="news ie"><![endif]-->

At the moment they are all dropped, but I need to include them in the cleaned HTML document.
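
As a rough sketch of the output I'm aiming for: a handler that writes each comment back out at the point where it is reported would keep the conditional comments in place. The example below again assumes the JDK's built-in SAX parser and well-formed input instead of CyberNeko, and `CommentPreservingSerializer` and `roundTrip` are hypothetical names, not part of any library.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.ext.DefaultHandler2;

/** Hypothetical serializer: re-emits elements, text and comments in arrival order. */
public class CommentPreservingSerializer extends DefaultHandler2 {
    private final StringBuilder out = new StringBuilder();

    // Minimal escaping for text and attribute values.
    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace("\"", "&quot;");
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        out.append('<').append(qName);
        for (int i = 0; i < atts.getLength(); i++) {
            out.append(' ').append(atts.getQName(i))
               .append("=\"").append(escape(atts.getValue(i))).append('"');
        }
        out.append('>');
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        out.append("</").append(qName).append('>');
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        out.append(escape(new String(ch, start, length)));
    }

    @Override
    public void comment(char[] ch, int start, int length) {
        // Comment text is written back verbatim so conditional comments survive.
        out.append("<!--").append(ch, start, length).append("-->");
    }

    /** Parses the input and returns the re-serialized markup, comments included. */
    static String roundTrip(String xml) throws Exception {
        // Assumption: the JDK's default parser, not CyberNeko.
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        CommentPreservingSerializer handler = new CommentPreservingSerializer();
        reader.setContentHandler(handler);
        reader.setProperty("http://xml.org/sax/properties/lexical-handler", handler);
        reader.parse(new InputSource(new StringReader(xml)));
        return handler.out.toString();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<html><!--[if lte IE 6]><body class=\"news ie\"><![endif]-->"
                   + "<body class=\"news\">hi</body></html>";
        System.out.println(roundTrip(xml));
    }
}
```

This only works if the comment callback fires at the right point in the event stream, which is exactly what I'm not getting at the moment.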

© Stack Overflow or respective owner
