JDOM 1.1: hyphen is not a valid comment character
Posted by Stefan Kendall on Stack Overflow
Published on 2010-04-11T17:14:24Z
I'm using TagSoup to clean HTML that I scrape from the web, and I get the following error when parsing pages that contain comments:
The data "- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - " is not legal for a JDOM comment: Comment data cannot start with a hyphen.
I'm using JDOM 1.1, and here's the code that does the actual cleaning:
SAXBuilder builder = new org.jdom.input.SAXBuilder("org.ccil.cowan.tagsoup.Parser"); // use TagSoup as the SAX driver
// Don't check the doctype! At our usage rate, we'll get 503 responses
// from the W3C.
builder.setEntityResolver(dummyEntityResolver);
Reader in = new StringReader(str);
org.jdom.Document doc = builder.build(in);
String cleanXmlDoc = new org.jdom.output.XMLOutputter().outputString(doc);
Any idea what's going wrong, or how to fix it? I need to be able to parse pages containing long comments such as <!--------- data ------------>.
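For reference, one possible workaround, if the comment contents themselves don't matter, is to strip HTML comments from the string before it ever reaches the builder, so JDOM never has to construct a Comment from data it considers illegal. This is only a sketch using the standard library: the class name `CommentStripper` and the method `stripComments` are made up for illustration, and the regex is deliberately naive (it will also remove anything that looks like a comment inside a script block).

```java
import java.util.regex.Pattern;

public class CommentStripper {
    // Lazily matches from "<!--" up to the first "-->".
    // DOTALL lets a comment span multiple lines.
    private static final Pattern HTML_COMMENT =
            Pattern.compile("<!--.*?-->", Pattern.DOTALL);

    /** Returns the input with every HTML comment removed (hypothetical helper). */
    public static String stripComments(String html) {
        return HTML_COMMENT.matcher(html).replaceAll("");
    }

    public static void main(String[] args) {
        String str = "<p>before</p><!--------- data ------------><p>after</p>";
        // prints "<p>before</p><p>after</p>"
        System.out.println(stripComments(str));
    }
}
```

With something like this in place, the reader in the snippet above would be built as `new StringReader(CommentStripper.stripComments(str))`. The trade-off is that all comment text is lost in the cleaned output.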
© Stack Overflow or respective owner