Ignoring "Content is not allowed in trailing section" SAXException
- by Paul J. Lucas
I'm using Java's DocumentBuilder.parse(InputStream) to parse an XML document. Occasionally, I receive malformed XML documents that have extra junk after the final >, which causes a SAXException: Content is not allowed in trailing section. (In the cases I've seen, the junk is simply one or more null bytes.)
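A minimal sketch that reproduces the failure described above. The class name TrailingJunkRepro, the helper parseError, and the sample document are hypothetical and exist only for illustration; the exact message text may differ between parser implementations.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.SAXException;

// Hypothetical reproduction class, not part of any library.
class TrailingJunkRepro {

    // Parses the given bytes and returns the parser's error message,
    // or null if the document parsed without a SAXException.
    static String parseError(byte[] bytes) throws Exception {
        DocumentBuilder builder =
            DocumentBuilderFactory.newInstance().newDocumentBuilder();
        try (InputStream in = new ByteArrayInputStream(bytes)) {
            builder.parse(in);
            return null;
        } catch (SAXException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) throws Exception {
        // Well-formed XML followed by two null bytes of trailing junk.
        byte[] bad = "<root><child/></root>\0\0".getBytes(StandardCharsets.UTF_8);
        System.out.println(parseError(bad));
    }
}
```

Without the two trailing null bytes, the same document parses cleanly and parseError returns null.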
I don't care what's after the final >. Is there an easy way to parse an entire XML document in Java and have it ignore any trailing junk?
Note that by "ignore" I don't simply mean to catch and ignore the exception: I mean to ignore the trailing junk, throw no exception, and return the Document object, since the XML up to and including the final > is valid.
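To make the desired behavior concrete, here is a minimal sketch that covers only the null-byte case. It is not a general solution: it buffers the whole stream in memory, it does nothing about other kinds of trailing junk, and it assumes an encoding such as UTF-8 in which a trailing zero byte is never part of the final > (that assumption fails for UTF-16). The class name LenientXmlParser and the method parseIgnoringTrailingNulls are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.SAXException;

// Hypothetical helper, not part of any library.
class LenientXmlParser {

    // Reads the whole stream, drops trailing null bytes, then parses the rest.
    // Assumption: the document uses a UTF-8-like encoding (see lead-in).
    static Document parseIgnoringTrailingNulls(InputStream in)
            throws IOException, SAXException, ParserConfigurationException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        byte[] chunk = new byte[8192];
        int n;
        while ((n = in.read(chunk)) != -1) {
            buffer.write(chunk, 0, n);
        }
        byte[] bytes = buffer.toByteArray();

        // Move the end index back past any trailing zero bytes.
        int end = bytes.length;
        while (end > 0 && bytes[end - 1] == 0) {
            end--;
        }

        DocumentBuilder builder =
            DocumentBuilderFactory.newInstance().newDocumentBuilder();
        return builder.parse(new ByteArrayInputStream(bytes, 0, end));
    }
}
```

Any error in the XML itself still surfaces as a SAXException; only the trailing null bytes are suppressed.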