Insert a DOCTYPE into an XML document (Java/SAX)

Posted by Thom Nichols on Stack Overflow
Published on 2010-04-14T23:35:37Z Indexed on 2010/04/14 23:43 UTC

Suppose you have an XML document and its DTD, but the document itself doesn't declare a DOCTYPE. How would you insert the DOCTYPE declaration, preferably by specifying it on the parser (similar to how you can set the schema for a document that will be parsed) or by inserting the necessary SAX events via an XMLFilter or the like?

I've found many references to EntityResolver, but that is only invoked once a DOCTYPE has been found during parsing, and it is used to point the parser at a local DTD file. EntityResolver2 appears to have what I'm looking for, but I haven't found any usage examples.

This is the closest I've come so far (the code is Groovy, but close enough to Java that it should be easy to follow):

import org.xml.sax.*
import org.xml.sax.ext.*
import org.xml.sax.helpers.*

class XmlFilter extends XMLFilterImpl {
    public XmlFilter( XMLReader reader ) { super(reader) }

    @Override public void startDocument() {
        super.startDocument()        
        super.resolveEntity( null, 
            'file:///./entity.dtd')
        println "filter startDocument"
    }
}

class MyHandler extends DefaultHandler2 { 
    public InputSource resolveEntity(String name, String publicId, String baseURI, String systemId) {
        println "entity: $name, $publicId, $baseURI, $systemId"
        return new InputSource(new StringReader('<!ENTITY asdf "&#161;">'))
    }
}

def handler = new MyHandler()

def parser = XMLReaderFactory.createXMLReader()
parser.setFeature 'http://xml.org/sax/features/use-entity-resolver2', true
def filter = new XmlFilter( parser )
filter.setContentHandler( handler )
filter.setEntityResolver( handler )

filter.parse( new InputSource(new StringReader('''<?xml version="1.0" ?>
    <test>one &asdf; two! &nbsp; &iexcl;&pound;&cent;</test>''')) );

I see resolveEntity being called, but parsing still fails with:

org.xml.sax.SAXParseException: The entity "asdf" was referenced, but not declared.
at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(AbstractSAXParser.java:1231)
at org.xml.sax.helpers.XMLFilterImpl.parse(XMLFilterImpl.java:333)

I guess this is because there's no way to add SAX events that the parser itself knows about. A filter sits between the parser and the ContentHandler, so any events I add there are only passed along to the ContentHandler and never reach the parser. That means the document already has to be valid going into the XMLReader. Is there any way around this? I know I can modify the raw stream to add a DOCTYPE, or possibly run a transform to set a DTD. Are there any other options?
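
One direction that may be worth trying is the getExternalSubset method of EntityResolver2, which the SAX documentation describes as a way to supply an external subset for documents that have no DOCTYPE. Below is a minimal sketch in plain Java rather than Groovy. It assumes the underlying parser (for example, the Xerces implementation bundled with the JDK) honours the use-entity-resolver2 feature and actually calls getExternalSubset; the class names ExternalSubsetDemo and SubsetHandler are made up for illustration, and the DTD text is the same single entity declaration used above.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.ext.DefaultHandler2;

class ExternalSubsetDemo {

    // Hypothetical handler: supplies a DTD for documents that lack a DOCTYPE
    // and collects the character data the parser reports.
    static class SubsetHandler extends DefaultHandler2 {
        final StringBuilder text = new StringBuilder();

        // EntityResolver2 callback: the parser may ask for an external subset
        // when the document has no DOCTYPE declaration of its own.
        @Override
        public InputSource getExternalSubset(String name, String baseURI) {
            return new InputSource(new StringReader("<!ENTITY asdf \"&#161;\">"));
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            text.append(ch, start, length);
        }
    }

    // Parses the given XML and returns its concatenated character data.
    static String parse(String xml) throws Exception {
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        // Assumption: the parser recognises this feature and consults EntityResolver2.
        reader.setFeature("http://xml.org/sax/features/use-entity-resolver2", true);
        SubsetHandler handler = new SubsetHandler();
        reader.setContentHandler(handler);
        reader.setEntityResolver(handler);
        reader.parse(new InputSource(new StringReader(xml)));
        return handler.text.toString();
    }

    public static void main(String[] args) throws Exception {
        // The document has no DOCTYPE, yet references the entity "asdf".
        System.out.println(parse("<?xml version=\"1.0\" ?><test>one &asdf; two!</test>"));
    }
}
```

The difference from the filter attempt is that the DTD is handed to the parser itself rather than injected as events downstream of it, so the entity can be declared before the parser reaches the reference. Whether this works depends on the parser implementation in use.");
if (!result.equals("one \u00a1 two!")) {
    throw new AssertionError("unexpected text: " + result);
}
</test>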

© Stack Overflow or respective owner
