getting expat to use .dtd for entity replacement in python

Posted by nicolas78 on Stack Overflow See other posts from Stack Overflow or by nicolas78
Published on 2010-05-21T12:26:31Z Indexed on 2010/05/21 12:30 UTC
Read the original article Hit count: 325

Filed under:
|
|
|
|

I'm trying to read in an xml file which looks like this

<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE dblp SYSTEM "dblp.dtd">
<dblp>
<incollection>
<author>Jos&eacute; A. Blakeley</author>
</incollection>
</dblp>

The point that creates the problem looks is the

Jos&eacute; A. Blakeley

part: The parser calls its character handler twice, once with "Jos", once with " A. Blakeley". Now I understand this may be the correct behaviour if it doesn't know the eacute entity. However, this is defined in the dblp.dtd, which I have. I don't seem to be able to convince expat to use this file, though. All I can say is

p = xml.parsers.expat.ParserCreate()
# tried with and without following line
p.SetParamEntityParsing(xml.parsers.expat.XML_PARAM_ENTITY_PARSING_ALWAYS) 
p.UseForeignDTD(True)
f = open(dblp_file, "r")
p.ParseFile(f)

but expat still doesn't recognize my entity. Why is there no way to tell expat which DTD to use? I've tried

  • putting the file into the same directory as the XML
  • putting the file into the program's working directory
  • replacing the reference in the xml file by an absolute path

What am I missing? Thx.

© Stack Overflow or respective owner

Related posts about python

Related posts about Xml