Cleaning an XML file in Python before parsing

Posted by Sam on Stack Overflow
Published on 2010-03-30T14:02:59Z

I'm using minidom to parse an XML file, and it threw an error indicating that the data is not well-formed. I figured out that some of the pages contain text like ไอเฟล &, where the unescaped & causes the parser to hiccup. Is there an easy way to clean the file before I start parsing it? Right now I'm using a regular expression to throw away anything that isn't an alphanumeric character or one of the <, /, > characters, but it isn't quite working.
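
One possible approach, sketched here under the assumption that the only problem is bare & characters (the non-ASCII text itself is legal in XML if the file's encoding is declared correctly): escape every & that does not already start an entity or character reference, then hand the result to minidom. The helper name escape_bare_ampersands and the sample string are made up for illustration. Note that this sketch leaves HTML-only entities such as &nbsp; alone, and those would still fail in a plain XML parser.

```python
import re
from xml.dom import minidom

# Matches an "&" that is NOT followed by something shaped like a reference:
# a named entity (&amp;), a decimal reference (&#233;) or a hex one (&#xE9;).
BARE_AMP = re.compile(r"&(?!(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#x[0-9A-Fa-f]+);)")

def escape_bare_ampersands(text):
    """Hypothetical helper: turn stray '&' into '&amp;', leave references alone."""
    return BARE_AMP.sub("&amp;", text)

# Made-up sample resembling the problem text from the question.
raw = "<page><title>ไอเฟล & Paris &amp; more</title></page>"

cleaned = escape_bare_ampersands(raw)
doc = minidom.parseString(cleaned)

# Collect the text of the <title> element from its text nodes.
title_node = doc.getElementsByTagName("title")[0]
title = "".join(child.data for child in title_node.childNodes)
print(title)
```

Compared with stripping everything that isn't alphanumeric, this keeps the Thai text and the existing markup intact and only touches the characters that actually break well-formedness.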
