Reading UTF-8 XML and writing it to a file with Python

Posted by Harri on Stack Overflow See other posts from Stack Overflow or by Harri
Published on 2010-06-10T05:52:06Z Indexed on 2010/06/10 6:02 UTC
Read the original article Hit count: 393

Filed under:
|
|

I'm trying to parse UTF-8 XML file and save some parts of it to another file. Problem is, that this is my first Python script ever and I'm totally confused about the character encoding problems I'm finding.

My script fails immediately when it tries to write non-ascii character to a file, but it can print it to command prompt (at least in some level)

Here's the XML (from the parts that matter at least, it's a *.resx file which contains UI strings)

<?xml version="1.0" encoding="utf-8"?>
<root>
     <resheader name="foo">
          <value>bar</value>
     </resheader>
     <data name="lorem" xml:space="preserve">
          <value>ipsum öä</value>
     </data>
</root>

And here's my python script

from xml.dom.minidom import parse

names = []
values = []

def getStrings(path):
    dom = parse(path)
    data = dom.getElementsByTagName("data")

    for i in range(len(data)):
        name = data[i].getAttribute("name")
        names.append(name)
        value = data[i].getElementsByTagName("value")
        values.append(value[0].firstChild.nodeValue.encode("utf-8"))

def writeToFile():
    with open("uiStrings-fi.py", "w") as f:
        for i in range(len(names)):
            line = names[i] + '="'+ values[i] + '"' #varName='varValue'
            f.write(line)
            f.write("\n")

getStrings("ResourceFile.fi-FI.resx")
writeToFile()

And here's the traceback:

Traceback (most recent call last):
  File "GenerateLanguageFiles.py", line 24, in 
    writeToFile()
  File "GenerateLanguageFiles.py", line 19, in writeToFile
    line = names[i] + '="'+ values[i] + '"' #varName='varValue'
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 2: ordinal not in ran
ge(128)

How should I fix my script so it would read and write UTF-8 characters properly? The files I'm trying to generate would be used in test automation with Robots Framework.

© Stack Overflow or respective owner

Related posts about python

Related posts about Xml