Reading UTF-8 XML and writing it to a file with Python
- by Harri
I'm trying to parse UTF-8 XML file and save some parts of it to another file. Problem is, that this is my first Python script ever and I'm totally confused about the character encoding problems I'm finding.
My script fails immediately when it tries to write non-ascii character to a file, but it can print it to command prompt (at least in some level)
Here's the XML (from the parts that matter at least, it's a *.resx file which contains UI strings)
<?xml version="1.0" encoding="utf-8"?>
<root>
<resheader name="foo">
<value>bar</value>
</resheader>
<data name="lorem" xml:space="preserve">
<value>ipsum öä</value>
</data>
</root>
And here's my python script
from xml.dom.minidom import parse
names = []
values = []
def getStrings(path):
dom = parse(path)
data = dom.getElementsByTagName("data")
for i in range(len(data)):
name = data[i].getAttribute("name")
names.append(name)
value = data[i].getElementsByTagName("value")
values.append(value[0].firstChild.nodeValue.encode("utf-8"))
def writeToFile():
with open("uiStrings-fi.py", "w") as f:
for i in range(len(names)):
line = names[i] + '="'+ values[i] + '"' #varName='varValue'
f.write(line)
f.write("\n")
getStrings("ResourceFile.fi-FI.resx")
writeToFile()
And here's the traceback:
Traceback (most recent call last):
File "GenerateLanguageFiles.py", line 24, in
writeToFile()
File "GenerateLanguageFiles.py", line 19, in writeToFile
line = names[i] + '="'+ values[i] + '"' #varName='varValue'
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 2: ordinal not in ran
ge(128)
How should I fix my script so it would read and write UTF-8 characters properly? The files I'm trying to generate would be used in test automation with Robots Framework.