Reading UTF-8 XML and writing it to a file with Python
Posted
by Harri
on Stack Overflow
See other posts from Stack Overflow
or by Harri
Published on 2010-06-10T05:52:06Z
Indexed on
2010/06/10
6:02 UTC
Read the original article
Hit count: 399
I'm trying to parse UTF-8 XML file and save some parts of it to another file. Problem is, that this is my first Python script ever and I'm totally confused about the character encoding problems I'm finding.
My script fails immediately when it tries to write non-ascii character to a file, but it can print it to command prompt (at least in some level)
Here's the XML (from the parts that matter at least, it's a *.resx file which contains UI strings)
<?xml version="1.0" encoding="utf-8"?>
<root>
<resheader name="foo">
<value>bar</value>
</resheader>
<data name="lorem" xml:space="preserve">
<value>ipsum öä</value>
</data>
</root>
And here's my python script
from xml.dom.minidom import parse
names = []
values = []
def getStrings(path):
dom = parse(path)
data = dom.getElementsByTagName("data")
for i in range(len(data)):
name = data[i].getAttribute("name")
names.append(name)
value = data[i].getElementsByTagName("value")
values.append(value[0].firstChild.nodeValue.encode("utf-8"))
def writeToFile():
with open("uiStrings-fi.py", "w") as f:
for i in range(len(names)):
line = names[i] + '="'+ values[i] + '"' #varName='varValue'
f.write(line)
f.write("\n")
getStrings("ResourceFile.fi-FI.resx")
writeToFile()
And here's the traceback:
Traceback (most recent call last): File "GenerateLanguageFiles.py", line 24, in writeToFile() File "GenerateLanguageFiles.py", line 19, in writeToFile line = names[i] + '="'+ values[i] + '"' #varName='varValue' UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 2: ordinal not in ran ge(128)
How should I fix my script so it would read and write UTF-8 characters properly? The files I'm trying to generate would be used in test automation with Robots Framework.
© Stack Overflow or respective owner