Encoding in python with lxml - complex solution

Posted by Vojtech R. on Stack Overflow See other posts from Stack Overflow or by Vojtech R.
Published on 2010-04-21T21:30:02Z Indexed on 2010/04/21 21:33 UTC
Read the original article Hit count: 521

Filed under:

lxml

|

python

Hi,

I need to download and parse webpage with lxml and build UTF-8 xml output. I thing schema in pseudocode is more illustrative:

from lxml import etree

webfile = urllib2.urlopen(url)
root = etree.parse(webfile.read(), parser=etree.HTMLParser(recover=True))

txt = my_process_text(etree.tostring(root.xpath('/html/body'), encoding=utf8))


output = etree.Element("out")
output.text = txt

outputfile.write(etree.tostring(output, encoding=utf8))

So webfile can be in any encoding (lxml should handle this). Outputfile have to be in utf-8. I'm not sure where to use encoding/coding. Is this schema ok? (I cant find good tutorial about lxml and encoding, but I can find many problems with this...) I need robust approved solution so I ask you seniors.

Many thanks

© Stack Overflow or respective owner

Related posts about lxml

how to pass an xml file to lxml to parse?

as seen on Stack Overflow - Search for 'Stack Overflow'
I'm trying to parse an xml file using lxml. xml.etree allowed me to simply pass the file name as a parameter to the parse function, so I attempted to do the same with lxml. My code: from lxml import etree from lxml import objectify file = "C:\Projects\python\cb.xml" tree = etree.parse(file) but… >>> More
Python question, how to pass an xml file to lxml to parse?

as seen on Stack Overflow - Search for 'Stack Overflow'
I'm relatively new to python, my code is: from lxml import etree from lxml import objectify file = "C:\Projects\python\cb.xml" tree = etree.parse(file) but I get the error: Traceback (most recent call last): File "cb.py", line 5, in <module> tree = etree.parse(file) File "lxml.etree… >>> More
Installing both lxml 3.1.2 and lxml2 on ubuntu 12.04

as seen on Ask Ubuntu - Search for 'Ask Ubuntu'
I asked this on SO: http://stackoverflow.com/questions/19852911/lxml-3-1-2-and-lxml2-both-on-ubuntu/19856674#19856674 But it is perhaps more appropriate for AskUbuntu. So here it is again, reformulated. On the lxml site they suggest that it is possible to have both lxml2 and the newest version of… >>> More
Default or fink python and lxml under 10.6.8

as seen on Super User - Search for 'Super User'
Ah, confusion. I'm trying to install a python library called lxml as needed by a python script. I've been through numerous SU quesitons and answers. I haven't been able to make much progress. I run easy_install lxml and get: Processing lxml-3.0.1-py2.6-macosx-10.6-universal.egg lxml 3.0.1 is … >>> More
LXML E builder for java?

as seen on Stack Overflow - Search for 'Stack Overflow'
There is one thing I really love about LXML, and that the E builder. I love that I can throw XML together like this: message = E.Person( E.Name( E.First("jack") E.Last("Ripper") ) E.PhoneNumber("555-555-5555") ) To make: <Person> <Name> <First>Jack</First> … >>> More

Related posts about python

unmet dependencies in Ubuntu 12.04

as seen on Ask Ubuntu - Search for 'Ask Ubuntu'
I tried today to install a dvb-card on my Ubuntu 12.04 (Linux blauhai-linux 3.2.0-25-generic #40-Ubuntu SMP Wed May 23 20:30:51 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux ). The installation failed with an error. After that, i tried to install python (it was already installed but i got this error): linux:~$… >>> More
How can I get sikuli-ide to work?

as seen on Ask Ubuntu - Search for 'Ask Ubuntu'
I installed sikuli-ide with sudo apt-get install sikuli-ide Everything was fine until I tried to start it from the terminal. I typed sikuli-ide But the only response I got was [info] locale: en_US The application was not started, furthermore there is no desktop file and sikuli-ide does not… >>> More
Getting PATH right for python after MacPorts install

as seen on Super User - Search for 'Super User'
I can't import some python libraries (PIL, psycopg2) that I just installed with MacPorts. I looked through these forums, and tried to adjust my PATH variable in $HOME/.bash_profile in order to fix this but it did not work. I added the location of PIL and psycopg2 to PATH. I know that Terminal is… >>> More
call python with system() in R to run a python script emulating the python console

as seen on Stack Overflow - Search for 'Stack Overflow'
I want to pass a chunk of Python code to Python in R with something like system('python ...'), and I'm wondering if there is an easy way to emulate the python console in this case. For example, suppose the code is "print 'hello world'", how can I get the output like this in R? >>> print… >>> More
Python - Calling a non python program from python?

as seen on Stack Overflow - Search for 'Stack Overflow'
Hi, I am currently struggling to call a non python program from a python script. I have a ~1000 files that when passed through this C++ program will generate ~1000 outputs. Each output file must have a distinct name. The command I wish to run is of the form: program_name -input -output -o1 -o2… >>> More