Repeatedly querying xml using python

Posted by Jack on Stack Overflow See other posts from Stack Overflow or by Jack
Published on 2010-03-24T12:57:41Z Indexed on 2010/03/24 13:53 UTC
Read the original article Hit count: 338

Filed under:

python

|

elementtree

|

Xml

|

caching

I have some xml documents I need to run queries on. I've created some python scripts (using ElementTree) to do this, since I'm vaguely familiar with using it.

The way it works is I run the scripts several times with different arguments, depending on what I want to find out.

These files can be relatively large (10MB+) and so it takes rather a long time to parse them. On my system, just running:

tree = ElementTree.parse(document)

takes around 30 seconds, with a subsequent findall query only adding around a second to that.

Seeing as the way I'm doing this requires me to repeatedly parse the file, I was wondering if there was some sort of caching mechanism I can use so that the ElementTree.parse computation can be reduced on subsequent queries.

I realise the smart thing to do here may be to try and batch as many queries as possible together in the python script, but I was hoping there might be another way.

Thanks.

© Stack Overflow or respective owner

Related posts about python

unmet dependencies in Ubuntu 12.04

as seen on Ask Ubuntu - Search for 'Ask Ubuntu'
I tried today to install a dvb-card on my Ubuntu 12.04 (Linux blauhai-linux 3.2.0-25-generic #40-Ubuntu SMP Wed May 23 20:30:51 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux ). The installation failed with an error. After that, i tried to install python (it was already installed but i got this error): linux:~$… >>> More
How can I get sikuli-ide to work?

as seen on Ask Ubuntu - Search for 'Ask Ubuntu'
I installed sikuli-ide with sudo apt-get install sikuli-ide Everything was fine until I tried to start it from the terminal. I typed sikuli-ide But the only response I got was [info] locale: en_US The application was not started, furthermore there is no desktop file and sikuli-ide does not… >>> More
Getting PATH right for python after MacPorts install

as seen on Super User - Search for 'Super User'
I can't import some python libraries (PIL, psycopg2) that I just installed with MacPorts. I looked through these forums, and tried to adjust my PATH variable in $HOME/.bash_profile in order to fix this but it did not work. I added the location of PIL and psycopg2 to PATH. I know that Terminal is… >>> More
call python with system() in R to run a python script emulating the python console

as seen on Stack Overflow - Search for 'Stack Overflow'
I want to pass a chunk of Python code to Python in R with something like system('python ...'), and I'm wondering if there is an easy way to emulate the python console in this case. For example, suppose the code is "print 'hello world'", how can I get the output like this in R? >>> print… >>> More
Python - Calling a non python program from python?

as seen on Stack Overflow - Search for 'Stack Overflow'
Hi, I am currently struggling to call a non python program from a python script. I have a ~1000 files that when passed through this C++ program will generate ~1000 outputs. Each output file must have a distinct name. The command I wish to run is of the form: program_name -input -output -o1 -o2… >>> More

Related posts about elementtree

parse xml with elementtree, custom sorting

as seen on Stack Overflow - Search for 'Stack Overflow'
I want to parse xml file in utf-8 and sort it by some field. Soring is made by custom alphabet (s1 from sourcecode). History of question is here: sorting of list containing utf-8 charachters. I've found how to sort xml here. Sorting work correctly, the problem is with elementtree, I must admit that… >>> More
: in node causing Keyerror in xmlparsing using ElementTree

as seen on Stack Overflow - Search for 'Stack Overflow'
Hi I'm using ElementTree to parse out an xml feed from Kuler. I'm only beginning in python but am stuck here. The parsing works fine until I attempt to retrieve any nodes containing ':' e.g kuler:swatchHexColor Below is a cut down version of the full feed but same structure: <rss xmlns:xs="http://www… >>> More
Merging elements inside a xml.etree.ElementTree

as seen on Stack Overflow - Search for 'Stack Overflow'
I have a huge test data like the one provided below (and yes I have no control over this data). Each line is actually 6 parts and I need to generate an XML based on this data. Nav;Basic;Dest;Smoke;No;Yes; Nav;Dest;Recent;Regg;No;Yes; Nav;Dest;Favourites;Regg;No;Yes; ... Nav;Dest using on board;By… >>> More
Using SimpleXMLTreeBuilder in elementtree

as seen on Stack Overflow - Search for 'Stack Overflow'
I have been developing an application with django and elementtree and while deploying it to the production server i have found out it is running python 2.4. I have been able to bundle elementtree but now i am getting the error: "No module named expat; use SimpleXMLTreeBuilder instead" Unfortunately… >>> More
Can ElementTree be told to preserve the order of attributes?

as seen on Stack Overflow - Search for 'Stack Overflow'
I've written a fairly simple filter in python using ElementTree to munge the contexts of some xml files. And it works, more or less. But it reorders the attributes of various tags, and I'd like it to not do that. Does anyone know a switch I can throw to make it keep them in specified order? Context… >>> More