Repeatedly querying xml using python
Posted
by Jack
on Stack Overflow
See other posts from Stack Overflow
or by Jack
Published on 2010-03-24T12:57:41Z
Indexed on
2010/03/24
13:53 UTC
Read the original article
Hit count: 283
I have some xml documents I need to run queries on. I've created some python scripts (using ElementTree) to do this, since I'm vaguely familiar with using it.
The way it works is I run the scripts several times with different arguments, depending on what I want to find out.
These files can be relatively large (10MB+) and so it takes rather a long time to parse them. On my system, just running:
tree = ElementTree.parse(document)
takes around 30 seconds, with a subsequent findall query only adding around a second to that.
Seeing as the way I'm doing this requires me to repeatedly parse the file, I was wondering if there was some sort of caching mechanism I can use so that the ElementTree.parse computation can be reduced on subsequent queries.
I realise the smart thing to do here may be to try and batch as many queries as possible together in the python script, but I was hoping there might be another way.
Thanks.
© Stack Overflow or respective owner