Repeatedly querying xml using python

Posted by Jack on Stack Overflow See other posts from Stack Overflow or by Jack
Published on 2010-03-24T12:57:41Z Indexed on 2010/03/24 13:53 UTC
Read the original article Hit count: 283

Filed under:
|
|
|

I have some xml documents I need to run queries on. I've created some python scripts (using ElementTree) to do this, since I'm vaguely familiar with using it.

The way it works is I run the scripts several times with different arguments, depending on what I want to find out.

These files can be relatively large (10MB+) and so it takes rather a long time to parse them. On my system, just running:

tree = ElementTree.parse(document)

takes around 30 seconds, with a subsequent findall query only adding around a second to that.

Seeing as the way I'm doing this requires me to repeatedly parse the file, I was wondering if there was some sort of caching mechanism I can use so that the ElementTree.parse computation can be reduced on subsequent queries.

I realise the smart thing to do here may be to try and batch as many queries as possible together in the python script, but I was hoping there might be another way.

Thanks.

© Stack Overflow or respective owner

Related posts about python

Related posts about elementtree