Repeatedly querying xml using python
- by Jack
I have some xml documents I need to run queries on. I've created some python scripts (using ElementTree) to do this, since I'm vaguely familiar with using it.
The way it works is I run the scripts several times with different arguments, depending on what I want to find out.
These files can be relatively large (10MB+) and so it takes rather a long time to parse them. On my system, just running:
tree = ElementTree.parse(document)
takes around 30 seconds, with a subsequent findall query only adding around a second to that.
Seeing as the way I'm doing this requires me to repeatedly parse the file, I was wondering if there was some sort of caching mechanism I can use so that the ElementTree.parse computation can be reduced on subsequent queries.
I realise the smart thing to do here may be to try and batch as many queries as possible together in the python script, but I was hoping there might be another way.
Thanks.