Improve XPath efficiency for repeated, parameterized queries
- by Chris Allan
Hi,
I am repeatedly performing the following XPath query (though parameterized by 'keywordText') around 40,000 times:
String query = SystemGlobal.YAHOO_KEYWORDSSUBNODE + "/" + SystemGlobal.YAHOO_KEYWORDNODE + "[" + SystemGlobal.YAHOO_ATTRKEYPHRASE + "='" + keywordText + "']";
CachedXPathAPI cachedXPathAPI = new CachedXPathAPI();
NodeIterator nl = cachedXPathAPI.selectNodeIterator(doc.getElementsByTagName(SystemGlobal.YAHOO_KEYWORDSROOT).item(0), query);
Node n;
if ((n = nl.nextNode()) != null) {
keyword.setKeywordId(Long.parseLong(cachedXPathAPI.selectSingleNode(n, SystemGlobal.YAHOO_ATTRKEYID).getTextContent()));
keyword.setKeyPhrase(cachedXPathAPI.selectSingleNode(n, SystemGlobal.YAHOO_ATTRKEYPHRASE).getTextContent());
keyword.setStatus(mapStatus(cachedXPathAPI.selectSingleNode(n, SystemGlobal.YAHOO_ATTRSTATUS).getTextContent()));
keyword.setCampaignId(Long.parseLong(cachedXPathAPI.selectSingleNode(n, "../../" + SystemGlobal.YAHOO_ATTRCAMPAIGNID).getTextContent()));
keyword.setAdGroupId(Long.parseLong(cachedXPathAPI.selectSingleNode(n, "../" + SystemGlobal.YAHOO_ATTRADGROUPID).getTextContent()));
On the first run of the script, all 40,000 runs of this piece of code will have nl.nextNode() == null, and everything runs quite quickly. However, on the following runs, when nl.nextNode() != null, then things slow down a lot - this takes around an additional 40min to run (whereas the first run takes maybe 1 minute).
Oh, and the doc is constructed like so:
InputSource in = new InputSource(new FileInputStream(filename));
DocumentBuilderFactory dfactory = DocumentBuilderFactory.newInstance();
dfactory.setNamespaceAware(true);
doc = dfactory.newDocumentBuilder().parse(in);
I tried including the following lines
reportEvaluator = new XPathEvaluatorImpl(reportDoc);
reportResolver = reportEvaluator.createNSResolver(reportDoc);
and rather creating a NodeIterator, instead creating an XPathResult:
XPathResult result = (XPathResult)reportEvaluator.evaluate(query, doc.getElementsByTagName(SystemGlobal.YAHOO_KEYWORDSROOT).item(0), reportResolver, XPathResult.UNORDERED_NODE_ITERATOR_TYPE, null);
however this ran even slower
Is there a way in which I can speed up the running of this script? I have seen references to precompiled queries, though I haven't seen many actual details. Also, as seen in the code, I am using CachedXPathAPI, though the benefit for this case is not so great.
Any help is much appreciated!
Chris Allan