Improve XPath efficiency for repeated, parameterized queries

Posted by Chris Allan on Stack Overflow
Published on 2010-04-07T16:07:50Z

Hi,

I am repeatedly performing the following XPath query, parameterized by 'keywordText', around 40,000 times:

```java
import org.apache.xpath.CachedXPathAPI; // Xalan-J
import org.w3c.dom.Node;
import org.w3c.dom.traversal.NodeIterator;

// Build the query for this keyword: <subnode>/<keyword>[<keyphrase>='keywordText']
String query = SystemGlobal.YAHOO_KEYWORDSSUBNODE + "/" + SystemGlobal.YAHOO_KEYWORDNODE + "[" + SystemGlobal.YAHOO_ATTRKEYPHRASE + "='" + keywordText + "']";
CachedXPathAPI cachedXPathAPI = new CachedXPathAPI();
NodeIterator nl = cachedXPathAPI.selectNodeIterator(doc.getElementsByTagName(SystemGlobal.YAHOO_KEYWORDSROOT).item(0), query);

Node n;
if ((n = nl.nextNode()) != null) {
  // Copy the matched keyword's fields, each via a separate XPath evaluation
  keyword.setKeywordId(Long.parseLong(cachedXPathAPI.selectSingleNode(n, SystemGlobal.YAHOO_ATTRKEYID).getTextContent()));
  keyword.setKeyPhrase(cachedXPathAPI.selectSingleNode(n, SystemGlobal.YAHOO_ATTRKEYPHRASE).getTextContent());
  keyword.setStatus(mapStatus(cachedXPathAPI.selectSingleNode(n, SystemGlobal.YAHOO_ATTRSTATUS).getTextContent()));
  keyword.setCampaignId(Long.parseLong(cachedXPathAPI.selectSingleNode(n, "../../" + SystemGlobal.YAHOO_ATTRCAMPAIGNID).getTextContent()));
  keyword.setAdGroupId(Long.parseLong(cachedXPathAPI.selectSingleNode(n, "../" + SystemGlobal.YAHOO_ATTRADGROUPID).getTextContent()));
}
```
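Because the same query shape runs 40,000 times with only keywordText changing, one alternative worth trying is to walk the keyword nodes once and index them by key phrase, so each lookup becomes a hash-map get instead of a fresh XPath scan. Note that this swaps XPath out entirely rather than speeding it up. A minimal sketch, assuming a hypothetical layout in which keyword elements are named `keyword` and carry the phrase in a `keyPhrase` attribute (stand-ins for the unknown `SystemGlobal.YAHOO_*` values):

```java
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Sketch: build a keyPhrase -> <keyword> element index in ONE pass, then answer
// every lookup with Map.get(). Element/attribute names are hypothetical.
class KeywordIndex {
    static final String KEYWORD_NODE = "keyword";     // hypothetical YAHOO_KEYWORDNODE
    static final String ATTR_KEYPHRASE = "keyPhrase"; // hypothetical YAHOO_ATTRKEYPHRASE (no '@')

    static Map<String, Element> build(Document doc) {
        Map<String, Element> index = new HashMap<>();
        // getElementsByTagName returns all matching elements in document order
        NodeList keywords = doc.getElementsByTagName(KEYWORD_NODE);
        for (int i = 0; i < keywords.getLength(); i++) {
            Element kw = (Element) keywords.item(i);
            // putIfAbsent keeps the first match, like taking nl.nextNode() once
            index.putIfAbsent(kw.getAttribute(ATTR_KEYPHRASE), kw);
        }
        return index;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<keywords><adGroup>"
            + "<keyword keyPhrase='blue shoes' keywordID='100'/>"
            + "<keyword keyPhrase='red shoes' keywordID='101'/>"
            + "</adGroup></keywords>";
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));
        Map<String, Element> index = build(doc);  // built once
        Element hit = index.get("red shoes");     // null when there is no match
        System.out.println(hit == null ? "no match" : hit.getAttribute("keywordID")); // prints 101
    }
}
```

Building the index costs one pass over the document; each later lookup is constant time on average. If the keyword elements live in a namespace, `getElementsByTagNameNS` may be needed instead of `getElementsByTagName`.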

On the first run of the script, none of the 40,000 executions of this code finds a match (nl.nextNode() == null), and everything runs quite quickly. On the following runs, however, when nl.nextNode() != null, things slow down a lot: the run takes around an additional 40 minutes, whereas the first run takes maybe 1 minute.
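Since the slowdown appears only when a match is found, the five extra selectSingleNode() calls per match are a plausible suspect (an assumption, not something measured here). Those calls can be replaced with direct DOM navigation. A minimal sketch, assuming hypothetically that the `YAHOO_ATTR*` constants name attributes (`keywordID`, `keyPhrase`, `status` on the keyword, `adGroupID` on its parent, `campaignID` on its grandparent); the real constant values are not shown in the question, and `mapStatus` is left out:

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Sketch: read the matched keyword's fields with plain DOM calls instead of
// five selectSingleNode() XPath evaluations. Attribute names are hypothetical.
class KeywordFields {
    final long keywordId, adGroupId, campaignId;
    final String keyPhrase, status;

    KeywordFields(Element kw) {
        Element adGroup = (Element) kw.getParentNode();       // equivalent of "../"
        Element campaign = (Element) adGroup.getParentNode(); // equivalent of "../../"
        keywordId = Long.parseLong(kw.getAttribute("keywordID"));
        keyPhrase = kw.getAttribute("keyPhrase");
        status = kw.getAttribute("status"); // the original passes this through mapStatus()
        adGroupId = Long.parseLong(adGroup.getAttribute("adGroupID"));
        campaignId = Long.parseLong(campaign.getAttribute("campaignID"));
    }

    public static void main(String[] args) throws Exception {
        String xml = "<keywords campaignID='7'><adGroup adGroupID='10'>"
            + "<keyword keywordID='100' keyPhrase='blue shoes' status='On'/>"
            + "</adGroup></keywords>";
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));
        Element kw = (Element) doc.getElementsByTagName("keyword").item(0);
        KeywordFields f = new KeywordFields(kw);
        System.out.println(f.keywordId + " " + f.adGroupId + " " + f.campaignId); // prints 100 10 7
    }
}
```

If the constants turn out to be child-element paths rather than attributes, the same idea applies with a lookup of the child element's text content in place of `getAttribute`.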

Oh, and the doc is constructed like so:

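```java
import java.io.FileInputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;

InputSource in = new InputSource(new FileInputStream(filename));
DocumentBuilderFactory dfactory = DocumentBuilderFactory.newInstance();
dfactory.setNamespaceAware(true);
doc = dfactory.newDocumentBuilder().parse(in);
```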

I also tried adding the following lines:

```java
reportEvaluator = new XPathEvaluatorImpl(reportDoc);
reportResolver = reportEvaluator.createNSResolver(reportDoc);
```

and, instead of creating a NodeIterator, creating an XPathResult:

```java
XPathResult result = (XPathResult)reportEvaluator.evaluate(query, doc.getElementsByTagName(SystemGlobal.YAHOO_KEYWORDSROOT).item(0), reportResolver, XPathResult.UNORDERED_NODE_ITERATOR_TYPE, null);
```

However, this ran even slower.

Is there a way to speed up this script? I have seen references to precompiled queries, but haven't found many concrete details. Also, as the code shows, I am using CachedXPathAPI, though it doesn't help much in this case.
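On precompiled queries: a minimal sketch using the standard JAXP `javax.xml.xpath` API (not Xalan's CachedXPathAPI or its low-level API) compiles the query once and binds the keyword text through an XPath variable, `$phrase`, on each call instead of concatenating a new query string. The path `adGroup/keyword[@keyPhrase = $phrase]` and the class name are hypothetical stand-ins. Separately, the code above creates a new CachedXPathAPI for every query; Xalan's documentation describes CachedXPathAPI as retaining its context (including the DTMs built for the document) only for the lifetime of the instance, so reusing a single instance across all 40,000 queries may help on its own.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Sketch: compile the parameterized query ONCE and rebind $phrase per lookup.
class CompiledKeywordQuery {
    private String phrase;              // value currently bound to $phrase
    private final XPathExpression expr; // compiled once, reused for every lookup

    CompiledKeywordQuery() throws XPathExpressionException {
        XPath xpath = XPathFactory.newInstance().newXPath();
        // Set the resolver BEFORE compiling; it is asked for $phrase at
        // evaluation time, so each evaluation sees the current 'phrase' value.
        xpath.setXPathVariableResolver(name -> "phrase".equals(name.getLocalPart()) ? phrase : null);
        // Hypothetical path standing in for YAHOO_KEYWORDSSUBNODE/YAHOO_KEYWORDNODE
        expr = xpath.compile("adGroup/keyword[@keyPhrase = $phrase]");
    }

    // Returns the first matching keyword under keywordsRoot, or null if none.
    // Not thread-safe: JAXP XPathExpression objects are not, and 'phrase' is shared.
    Node find(Node keywordsRoot, String keywordText) throws XPathExpressionException {
        phrase = keywordText;
        return (Node) expr.evaluate(keywordsRoot, XPathConstants.NODE);
    }

    public static void main(String[] args) throws Exception {
        String xml = "<keywords><adGroup>"
            + "<keyword keyPhrase=\"O'Brien shoes\" keywordID='100'/>"
            + "<keyword keyPhrase='red shoes' keywordID='101'/>"
            + "</adGroup></keywords>";
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));
        Element root = doc.getDocumentElement(); // the <keywords> root in this sample
        CompiledKeywordQuery q = new CompiledKeywordQuery(); // compile once...
        for (String text : new String[] {"red shoes", "O'Brien shoes", "green shoes"}) {
            Node n = q.find(root, text);                      // ...evaluate many times
            System.out.println(text + " -> "
                + (n == null ? "no match" : ((Element) n).getAttribute("keywordID")));
        }
    }
}
```

Binding the keyword through `$phrase` also means a keyword containing an apostrophe no longer breaks the query, which the string-concatenated version does not handle. Compiling once avoids re-parsing the expression, but each evaluation still scans the keyword nodes, so for 40,000 lookups a one-pass index may still be faster.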

Any help is much appreciated!

Chris Allan

© Stack Overflow or respective owner
