.NET: Is there a way to finagle a default namespace in an XPath 1.0 query?
- by Cheeso
I'm building a tool that performs XPath 1.0 queries on XHTML documents. The requirement to use a namespace prefix in the query is killing me. The query looks like this:
html/body/div[@class='contents']/div[@class='body']/
div[@class='pgdbbyauthor']/h2[a[@name][starts-with(.,'Quick')]]/
following-sibling::ul[1]/li/a
(all on one line)
...which is bad enough. But because it's XPath 1.0, I need to use an explicit namespace prefix on each QName, so it actually looks like this:
ns1:html/ns1:body/ns1:div[@class='contents']/ns1:div[@class='body']/
ns1:div[@class='pgdbbyauthor']/ns1:h2[ns1:a[@name][starts-with(.,'Quick')]]/
following-sibling::ns1:ul[1]/ns1:li/ns1:a
To set up the query, I do something like this:
// theText holds the XHTML source; xmlNamespaces maps each prefix to its namespace URI
var xpathDoc = new XPathDocument(new StringReader(theText));
var nav = xpathDoc.CreateNavigator();
var xmlns = new XmlNamespaceManager(nav.NameTable);
foreach (string prefix in xmlNamespaces.Keys)
    xmlns.AddNamespace(prefix, xmlNamespaces[prefix]);
// every prefix used in xpathExpression must be registered in xmlns
XPathNodeIterator selection = nav.Select(xpathExpression, xmlns);
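To make the problem concrete, here is a minimal, self-contained sketch of the same setup. The sample document and the DefaultNamespaceDemo class are made up for illustration and are not part of my tool; ns1 is simply the prefix registered for the XHTML namespace. The unprefixed query selects nothing, because XPath 1.0 treats an unprefixed name as being in no namespace:

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.XPath;

public static class DefaultNamespaceDemo
{
    // Hypothetical sample document; the default namespace is the XHTML one.
    const string Xhtml =
        "<html xmlns='http://www.w3.org/1999/xhtml'>" +
        "<body><div class='contents'>hi</div></body></html>";

    // Counts the nodes selected by an XPath 1.0 expression, with "ns1"
    // registered as the prefix for the XHTML namespace.
    public static int Count(string xpathExpression)
    {
        var nav = new XPathDocument(new StringReader(Xhtml)).CreateNavigator();
        var xmlns = new XmlNamespaceManager(nav.NameTable);
        xmlns.AddNamespace("ns1", "http://www.w3.org/1999/xhtml");
        return nav.Select(xpathExpression, xmlns).Count;
    }

    public static void Main()
    {
        // Unprefixed names mean "no namespace", so nothing matches.
        Console.WriteLine(Count("html/body/div"));             // 0
        // The prefixed form matches the single div.
        Console.WriteLine(Count("ns1:html/ns1:body/ns1:div")); // 1
    }
}
```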
But what I want is for the xpathExpression to use the implicit default namespace.
Is there a way for me to transform the unadorned XPath expression, after it's been written, to inject a namespace prefix for each element name in the query?
I'm thinking that I could inject a prefix in front of anything between two slashes, except of course for axis names like "parent::" and "preceding-sibling::", and for wildcards. That's what I mean by "finagle a default namespace".
Is this hack gonna work?
Addendum
Here's what I mean. Suppose I have an XPath expression, and before passing it to nav.Select(), I transform it. Something like this:
string FixupWithDefaultNamespace(string expr)
{
    string s = expr;
    s = Regex.Replace(s, "^(?!::)([^/:]+)(?=/)", "ns1:$1"); // beginning
    s = Regex.Replace(s, "/([^/:]+)(?=/)", "/ns1:$1"); // stanza
    s = Regex.Replace(s, "::([A-Za-z][^/:*]*)(?=/)", "::ns1:$1"); // axis specifier
    s = Regex.Replace(s, "\\[([A-Za-z][^/:*\\(]*)(?=[\\[\\]])", "[ns1:$1"); // predicate
    s = Regex.Replace(s, "/([A-Za-z][^/:]*)(?<!::)$", "/ns1:$1"); // end
    s = Regex.Replace(s, "^([A-Za-z][^/:]*)$", "ns1:$1"); // edge case
    s = Regex.Replace(s, "([-A-Za-z]+)\\(([^/:\\.,\\)]+)(?=[,\\)])", "$1(ns1:$2"); // xpath functions
    return s;
}
This actually works for the simple cases I tried. To use the example from above: if the input is the first XPath expression, the output I get is the second one, with all the ns1 prefixes. The real question is: is it hopeless to expect this Regex.Replace approach to keep working as the XPath expressions get more complicated?
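For anyone who wants to reproduce the simple case, here is a small harness, offered as a sketch only. The FixupCheck class and its Main are made up for this check; the function body is the one from the addendum, with the lookaround in the "end" pattern spelled as the lookbehind (?<!::). It only checks the one expression from this question and says nothing about more complicated expressions.

```csharp
using System;
using System.Text.RegularExpressions;

public static class FixupCheck
{
    // The replacements from the addendum; the "end" pattern uses (?<!::).
    public static string FixupWithDefaultNamespace(string expr)
    {
        string s = expr;
        s = Regex.Replace(s, "^(?!::)([^/:]+)(?=/)", "ns1:$1"); // beginning
        s = Regex.Replace(s, "/([^/:]+)(?=/)", "/ns1:$1"); // stanza
        s = Regex.Replace(s, "::([A-Za-z][^/:*]*)(?=/)", "::ns1:$1"); // axis specifier
        s = Regex.Replace(s, "\\[([A-Za-z][^/:*\\(]*)(?=[\\[\\]])", "[ns1:$1"); // predicate
        s = Regex.Replace(s, "/([A-Za-z][^/:]*)(?<!::)$", "/ns1:$1"); // end
        s = Regex.Replace(s, "^([A-Za-z][^/:]*)$", "ns1:$1"); // edge case
        s = Regex.Replace(s, "([-A-Za-z]+)\\(([^/:\\.,\\)]+)(?=[,\\)])", "$1(ns1:$2"); // xpath functions
        return s;
    }

    // The unadorned expression from the question, joined onto one line.
    public const string Input =
        "html/body/div[@class='contents']/div[@class='body']/" +
        "div[@class='pgdbbyauthor']/h2[a[@name][starts-with(.,'Quick')]]/" +
        "following-sibling::ul[1]/li/a";

    // The hand-prefixed expression from the question, joined onto one line.
    public const string Expected =
        "ns1:html/ns1:body/ns1:div[@class='contents']/ns1:div[@class='body']/" +
        "ns1:div[@class='pgdbbyauthor']/ns1:h2[ns1:a[@name][starts-with(.,'Quick')]]/" +
        "following-sibling::ns1:ul[1]/ns1:li/ns1:a";

    public static void Main()
    {
        string actual = FixupWithDefaultNamespace(Input);
        // Report whether the rewritten expression equals the hand-prefixed one.
        Console.WriteLine(actual == Expected ? "match" : "MISMATCH: " + actual);
    }
}
```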