A case-insensitive related implementation problem
- by Robert
Hi All,
I am going through a final refinement posted by the client, which needs me to do a case-insesitive query. I will basically walk through how this simple program works.
First of all, in my Java class, I did a fairly simple webpage parsing:
title=(String)results.get("title");
doc = docBuilder.parse("http://" + server + ":" + port + "/exist/rest/db/wb/xql/media_lookup.xql?" + "&title=" + title);
This Java statement references an XQuery file "media_lookup.xql" which is stored on localhost, and the only parameter we are passing is the string "title".
Secondly, let's take at look at that XQuery file:
$title := request:get-parameter('title',""),
$mediaNodes := doc('/db/wb/portfolio/media_data.xml'),
$query := $mediaNodes//media[contains(title,$title)],
Then it will evaluate that query. This XQuery will get the "title" parameter that are passes from our Java class, and query the "media_data" xml file stored in the database, which contains a bunch of media nodes with a 'title' element node. As you may expect, this simple query will just match those media nodes whose 'title' element contains a substring of what the value of string 'title' is. So if our 'title' is "Chi", it will return media nodes whose title may be "Chicago" or "Chicken".
The refinment request posted by the client is that there should be NO case-sensitivity. The very intuitive way is to modify the XQuery statement by using a lower-case funtion in it, like:
$query := $mediaNodes//media[contains(lower-case(title/text(),lower-case($title))],
However, the question comes: this modified query will run my machine into memory overflow. Since my "media_data.xml" is quite huge and contains thouands of millions of media nodes,
I assume the lower-case() function will run on each of the entries, thus causing the machine to crash.
I've talked with some experienced XQuery programmer, and they think I should use an index to solve this problem, and I will definitely research into that. But before that, I am just posting this problem here to get other ideas or any suggestions, do you think any other way may help? for example, could I tweak the Java parse statement to realize the case-insensitivity? Since I think I saw some people did some string concatination by using "contains." in Java before passing it to the server.
Any idea or help is welcomed, thanks in advance.