NLP - Queries using semantic wildcards in full text searching, maybe with Lucene?
Posted by Zsolt on Programmers
Published on 2012-11-25T23:56:21Z
Let's say I have a large corpus (in English, for example, or in any other language) and I want to run semantic searches on it. For example, suppose I have the query:
"Be careful: [art] armada of [sg] is coming to [do sg]!"
And the corpus contains the following sentence:
"Be careful: an armada of alien ships is coming to destroy our planet!"
As you can see, my query string can contain "semantic placeholders", such as:
[art] - a placeholder for articles (for example, a / an in English)
[sg], [do sg] - placeholders for NPs and VPs (subjects and predicates)
I would like to develop a library capable of handling such queries efficiently. I suspect that some kind of POS tagging will be necessary to parse the text, but I don't want to reimplement an existing full-text search engine just to make this work. So how could I integrate this behaviour into a search engine like Lucene?
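To make the placeholder syntax concrete, here is a minimal sketch in plain Python of how I picture such a query being split into literal words and placeholder slots. The PLACEHOLDERS mapping and the slot labels (ART, NP, VP) are names I made up for illustration, and punctuation is simply ignored.

```python
import re

# Hypothetical mapping from my placeholder names to slot labels.
PLACEHOLDERS = {"art": "ART", "sg": "NP", "do sg": "VP"}

# Either a bracketed placeholder such as [do sg], or a plain word.
ITEM_RE = re.compile(r"\[([^\]]+)\]|(\w+)")


def parse_query(query):
    """Split a query into ("lit", word) and ("slot", label) items."""
    items = []
    for match in ITEM_RE.finditer(query):
        name, word = match.groups()
        if name is not None:
            # Raises KeyError for an unknown placeholder name.
            items.append(("slot", PLACEHOLDERS[name.strip().lower()]))
        else:
            items.append(("lit", word.lower()))
    return items


parsed = parse_query("Be careful: [art] armada of [sg] is coming to [do sg]!")
for item in parsed:
    print(item)
```

The result is a flat list of ten items, three of which are slots, so the matching step only has to decide which tokens of a sentence each slot may consume.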
I know there are SpanQueries, which could behave similarly in some cases, but as far as I can see, Lucene doesn't do any semantic processing of the stored text.
Is it possible to implement behaviour like this? Or do I have to write my own search engine?
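The behaviour I have in mind would amount to keeping a tag next to each token and letting a placeholder consume a run of tokens that carry that tag. Below is a rough sketch in plain Python, not Lucene code. The tags on the example sentence are assigned by hand and are purely hypothetical; in practice a POS tagger or chunker would have to produce them. The function names are my own as well.

```python
def match_at(pattern, tokens, start):
    """Return the end index if pattern matches tokens from start, else None.

    pattern: list of ("lit", word) or ("slot", label) items
    tokens:  list of (word, label) pairs; label is a chunk tag or None
    """
    def step(pi, ti):
        if pi == len(pattern):
            return ti
        if ti >= len(tokens):
            return None
        kind, value = pattern[pi]
        if kind == "lit":
            if tokens[ti][0].lower() == value:
                return step(pi + 1, ti + 1)
            return None
        # Slot: find the run of consecutive tokens carrying this label,
        # then try the longest candidate first and back off one token at a time.
        run_end = ti
        while run_end < len(tokens) and tokens[run_end][1] == value:
            run_end += 1
        for end in range(run_end, ti, -1):
            result = step(pi + 1, end)
            if result is not None:
                return result
        return None

    return step(0, start)


def search(pattern, tokens):
    """Return all (start, end) token spans where the pattern matches."""
    spans = []
    for start in range(len(tokens)):
        end = match_at(pattern, tokens, start)
        if end is not None:
            spans.append((start, end))
    return spans


# Hand-tagged version of the example sentence (tags are assumptions).
tokens = [
    ("Be", None), ("careful", None), ("an", "ART"), ("armada", None),
    ("of", None), ("alien", "NP"), ("ships", "NP"), ("is", None),
    ("coming", None), ("to", None), ("destroy", "VP"), ("our", "VP"),
    ("planet", "VP"),
]
pattern = [
    ("lit", "be"), ("lit", "careful"), ("slot", "ART"), ("lit", "armada"),
    ("lit", "of"), ("slot", "NP"), ("lit", "is"), ("lit", "coming"),
    ("lit", "to"), ("slot", "VP"),
]
print(search(pattern, tokens))  # [(0, 13)]
```

This scans every start position, which is exactly the inefficiency I would like a real index to avoid, so it only illustrates the matching semantics, not a viable implementation.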
© Programmers or respective owner