Lucene: Fastest way to return the document occurrence count of a phrase?
Posted by dont say the kid's name on Stack Overflow
Published on 2010-05-09T05:00:48Z
Indexed on 2010/05/09 7:58 UTC
Hi Guys,
I am trying to use Lucene (PyLucene, actually) to find out how many documents contain an exact phrase. My code currently looks like the following, but it runs rather slowly. Does anyone know a faster way to return document counts?
phraseList = ["some phrase 1", "some phrase 2"]  # etc., a list of phrases...
countsearcher = IndexSearcher(SimpleFSDirectory(File(STORE_DIR)), True)  # True = read-only
analyzer = StandardAnalyzer(Version.LUCENE_CURRENT)
for phrase in phraseList:
    # Wrap the phrase in double quotes so it is parsed as an exact phrase
    query = QueryParser(Version.LUCENE_CURRENT, "contents", analyzer).parse("\"" + phrase + "\"")
    # Only the top 200 hits are requested, so len(scoreDocs) never exceeds 200
    scoreDocs = countsearcher.search(query, 200).scoreDocs
    print "count is: " + str(len(scoreDocs))
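One thing that might help, sketched here rather than benchmarked: the loop above asks for 200 scored hits and then counts them, so the count can never go above 200, and a new QueryParser is built for every phrase. The object returned by search() also carries a totalHits field with the number of all matching documents, so it should be enough to request a single hit and read that field. The helper below is hypothetical (count_phrase_docs is not part of PyLucene) and assumes a Lucene 3.x-era PyLucene where totalHits is a plain integer; it takes the searcher and a parser as arguments so that one parser is reused for the whole list.

```python
def count_phrase_docs(searcher, parser, phrases):
    """Return a dict mapping each phrase to the number of matching documents.

    Hypothetical helper: `searcher` is expected to behave like the
    IndexSearcher from the question and `parser` like its QueryParser.
    """
    counts = {}
    for phrase in phrases:
        # Quote the phrase so the parser builds an exact-phrase query.
        query = parser.parse('"' + phrase + '"')
        # Request one hit only; totalHits still reports every matching
        # document (assumed to be an int, as in Lucene 3.x).
        counts[phrase] = searcher.search(query, 1).totalHits
    return counts
```

With the objects from the question this would be called as count_phrase_docs(countsearcher, QueryParser(Version.LUCENE_CURRENT, "contents", analyzer), phraseList). Phrases that themselves contain double quotes or other query syntax would need escaping first, which this sketch does not do.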
© Stack Overflow or respective owner