fastest way to perform string search in general and in python
Posted
by
Rkz
on Stack Overflow
See other posts from Stack Overflow
or by Rkz
Published on 2012-10-28T04:36:32Z
Indexed on
2012/10/28
5:00 UTC
Read the original article
Hit count: 126
My task is to search for a string or a pattern in a list of documents that are very short (say 200 characters long). However, say there are 1 million documents of such time. What is the most efficient way to perform this search?. I was thinking of tokenizing each document and putting the words in hashtable with words as key and document number as value, there by creating a bag of words. Then perform the word search and retrieve the list of documents that contained this word. From what I can see is this operation will take O(n) operations. Is there any other way? may be without using hash-tables?.
Also, is there a python library or third party package that can perform efficient searches?
© Stack Overflow or respective owner