fastest way to perform string search in general and in python
- by Rkz
My task is to search for a string or a pattern in a list of documents that are very short (say 200 characters long). However, say there are 1 million documents of such time. What is the most efficient way to perform this search?. I was thinking of tokenizing each document and putting the words in hashtable with words as key and document number as value, there by creating a bag of words. Then perform the word search and retrieve the list of documents that contained this word. From what I can see is this operation will take O(n) operations. Is there any other way? may be without using hash-tables?.
Also, is there a python library or third party package that can perform efficient searches?