Efficient mass string search problem.

Posted by Monomer on Stack Overflow See other posts from Stack Overflow or by Monomer
Published on 2010-03-28T08:10:36Z Indexed on 2010/03/28 8:13 UTC
Read the original article Hit count: 324

Filed under:
|
|
|
|

The Problem: A large static list of strings is provided. A pattern string comprised of data and wildcard elements (* and ?). The idea is to return all the strings that match the pattern - simple enough.

Current Solution: I'm currently using a linear approach of scanning the large list and globbing each entry against the pattern.

My Question: Are there any suitable data structures that I can store the large list into such that the search's complexity is less than O(n)?

Perhaps something akin to a suffix-trie? I've also considered using bi- and tri-grams in a hashtable, but the logic required in evaluating a match based on a merge of the list of words returned and the pattern is a nightmare, and I'm not convinced its the correct approach.

© Stack Overflow or respective owner

Related posts about string

Related posts about search