Efficient mass string search problem.
Posted
by Monomer
on Stack Overflow
See other posts from Stack Overflow
or by Monomer
Published on 2010-03-28T08:10:36Z
Indexed on
2010/03/28
8:13 UTC
Read the original article
Hit count: 329
The Problem: A large static list of strings is provided. A pattern string comprised of data and wildcard elements (* and ?). The idea is to return all the strings that match the pattern - simple enough.
Current Solution: I'm currently using a linear approach of scanning the large list and globbing each entry against the pattern.
My Question: Are there any suitable data structures that I can store the large list into such that the search's complexity is less than O(n)?
Perhaps something akin to a suffix-trie? I've also considered using bi- and tri-grams in a hashtable, but the logic required in evaluating a match based on a merge of the list of words returned and the pattern is a nightmare, and I'm not convinced its the correct approach.
© Stack Overflow or respective owner