Fuzzy Search on Material Descriptions including numerical sizes & general descriptions of material t
- by Kyle
We're looking to provide a fuzzy search on an electrical materials database (e.g. conduit, cable, etc.). The problem is that material types are not described consistently: some materials are rated by things other than size, so we could not split sizes out of the text description into separate fields.
I've attempted a combination of a full text search & a SQL CLR implementation of the Levenshtein distance algorithm (for assistance in ranking), but my results are a little funky (i.e. they are not sorting correctly due to improper ranking).
For example, if the search term is "3/4" ABCD Conduit", I might get back several irrelevant results ranked above the exact match, in the following order:
1/2" Conduit
1/4" X 3/4" Cable
1/4" Cable Ties
3/4" DFC Conduit Tees
3/4" ABCD Conduit
3/4" Conduit
I believe I've nailed the problem down to the fact that these two search algorithms do not factor in the relevance of punctuation & numeric characters. That is, in such a search, I'd expect the size to take precedence over any fuzzy match on the rest of the description, but my results don't reflect that.
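To make the expected ordering concrete, here is a minimal sketch in Python, purely for illustration (it is not the full text search / SQL CLR code described above). It assumes sizes are written as a whole number or fraction followed by an inch mark, pulls those tokens out with a regex, ranks on size agreement first, and only then on fuzzy similarity of the remaining text. The names `SIZE_RE`, `split_sizes` and `rank` are hypothetical, and `difflib.SequenceMatcher` from the standard library stands in for a Levenshtein-based score.

```python
import re
from difflib import SequenceMatcher

# Assumption: a size looks like 3/4" or 2" (digits, optional /digits, inch mark).
SIZE_RE = re.compile(r'\d+(?:/\d+)?"')


def split_sizes(description):
    """Return (set of size tokens, remaining text lowercased and whitespace-normalized)."""
    sizes = set(SIZE_RE.findall(description))
    rest = SIZE_RE.sub(" ", description)
    rest = " ".join(rest.lower().split())
    return sizes, rest


def rank(query, descriptions):
    """Sort descriptions so size agreement outranks fuzzy text similarity."""
    q_sizes, q_rest = split_sizes(query)

    def key(desc):
        d_sizes, d_rest = split_sizes(desc)
        exact = d_sizes == q_sizes          # same set of sizes as the query
        overlap = len(q_sizes & d_sizes)    # number of query sizes present
        fuzzy = SequenceMatcher(None, q_rest, d_rest).ratio()
        return (exact, overlap, fuzzy)      # compared left to right

    return sorted(descriptions, key=key, reverse=True)
```

Run against the six sample descriptions above with the query `3/4" ABCD Conduit`, this puts the three items whose only size is 3/4" first, with the exact match at the top, followed by `1/4" X 3/4" Cable`, which only partially matches on size. Materials rated by something other than inches would need their own token patterns.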
My question is: Can anyone recommend better search algorithms or different approaches that may be better suited for searching a combination of alphanumerics & punctuation characters?