Search Lucene with precise edit distances
- by askullhead
I would like to search a Lucene index with edit distances. For example, say, there is a document with a field FIRST_NAME; I want all documents with first names that are 1 edit distance away from, say, 'john'.
I know that Lucene supports fuzzy searches (FIRST_NAME:john~) and takes a number between 0 and 1 to control the fuzziness. The problem (for me) is this number does not directly translate to an edit distance. And when the values in the documents are short strings (less than 3 characters) the fuzzy search has difficulty finding them. For example if there is a document with FIRST_NAME 'J' and I search for FIRST_NAME:I~0.0 I don't get anything back.