Optimizing Levenshtein Distance Algorithm
        Posted  
        
            by Matt
        on Stack Overflow
        
        See other posts from Stack Overflow
        
            or by Matt
        
        
        
        Published on 2010-05-27T05:54:19Z
        Indexed on 
            2010/05/27
            11:41 UTC
        
        
        Read the original article
        Hit count: 631
        
I have a stored procedure that uses Levenshtein Distance to determine the result closest to what the user typed. The only thing really affecting the speed is the function that calculates the Levenshtein Distance for all the records before selecting the record with the lowest distance (I've verified this by putting a 0 in place of the call to the Levenshtein function). The table has 1.5 million records, so even the slightest adjustment may shave off a few seconds. Right now the entire thing runs over 10 minutes. Here's the method I'm using:
ALTER function dbo.Levenshtein
( 
    @Source nvarchar(200), 
    @Target nvarchar(200) 
) 
RETURNS int
AS
BEGIN
DECLARE @Source_len int, @Target_len int, @i int, @j int, @Source_char nchar, @Dist int, @Dist_temp int, @Distv0 varbinary(8000), @Distv1 varbinary(8000)
SELECT @Source_len = LEN(@Source), @Target_len = LEN(@Target), @Distv1 = 0x0000, @j = 1, @i = 1, @Dist = 0
WHILE @j <= @Target_len
BEGIN
    SELECT @Distv1 = @Distv1 + CAST(@j AS binary(2)), @j = @j + 1
END
WHILE @i <= @Source_len
BEGIN
    SELECT @Source_char = SUBSTRING(@Source, @i, 1), @Dist = @i, @Distv0 = CAST(@i AS binary(2)), @j = 1
WHILE @j <= @Target_len
BEGIN
    SET @Dist = @Dist + 1
    SET @Dist_temp = CAST(SUBSTRING(@Distv1, @j+@j-1, 2) AS int) +
                  CASE WHEN @Source_char = SUBSTRING(@Target, @j, 1) THEN 0 ELSE 1 END
    IF @Dist > @Dist_temp
    BEGIN
        SET @Dist = @Dist_temp
    END
    SET @Dist_temp = CAST(SUBSTRING(@Distv1, @j+@j+1, 2) AS int)+1
    IF @Dist > @Dist_temp SET @Dist = @Dist_temp
    BEGIN
        SELECT @Distv0 = @Distv0 + CAST(@Dist AS binary(2)), @j = @j + 1
    END
END
SELECT @Distv1 = @Distv0, @i = @i + 1
END
RETURN @Dist
END
Anyone have any ideas? Any input is appreciated.
Thanks,
Matt
© Stack Overflow or respective owner