Algorithm for disordered sequences of strings
Posted by Kinopiko on Stack Overflow
        Published on 2010-05-18T11:17:18Z
algorithm
The Levenshtein distance gives us a way to calculate the distance between two similar strings in terms of disordered individual characters:
    quick brown fox
    quikc brown fax
The Levenshtein distance = 3.
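For anyone who wants to check the numbers, here is a minimal sketch of the standard dynamic-programming Levenshtein distance (insertions, deletions and substitutions, each with cost 1). The function name `levenshtein` is just a label chosen for this example, not taken from any particular library.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit-cost insert, delete and substitute."""
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]  # distance from a[:i] to the empty string
        for j, cb in enumerate(b, start=1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # substitute, free if equal
            ))
        prev = cur
    return prev[-1]


print(levenshtein("quick brown fox", "quikc brown fax"))  # 3
```

The swapped "ck" costs two edits and the "o" to "a" change costs one, which gives the 3 above.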
What is a similar algorithm for the distance between two strings with similar subsequences? For example, in
    quickbrownfox
    brownquickfox
the Levenshtein distance is 10, but this takes no account of the fact that the two strings share long intact blocks ("quick" and "brown", merely swapped), which makes them more "similar" than a completely disordered string like
    quickbrownfox
    qburiocwknfox
and yet the completely disordered version has a Levenshtein distance of only 8, lower than the 10 for the block-swapped pair.
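To make the contrast concrete, here is a rough sketch that lists the contiguous blocks two strings share. It only illustrates the premise of the question and is not an established distance measure: the name `common_blocks` and the `min_len` threshold are made up for this example. It greedily takes the longest common contiguous run, masks it out in both strings, and repeats until nothing of at least `min_len` characters is left.

```python
def common_blocks(a: str, b: str, min_len: int = 3) -> list[str]:
    """Greedily collect shared contiguous runs of at least min_len characters."""
    xs, ys = list(a), list(b)  # matched characters get replaced by None
    blocks = []
    while True:
        best_len, best_i, best_j = 0, 0, 0
        # Longest-common-substring DP; prev[j] is the length of the common
        # run ending at xs[i-2] and ys[j-1] from the previous row.
        prev = [0] * (len(ys) + 1)
        for i in range(1, len(xs) + 1):
            cur = [0] * (len(ys) + 1)
            for j in range(1, len(ys) + 1):
                if xs[i - 1] is not None and xs[i - 1] == ys[j - 1]:
                    cur[j] = prev[j - 1] + 1
                    if cur[j] > best_len:
                        best_len = cur[j]
                        best_i, best_j = i - cur[j], j - cur[j]
            prev = cur
        if best_len < min_len:
            return blocks
        blocks.append("".join(xs[best_i:best_i + best_len]))
        # Mask the block so its characters cannot be matched again.
        xs[best_i:best_i + best_len] = [None] * best_len
        ys[best_j:best_j + best_len] = [None] * best_len


print(common_blocks("quickbrownfox", "brownquickfox"))  # ['quick', 'brown', 'fox']
print(common_blocks("quickbrownfox", "qburiocwknfox"))  # ['nfox']
```

The block-swapped pair is fully covered by three shared runs, while the shuffled pair shares only the trailing "nfox". That is the difference the Levenshtein distance fails to reflect.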
What distance measures exist which take the length of subsequences into account, without assuming that the subsequences can be easily broken into distinct words?
© Stack Overflow or respective owner