Algorithm for measuring distance between disordered sequences

Posted by Kinopiko on Stack Overflow
Published on 2010-05-18T11:17:18Z

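The Levenshtein distance gives us a way to measure how far apart two similar strings are, by counting the single-character edits (insertions, deletions and substitutions) needed to turn one into the other. It works well when individual characters are out of place: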

quick brown fox
quikc brown fax

Here the Levenshtein distance is 3: two edits for the swapped "ck" and one for the "o" that became "a".
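For reference, a minimal sketch of the calculation, assuming the standard definition in which every single-character insertion, deletion or substitution costs 1. The function name `levenshtein` is only illustrative and does not come from any particular library.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance counting single-character insertions, deletions and substitutions."""
    # previous[j] is the distance between the prefix of a processed so far and b[:j].
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]  # turning a[:i] into the empty string takes i deletions
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute, or keep if equal
            ))
        previous = current
    return previous[-1]


print(levenshtein("quick brown fox", "quikc brown fax"))  # 3
```

Note that a swap of two adjacent characters counts as two edits under this definition, which is why "quick" to "quikc" contributes 2 rather than 1.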

What is a similar algorithm for the distance between two strings with similar subsequences? For example, in

quickbrownfox
brownquickfox

the Levenshtein distance is 10, but this takes no account of the fact that the strings have two similar subsequences, which makes them more "similar" than completely disordered words like

quickbrownfox
qburiocwknfox

and yet this completely disordered version has a Levenshtein distance of only 8.
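To make concrete what the Levenshtein figures leave out, here is a rough sketch that lists the runs of characters the two strings share. It is not a distance measure and not a proposed answer, only an illustration. The helper name `shared_blocks` and the `min_len` cutoff of 3 are assumptions made for this example.

```python
def shared_blocks(a: str, b: str, min_len: int = 3) -> list:
    """Return runs of at least min_len matching characters found in both strings.

    A run is reported only if it cannot be extended to the left or right
    at the position where it was found.
    """
    # suffix[i][j] = length of the longest common suffix of a[:i] and b[:j]
    suffix = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    blocks = set()
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            if a[i - 1] != b[j - 1]:
                continue
            suffix[i][j] = suffix[i - 1][j - 1] + 1
            # The run stops here if either string ends or the next characters differ.
            run_ends = i == len(a) or j == len(b) or a[i] != b[j]
            if run_ends and suffix[i][j] >= min_len:
                blocks.add(a[i - suffix[i][j]:i])
    return sorted(blocks)


print(shared_blocks("quickbrownfox", "brownquickfox"))  # ['brown', 'fox', 'quick']
print(shared_blocks("quickbrownfox", "qburiocwknfox"))  # ['nfox']
```

The block-swapped string keeps "quick", "brown" and "fox" intact, while the scrambled one shares only "nfox", yet Levenshtein rates the scrambled string as the closer of the two.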

What distance measures exist which take the length of subsequences into account, without assuming that the subsequences can be easily broken into distinct words?

© Stack Overflow or respective owner
