Shortest Common Superstring: find shortest string that contains all given string fragments
Posted
by
occulus
on Programmers
See other posts from Programmers
or by occulus
Published on 2012-09-25T10:52:46Z
Indexed on
2012/09/25
15:49 UTC
Read the original article
Hit count: 306
algorithms
|strings
Given some string fragments, I would like to find the shortest possible single string ("output string") that contains all the fragments. Fragments can overlap each other in the output string.
Example:
For the string fragments:
BCDA
AGF
ABC
The following output string contains all fragments, and was made by naive appending:
BCDAAGFABC
However this output string is better (shorter), as it employs overlaps:
ABCDAGF
^
ABC
^
BCDA
^
AGF
I'm looking for algorithms for this problem. It's not absolutely important to find the strictly shortest output string, but the shorter the better. I'm looking for an algorithm better than the obvious naive one that would try appending all permutations of the input fragments and removing overlaps (which would appear to be NP-Complete).
I've started work on a solution and it's proving quite interesting; I'd like to see what other people might come up with. I'll add my work-in-progress to this question in a while.
© Programmers or respective owner