Automated tests for differencing algorithm
- by Matthew Rodatus
We are designing a differencing algorithm (based on Longest Common Subsequence) that compares a source text with a modified copy and extracts the new content (i.e., content that appears only in the modified copy). I'm currently compiling a library of test-case data.
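To make the setup concrete, here's the shape of the extraction as a minimal sketch; Python's difflib stands in for our actual LCS-based algorithm, which I haven't shown here:

```python
import difflib

def extract_new_content(source_lines, modified_lines):
    # Collect the lines that exist only in the modified copy. difflib's
    # SequenceMatcher is only a stand-in for our real differencing algorithm.
    matcher = difflib.SequenceMatcher(None, source_lines, modified_lines)
    new_lines = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag in ("insert", "replace"):
            new_lines.extend(modified_lines[j1:j2])
    return new_lines

source = ["one", "two", "three"]
modified = ["one", "two edited", "three", "four"]
print(extract_new_content(source, modified))  # ['two edited', 'four']
```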
We need to be able to run automated tests against these cases, but strict equality checks won't work: given the heuristic nature of our algorithm, pass/fail has to be fuzzy. Instead, we want to specify a threshold of overlap between the desired result and the actual result (i.e., the content the algorithm actually extracts), and pass a test case when the overlap meets that threshold.
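For illustration, a fuzzy assertion might look something like the following sketch; difflib's ratio() and the 0.9 threshold are placeholders, not choices we've committed to:

```python
import difflib

def overlap_ratio(expected, actual):
    # SequenceMatcher.ratio() is one possible similarity measure; a line- or
    # token-level Jaccard index would work just as well here.
    return difflib.SequenceMatcher(None, expected, actual).ratio()

def assert_fuzzy_match(expected, actual, threshold=0.9):
    # Pass when the extracted content overlaps the desired result by at
    # least `threshold`; 0.9 is an arbitrary placeholder value.
    ratio = overlap_ratio(expected, actual)
    assert ratio >= threshold, (
        f"overlap {ratio:.2f} is below threshold {threshold:.2f}"
    )
```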
I have a few sketches along these lines in mind, but has anyone done this before? Does anyone have guidance or ideas about how to do this effectively?