Is "watermarking" code with random trailing whitespace a good way to detect plagiarism?
- by paperjam
Consider this:
int f(int x)
{
    return 2 * x * x;
}
and this:
int squareAndDouble(int y)
{
    return 2*y*y;
}
If you found these in independent bodies of code, you might give the two programmers the benefit of the doubt and assume they came up with more-or-less the same function independently. But look at the whitespace at the end of each line of code: the same pattern appears in both. Surely that is evidence of copying. On a larger piece of code, a matching pattern of random whitespace at line ends would be very hard to explain as anything other than a shared origin.
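To make the comparison concrete, here is a minimal sketch of how the trailing-whitespace pattern of two snippets could be extracted and compared. The function names (`ws_signature`, `ws_matches`) are hypothetical and made up for this illustration, and comparing line by line is an assumption: it only works if the copy has kept the same line structure.

```c
#include <stddef.h>
#include <string.h>

/* Count trailing spaces and tabs in line[0..len). */
static size_t trailing_ws(const char *line, size_t len)
{
    size_t n = 0;
    while (n < len && (line[len - 1 - n] == ' ' || line[len - 1 - n] == '\t'))
        n++;
    return n;
}

/*
 * Hypothetical helper: record the trailing-whitespace count of each line
 * of `text` into `sig` (at most `max` entries).
 * Returns the number of lines recorded.
 */
size_t ws_signature(const char *text, size_t *sig, size_t max)
{
    size_t count = 0;
    while (*text != '\0' && count < max) {
        size_t len = strcspn(text, "\n");    /* length up to newline or end */
        size_t body = len;
        if (body > 0 && text[body - 1] == '\r')
            body--;                          /* ignore the CR of a CRLF ending */
        sig[count++] = trailing_ws(text, body);
        text += len;
        if (*text == '\n')
            text++;                          /* step over the newline */
    }
    return count;
}

/*
 * Hypothetical helper: count line positions where both signatures carry
 * the same nonzero amount of trailing whitespace. Lines without any
 * trailing whitespace are ignored, because clean lines match trivially.
 */
size_t ws_matches(const size_t *a, size_t na, const size_t *b, size_t nb)
{
    size_t n = na < nb ? na : nb;
    size_t hits = 0;
    for (size_t i = 0; i < n; i++)
        if (a[i] != 0 && a[i] == b[i])
            hits++;
    return hits;
}
```

Calling `ws_signature` on each file and passing the two results to `ws_matches` gives a rough count of suspicious lines. A single inserted or deleted line shifts every later position, so a real comparison would probably need to align the lines first (for example with a diff) before comparing their whitespace.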
Setting aside the obvious weaknesses (the whitespace is visible in some editors and easily removed), I am wondering whether it is worth deploying something like this in my open source project. My industry has a history of companies ripping off open source projects.