Managing highly repetitive code and documentation in Java
- by polygenelubricants
Highly repetitive code is generally a bad thing, and there are design patterns that can help minimize this. However, sometimes it's simply inevitable due to the constraints of the language itself. Take the following example from java.util.Arrays:
    /**
     * Assigns the specified long value to each element of the specified
     * range of the specified array of longs. The range to be filled
     * extends from index <tt>fromIndex</tt>, inclusive, to index
     * <tt>toIndex</tt>, exclusive. (If <tt>fromIndex==toIndex</tt>, the
     * range to be filled is empty.)
     *
     * @param a the array to be filled
     * @param fromIndex the index of the first element (inclusive) to be
     *        filled with the specified value
     * @param toIndex the index of the last element (exclusive) to be
     *        filled with the specified value
     * @param val the value to be stored in all elements of the array
     * @throws IllegalArgumentException if <tt>fromIndex > toIndex</tt>
     * @throws ArrayIndexOutOfBoundsException if <tt>fromIndex < 0</tt> or
     *         <tt>toIndex > a.length</tt>
     */
    public static void fill(long[] a, int fromIndex, int toIndex, long val) {
        rangeCheck(a.length, fromIndex, toIndex);
        for (int i=fromIndex; i<toIndex; i++)
            a[i] = val;
    }
The above snippet appears in the source code 8 more times, with very little variation in the documentation/method signature but exactly the same method body, one for each of the other root array types: int[], short[], char[], byte[], boolean[], double[], float[], and Object[].
I believe that unless one resorts to reflection (which is an entirely different subject in itself), this repetition is inevitable. I understand that this is a utility class and that such a high concentration of repetitive Java code is highly atypical, but even with the best practices, repetition does happen! Refactoring doesn't always help because it's not always possible (the obvious case is when the repetition is in the documentation).
Obviously, maintaining this source code is a nightmare. A slight typo in the documentation, or a minor bug in the implementation, is multiplied by however many repetitions were made. In fact, the best example happens to involve this exact class:
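To make the reflection alternative concrete, here is a minimal sketch using java.lang.reflect.Array. The class name ReflectiveFill and the inlined range checks are my own illustration, not code from java.util.Arrays. It gets down to a single method body, but gives up compile-time type safety and pays for boxing and run-time checks on every store.

```java
import java.lang.reflect.Array;
import java.util.Arrays;

// Hypothetical helper, not part of the JDK.
class ReflectiveFill {

    /**
     * Fills a[fromIndex..toIndex) with val, for any array type.
     * The parameter is typed Object because there is no common
     * supertype of long[], int[], ... other than Object.
     */
    static void fill(Object a, int fromIndex, int toIndex, Object val) {
        // Array.getLength throws IllegalArgumentException if a is not an array.
        int length = Array.getLength(a);
        if (fromIndex > toIndex) {
            throw new IllegalArgumentException(
                "fromIndex(" + fromIndex + ") > toIndex(" + toIndex + ")");
        }
        if (fromIndex < 0) {
            throw new ArrayIndexOutOfBoundsException(fromIndex);
        }
        if (toIndex > length) {
            throw new ArrayIndexOutOfBoundsException(toIndex);
        }
        for (int i = fromIndex; i < toIndex; i++) {
            // For primitive arrays, Array.set unwraps the boxed value.
            Array.set(a, i, val);
        }
    }

    public static void main(String[] args) {
        long[] longs = new long[5];
        fill(longs, 1, 4, 7L);
        System.out.println(Arrays.toString(longs)); // [0, 7, 7, 7, 0]

        boolean[] flags = new boolean[3];
        fill(flags, 0, 2, true);
        System.out.println(Arrays.toString(flags)); // [true, true, false]
    }
}
```

Note that a mistake such as passing a String value for a long[] now only surfaces at run time as an IllegalArgumentException, which is part of why the typed overloads exist in the first place.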
Google Research Blog - Extra, Extra - Read All About It: Nearly All Binary Searches and Mergesorts are Broken (by Joshua Bloch, Software Engineer)
The bug is a surprisingly subtle one, occurring in what many thought to be just a simple and straightforward algorithm.
// int mid = (low + high) / 2; // the bug
int mid = (low + high) >>> 1; // the fix
The above line appears 11 times in the source code!
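For anyone who hasn't seen why the first version is wrong, a small demonstration (the class and method names are mine, chosen for illustration): when low + high exceeds Integer.MAX_VALUE, the sum wraps around to a negative int, and dividing by 2 keeps it negative. The unsigned shift treats the same bits as an unsigned value and so recovers the intended midpoint.

```java
// Hypothetical demo class; only the two expressions come from the blog post.
class MidpointOverflow {

    static int brokenMid(int low, int high) {
        return (low + high) / 2;   // sum may overflow to a negative value
    }

    static int fixedMid(int low, int high) {
        return (low + high) >>> 1; // unsigned shift reinterprets the overflowed sum
    }

    public static void main(String[] args) {
        int low = 1 << 30;         // 1073741824
        int high = (1 << 30) + 2;  // 1073741826

        System.out.println(brokenMid(low, high)); // -1073741823
        System.out.println(fixedMid(low, high));  // 1073741825
    }
}
```

The fix relies on low and high both being non-negative, which holds for array indices.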
So my questions are:
How are these kinds of repetitive Java code/documentation handled in practice? How are they developed, maintained, and tested?
Do you start with "the original", and make it as mature as possible, and then copy and paste as necessary and hope you didn't make a mistake?
And if you did make a mistake in the original, then just fix it everywhere, unless you're comfortable with deleting the copies and repeating the whole replication process?
And you apply this same process for the testing code as well?
Would Java benefit from some sort of limited-use source code preprocessing for this kind of thing?
Perhaps Sun has their own preprocessor to help write, maintain, document and test this kind of repetitive library code?
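To illustrate what I mean by limited-use preprocessing, here is a rough sketch of a template expander. Everything in it is hypothetical (the FillGenerator class, the $T placeholder, the abbreviated template); I am not claiming any real tool works this way. The idea is simply that the "original" lives in one place and the copies are generated rather than pasted.

```java
// Hypothetical generator: expands one template into per-type overloads.
class FillGenerator {

    // $T is an assumed placeholder for the element type.
    static final String TEMPLATE =
        "public static void fill($T[] a, int fromIndex, int toIndex, $T val) {\n"
      + "    rangeCheck(a.length, fromIndex, toIndex);\n"
      + "    for (int i = fromIndex; i < toIndex; i++)\n"
      + "        a[i] = val;\n"
      + "}\n";

    /** Returns the template with every $T replaced by the given type name. */
    static String generate(String elementType) {
        return TEMPLATE.replace("$T", elementType);
    }

    public static void main(String[] args) {
        String[] types = {
            "long", "int", "short", "char", "byte",
            "boolean", "double", "float", "Object"
        };
        for (String type : types) {
            System.out.println(generate(type));
        }
    }
}
```

A fix to the template (or to its Javadoc, if that were included) would then reach every generated copy on the next run, instead of requiring a manual edit per overload.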
A comment requested another example, so I pulled this one from Google Collections: com.google.common.base.Predicates lines 276-310 (AndPredicate) vs lines 312-346 (OrPredicate).
The source for these two classes is identical, except for:
AndPredicate vs OrPredicate (each appears 5 times in its class)
"And(" vs "Or(" (in the respective toString() methods)
#and vs #or (in the @see Javadoc comments)
true vs false (in apply; ! can be rewritten out of the expression)
-1 /* all bits on */ vs 0 /* all bits off */ in hashCode()
&= vs |= in hashCode()
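Since the differences above are so few, one possible refactoring for this particular example is to capture them as data and keep a single class body. The sketch below is my own and is not how Google Collections is written; Pred, Kind and Composite are made-up names, and Pred is a minimal stand-in for the library's predicate interface.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch; none of these names come from Google Collections.
class CompositePredicates {

    /** Minimal stand-in for the library's predicate interface. */
    interface Pred<T> {
        boolean apply(T t);
    }

    /** The listed differences between the two classes, captured as data. */
    enum Kind {
        AND("And", false, -1 /* all bits on */) {
            int combine(int acc, int hash) { return acc & hash; }
        },
        OR("Or", true, 0 /* all bits off */) {
            int combine(int acc, int hash) { return acc | hash; }
        };

        final String label;
        final boolean shortCircuitOn; // component result that decides the outcome early
        final int hashSeed;

        Kind(String label, boolean shortCircuitOn, int hashSeed) {
            this.label = label;
            this.shortCircuitOn = shortCircuitOn;
            this.hashSeed = hashSeed;
        }

        abstract int combine(int acc, int hash);
    }

    static final class Composite<T> implements Pred<T> {
        private final Kind kind;
        private final List<Pred<T>> components;

        Composite(Kind kind, List<Pred<T>> components) {
            this.kind = kind;
            this.components = components;
        }

        @Override public boolean apply(T t) {
            for (Pred<T> p : components) {
                if (p.apply(t) == kind.shortCircuitOn) {
                    return kind.shortCircuitOn;
                }
            }
            return !kind.shortCircuitOn;
        }

        @Override public int hashCode() {
            int result = kind.hashSeed;
            for (Pred<T> p : components) {
                result = kind.combine(result, p.hashCode());
            }
            return result;
        }

        @Override public boolean equals(Object o) {
            if (!(o instanceof Composite)) return false;
            Composite<?> that = (Composite<?>) o;
            return kind == that.kind && components.equals(that.components);
        }

        @Override public String toString() {
            return kind.label + "(" + components + ")";
        }
    }

    public static void main(String[] args) {
        Pred<Integer> positive = n -> n > 0;
        Pred<Integer> even = n -> n % 2 == 0;
        List<Pred<Integer>> both = Arrays.asList(positive, even);

        System.out.println(new Composite<>(Kind.AND, both).apply(3)); // false
        System.out.println(new Composite<>(Kind.OR, both).apply(3));  // true
    }
}
```

This works here only because the two classes differ in a few constants and one operator. It does nothing for the Arrays.fill case, where the difference is a primitive type that generics cannot abstract over.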