Backreferences in lookbehind
- by polygenelubricants
Can you use backreferences in a lookbehind?
Let's say I want to split at every position that is immediately preceded by the same character twice in a row.
String REGEX1 = "(?<=(.)\\1)"; // DOESN'T WORK!
String REGEX2 = "(?<=(?=(.)\\1)..)"; // WORKS!
System.out.println(java.util.Arrays.toString(
"Bazooka killed the poor aardvark (yummy!)"
.split(REGEX2)
)); // prints "[Bazoo, ka kill, ed the poo, r aa, rdvark (yumm, y!)]"
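One way to read REGEX2, as far as I can tell: the inner lookahead finds an index p where the character at p equals the character at p+1, and the trailing .. then moves the end of the lookbehind to p+2. Here is a minimal sketch of that reading, using only a lookahead and Matcher.find(). The class name DoubledCharPositions and the method splitPoints are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DoubledCharPositions {
    // Hypothetical helper: collects every index just past a doubled character.
    // The lookahead (?=(.)\1) matches (with zero width) at each index p where
    // input[p] == input[p + 1]; p + 2 is where REGEX2 would split.
    static List<Integer> splitPoints(String input) {
        Matcher m = Pattern.compile("(?=(.)\\1)").matcher(input);
        List<Integer> points = new ArrayList<>();
        while (m.find()) {
            points.add(m.start() + 2);
        }
        return points;
    }

    public static void main(String[] args) {
        System.out.println(splitPoints("Bazooka")); // prints "[5]"
        System.out.println(splitPoints("aaa"));     // prints "[2, 3]"
    }
}
```

For "Bazooka" the only doubled pair is "oo" at indices 3 and 4, so the split point is 5, which matches the first piece "Bazoo" above. The "aaa" case shows that overlapping pairs each produce their own split point.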
Using REGEX2 (where the backreference is in a lookahead nested inside a lookbehind) works, but REGEX1 throws a PatternSyntaxException at run time with this message:
Look-behind group does not have an obvious maximum length near index 8
(?<=(.)\1)
        ^
This sort of makes sense, I suppose, because in general a backreference can match a string of any length. (If the regex compiler were a bit smarter, though, it could determine that \1 refers to (.) in this case, and therefore has a known, finite length.)
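To reproduce the difference in isolation, here is a minimal sketch that compiles both patterns and catches the exception instead of letting it propagate. The class name LookbehindBackrefDemo and the helpers compiles and splitAfterDoubled are hypothetical names chosen for this example, and the results in the comments are what I get with java.util.regex.

```java
import java.util.Arrays;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

class LookbehindBackrefDemo {
    static final String REGEX1 = "(?<=(.)\\1)";       // backreference directly in the lookbehind
    static final String REGEX2 = "(?<=(?=(.)\\1)..)"; // backreference in a nested lookahead

    // Hypothetical helper: true if the pattern compiles, false if it is rejected.
    static boolean compiles(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    // Hypothetical helper: splits after every doubled character using REGEX2.
    static String[] splitAfterDoubled(String input) {
        return input.split(REGEX2);
    }

    public static void main(String[] args) {
        System.out.println(compiles(REGEX1)); // false: no obvious maximum length
        System.out.println(compiles(REGEX2)); // true
        System.out.println(Arrays.toString(
            splitAfterDoubled("Bazooka killed the poor aardvark (yummy!)")));
    }
}
```

Note that the failure happens when the pattern is compiled, before any input is examined, so String.split never gets as far as matching with REGEX1.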
So is there a way to use a backreference in a lookbehind?
And if there isn't, can you always work around it using this nested lookahead? Are there other commonly used techniques?