Backreferences in lookbehind

Posted by polygenelubricants on Stack Overflow See other posts from Stack Overflow or by polygenelubricants
Published on 2010-04-29T05:34:38Z Indexed on 2010/04/29 5:37 UTC
Read the original article Hit count: 420

Filed under:
|
|
|

Can you use backreferences in a lookbehind?

Let's say I want to split wherever behind me a character is repeated twice.

    String REGEX1 = "(?<=(.)\\1)"; // DOESN'T WORK!
    String REGEX2 = "(?<=(?=(.)\\1)..)"; // WORKS!

    System.out.println(java.util.Arrays.toString(
        "Bazooka killed the poor aardvark (yummy!)"
        .split(REGEX2)
    )); // prints "[Bazoo, ka kill, ed the poo, r aa, rdvark (yumm, y!)]"

Using REGEX2 (where the backreference is in a lookahead nested inside a lookbehind) works, but REGEX1 gives this error at run-time:

Look-behind group does not have an obvious maximum length near index 8
(?<=(.)\1)
        ^

This sort of make sense, I suppose, because in general the backreference can capture a string of any length (if the regex compiler is a bit smarter, though, it could determine that \1 is (.) in this case, and therefore has a finite length).

So is there a way to use a backreference in a lookbehind?

And if there isn't, can you always work around it using this nested lookahead? Are there other commonly-used techniques?

© Stack Overflow or respective owner

Related posts about java

Related posts about regex