Poor performance / speed of regex with lookahead
        Posted  
        
            by 
                Hugo Zaragoza
            
        on Stack Overflow
        
        See other posts from Stack Overflow
        
            or by Hugo Zaragoza
        
        
        
        Published on 2012-03-22T21:45:13Z
        Indexed on 
            2012/03/24
            5:29 UTC
        
        
        Read the original article
        Hit count: 280
        
I have been observing extremely slow execution times with expressions with several lookaheads.
I suppose that this is due to underlying data structures, but it seems pretty extreme and I wonder if I do something wrong or if there are known work-arounds.
The problem is determining if a set of words are present in a string, in any order. For example we want to find out if two terms "term1" AND "term2" are somewhere in a string. I do this with the expresion:
(?=.*\bterm1\b)(?=.*\bterm2\b)
But what I observe is that this is an order of magnitude slower than checking first just
\bterm1\b
and just then
\bterm2\b
This seems to indicate that I should use an array of patterns instead of a single pattern with lookaheads... is this right? it seems wrong...
Here is an example test code and resulting times:
public static void speedLookAhead() {
    Matcher m, m1, m2;
    boolean find;
    int its = 1000000;
    // create long non-matching string
    char[] str = new char[2000];
    for (int i = 0; i < str.length; i++) {
      str[i] = 'x';
    }
    String test = str.toString();
    // First method: use one expression with lookaheads
    m = Pattern.compile("(?=.*\\bterm1\\b)(?=.*\\bterm2\\b)").matcher(test);
    long time = System.currentTimeMillis();
    ;
    for (int i = 0; i < its; i++) {
      m.reset(test);
      find = m.find();
    }
    time = System.currentTimeMillis() - time;
    System.out.println(time);
    // Second method: use two expressions and AND the results
    m1 = Pattern.compile("\\bterm1\\b").matcher(test);
    m2 = Pattern.compile("\\bterm2\\b").matcher(test);
    time = System.currentTimeMillis();
    ;
    for (int i = 0; i < its; i++) {
      m1.reset(test);
      m2.reset(test);
      find = m1.find() && m2.find();
    }
    time = System.currentTimeMillis() - time;
    System.out.println(time);
  } 
This outputs in my computer:
1754
150
© Stack Overflow or respective owner