Java regex skipping matches

Posted by Mihail Burduja on Stack Overflow
Published on 2012-11-11T10:42:57Z

I have some text, and I want to extract pairs of words that are not separated by punctuation. This is the code:

// n-grams: build a pattern that matches n space-separated lowercase words (n = 1, 2 or 3)
Pattern p = Pattern.compile("[a-z]+");
if (n == 2) {
    p = Pattern.compile("[a-z]+ [a-z]+");
}
if (n == 3) {
    p = Pattern.compile("[a-z]+ [a-z]+ [a-z]+");
}
Matcher m = p.matcher(text.toLowerCase());
ArrayList<String> result = new ArrayList<String>();

while (m.find()) {
    String temporary = m.group();
    System.out.println(temporary);

    result.add(temporary);
}

The problem is that it skips some matches. For example, for "My name is James" with n = 3, it should match "my name is" and "name is james", but it matches only the first. Is there a way to solve this?
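
One likely explanation: `Matcher.find()` resumes searching after the end of the previous match, so the words consumed by "my name is" are no longer available to start "name is james". A possible workaround, sketched below, is to wrap the n-gram pattern in a lookahead with a capturing group, so each match consumes no characters and overlapping n-grams can be reported. The class name `OverlappingNGrams` and the method `extract` are hypothetical names chosen for this sketch; they are not from the original code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, for illustration only.
final class OverlappingNGrams {

    // Returns every run of n space-separated lowercase words, including overlapping runs.
    static List<String> extract(String text, int n) {
        // Same shape as the original patterns: "[a-z]+" followed by n-1 times " [a-z]+".
        StringBuilder words = new StringBuilder("[a-z]+");
        for (int i = 1; i < n; i++) {
            words.append(" [a-z]+");
        }
        // (?<![a-z]) : only start at the beginning of a word, never in the middle.
        // (?=(...))  : zero-width lookahead; the n-gram is captured in group 1
        //              without being consumed, so the next search can reuse its words.
        Pattern p = Pattern.compile("(?<![a-z])(?=(" + words + "))");
        Matcher m = p.matcher(text.toLowerCase());

        List<String> result = new ArrayList<String>();
        while (m.find()) {
            result.add(m.group(1));
        }
        return result;
    }

    public static void main(String[] args) {
        // prints [my name is, name is james]
        System.out.println(extract("My name is James", 3));
    }
}
```

Because the lookahead matches an empty string, `find()` moves forward one character after each hit, and the lookbehind rejects positions inside a word. Punctuation still breaks an n-gram, since the pattern only allows a single space between words.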

