Java regex skipping matches
Posted by Mihail Burduja on Stack Overflow
Published on 2012-11-11T10:42:57Z
Indexed on 2012/11/11 11:00 UTC
I have some text, and I want to extract pairs of words that are not separated by punctuation. This is the code:
// n-grams
Pattern p = Pattern.compile("[a-z]+");
if (n == 2) {
    p = Pattern.compile("[a-z]+ [a-z]+");
}
if (n == 3) {
    p = Pattern.compile("[a-z]+ [a-z]+ [a-z]+");
}
Matcher m = p.matcher(text.toLowerCase());
ArrayList<String> result = new ArrayList<String>();
while (m.find()) {
    String temporary = m.group();
    System.out.println(temporary);
    result.add(temporary);
}
The problem is that it skips some matches. For example, given "My name is James" and n = 3, it should match both "my name is" and "name is james", but it matches only the first. Is there a way to solve this?
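For context, Matcher.find() resumes searching at the end of the previous match, so once "my name is" has been consumed, the words "name" and "is" are no longer available to start a new match. One possible way around this, offered only as a minimal sketch and not as the definitive fix, is to wrap the n-gram in a zero-width lookahead with a capturing group, so that each match consumes no characters and overlapping n-grams can still be found. The class name OverlappingNGrams and the method ngrams are hypothetical names introduced for illustration; the sketch assumes that words are runs of a-z separated by exactly one space, as in the code above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of the original post.
public class OverlappingNGrams {

    // Returns every run of n space-separated lowercase words, including overlapping runs.
    public static List<String> ngrams(String text, int n) {
        if (n < 1) {
            throw new IllegalArgumentException("n must be at least 1");
        }
        // (?<![a-z])  only start where the previous character is not a letter (a word start)
        // (?=( ... )) zero-width lookahead: the n-gram is captured in group 1 but not consumed
        Pattern p = Pattern.compile(
                "(?<![a-z])(?=([a-z]+(?: [a-z]+){" + (n - 1) + "}))");
        Matcher m = p.matcher(text.toLowerCase());
        List<String> result = new ArrayList<String>();
        while (m.find()) {
            // The overall match is empty, so the n-gram is read from group 1.
            result.add(m.group(1));
        }
        return result;
    }

    public static void main(String[] args) {
        for (String gram : ngrams("My name is James", 3)) {
            System.out.println(gram);
        }
    }
}
```

Because each match is empty, find() moves forward one character at a time, and the lookbehind rejects positions in the middle of a word. An alternative under the same assumptions would be to keep the original patterns and call m.find(int) with the index just past the first word of the previous match.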
© Stack Overflow or respective owner