Getting dialogue snippets from text using regular expressions

Posted by sheldon on Stack Overflow
Published on 2010-06-01T05:46:42Z

I'm trying to extract snippets of dialogue from a book text. For example, if I have the string

"What's the matter with the flag?" inquired Captain MacWhirr. "Seems all right to me."

Then I want to extract "What's the matter with the flag?" and "Seems all right to me.".

I found a regular expression to use here, which is "[^"\\]*(\\.[^"\\]*)*". This works great in Eclipse when I'm doing a Ctrl+F find regex on my book .txt file, but when I run the following code:

String regex = "\"[^\"\\\\]*(\\\\.[^\"\\\\]*)*\"";
String bookText = "\"What's the matter with the flag?\" inquired Captain MacWhirr. \"Seems all right to me.\"";
Pattern p = Pattern.compile(regex);
Matcher m = p.matcher(bookText);

if (m.find())
    System.out.println(m.group(1));

The only thing that prints is null. So am I not converting the regex into a Java string properly? Do I need to take into account the fact that Java Strings have a \" for the double quotes?
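For what it's worth, the string escaping in the snippet above looks correct. A likely cause of the null is the call to m.group(1): group 1 is the inner (\\.[^"\\]*) group, which only takes part in a match when the quoted text contains a backslash escape. In the sample sentence it never does, so Java returns null for it. Here is a minimal sketch of one way to collect every quoted snippet, using m.group() (the whole match) inside a while loop instead of a single if. The class name DialogueExtractor and the method name extract are made up for illustration and are not from the original post.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, named here only for illustration.
class DialogueExtractor {
    // Same pattern as in the question: a double-quoted run that may
    // contain backslash-escaped characters.
    private static final Pattern QUOTED =
            Pattern.compile("\"[^\"\\\\]*(\\\\.[^\"\\\\]*)*\"");

    // Returns every quoted snippet in the text, quotes included.
    static List<String> extract(String text) {
        List<String> snippets = new ArrayList<>();
        Matcher m = QUOTED.matcher(text);
        // Loop with while, not if, so that all matches are visited.
        while (m.find()) {
            // group() with no argument is the entire match (group 0).
            // group(1) would be the optional escape group, which is
            // null whenever the snippet has no backslash escapes.
            snippets.add(m.group());
        }
        return snippets;
    }

    public static void main(String[] args) {
        String bookText = "\"What's the matter with the flag?\" "
                + "inquired Captain MacWhirr. \"Seems all right to me.\"";
        for (String snippet : extract(bookText)) {
            System.out.println(snippet);
        }
    }
}
```

If the surrounding quotes are not wanted, one option is to wrap the part between them in its own capturing group and read that group instead. Note that this pattern only recognizes straight double quotes, so a book text that uses curly quotes would need a different character class.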

© Stack Overflow or respective owner
