Java regex: capture a multiline sequence between tokens
- by Guillaume
I'm struggling with a regex for splitting log files into log sequences, so that I can match patterns inside those sequences.
The log format is:
timestamp fieldA fieldB fieldn log message1
timestamp fieldA fieldB fieldn log message2
log message2bis
timestamp fieldA fieldB fieldn log message3
The timestamp regex is known.
I want to extract every log sequence (potentially multiline) between timestamps, and I want to keep the timestamp.
At the same time, I want to keep an exact count of lines.
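One way to keep an exact line count is to count the line terminators inside each extracted sequence and add them to a running total. This is a minimal sketch that assumes the logs use only `\r\n`, `\r` or `\n` as terminators. The `LINE_PATTERN` below is a hypothetical stand-in, because the question does not show its definition.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LineCount {
    // Hypothetical LINE_PATTERN: matches exactly one line terminator.
    // "\r\n" is listed first so a Windows line ending counts once, not twice.
    static final Pattern LINE_PATTERN = Pattern.compile("\r\n|\r|\n");

    // Returns the number of line terminators found in the given text.
    static int countLineBreaks(CharSequence text) {
        Matcher m = LINE_PATTERN.matcher(text);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String sequence = "log message2\r\nlog message2bis\n";
        System.out.println(countLineBreaks(sequence));
    }
}
```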
What I need is a way to decorate the timestamp pattern so that it splits my log file into log sequences. I cannot split the whole file as a String, since the file content is provided in a CharBuffer.
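The following sketch shows one way to decorate the timestamp pattern. It anchors the timestamp at the start of a line, lets a reluctant `.*?` run across line breaks, and stops at a lookahead for either the next line-start timestamp or the end of input. `TIMESTAMP` is a hypothetical placeholder for the known timestamp regex. `CharBuffer` implements `CharSequence`, so `Pattern.matcher` accepts it directly and the file never has to be turned into a `String`.

```java
import java.nio.CharBuffer;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SequenceSplit {
    // Hypothetical timestamp regex; substitute the real, known one.
    static final String TIMESTAMP = "\\d{4}-\\d{2}-\\d{2} \\d{2}:\\d{2}:\\d{2}";

    // MULTILINE: ^ matches at the start of every line.
    // DOTALL: . also matches line terminators, so a sequence can span lines.
    // The reluctant .*? stops at the first position where the lookahead sees
    // another timestamp at the start of a line, or the end of input (\z).
    // The lookahead consumes nothing, so the next timestamp starts the next match.
    static final Pattern SEQUENCE_PATTERN = Pattern.compile(
            "^" + TIMESTAMP + ".*?(?=^" + TIMESTAMP + "|\\z)",
            Pattern.MULTILINE | Pattern.DOTALL);

    // Collects every log sequence found in the input.
    static List<String> split(CharSequence input) {
        List<String> sequences = new ArrayList<>();
        Matcher m = SEQUENCE_PATTERN.matcher(input); // a CharBuffer works here
        while (m.find()) {
            sequences.add(m.group());
        }
        return sequences;
    }

    public static void main(String[] args) {
        CharBuffer cb = CharBuffer.wrap(
                "2024-01-01 10:00:00 A B N log message1\n"
              + "2024-01-01 10:00:01 A B N log message2\n"
              + "log message2bis\n"
              + "2024-01-01 10:00:02 A B N log message3\n");
        for (String s : split(cb)) {
            System.out.print("[" + s + "]");
        }
    }
}
```

Each match keeps its timestamp and ends with the line terminator of its last line. From the first timestamp onward, the matches therefore cover the input with no gaps. Any lines before the first timestamp are not matched.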
Here is a sample method that will use this log-sequence matcher:
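    private void matches(File f, CharBuffer cb) {
        Matcher sequenceBreak = sequencePattern.matcher(cb); // sequence matcher
        int lines = 1;      // line on which the current sequence starts
        int sequences = 0;  // number of sequences seen so far
        while (sequenceBreak.find()) {
            sequences++;
            String sequence = sequenceBreak.group();
            if (filter.accept(sequence)) {
                System.out.println(f + ":" + lines + ":" + sequence);
            }
            // count the line breaks in this sequence to find where the next one starts
            Matcher lineBreak = LINE_PATTERN.matcher(sequence);
            while (lineBreak.find()) {
                lines++;
            }
            // Matcher indices are relative to the buffer's position,
            // so compare with length() (remaining chars), not limit()
            if (sequenceBreak.end() == cb.length()) {
                break;
            }
        }
    }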
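The following is a self-contained, hedged version of the method above that combines the two pieces. The timestamp regex, the `SequenceFilter` interface and the sample data are hypothetical. The method collects its results in a list instead of printing them, which makes the output easy to check.

```java
import java.io.File;
import java.nio.CharBuffer;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LogSequenceMatcher {
    // Hypothetical timestamp regex standing in for the real one.
    static final String TIMESTAMP = "\\d{4}-\\d{2}-\\d{2} \\d{2}:\\d{2}:\\d{2}";

    // One log sequence: a timestamp at line start, then everything up to the
    // next line-start timestamp or the end of input.
    static final Pattern SEQUENCE_PATTERN = Pattern.compile(
            "^" + TIMESTAMP + ".*?(?=^" + TIMESTAMP + "|\\z)",
            Pattern.MULTILINE | Pattern.DOTALL);

    // One line terminator; "\r\n" first so it counts as a single break.
    static final Pattern LINE_PATTERN = Pattern.compile("\r\n|\r|\n");

    // Hypothetical filter type; the question does not show the real one.
    interface SequenceFilter {
        boolean accept(String sequence);
    }

    private final SequenceFilter filter;

    LogSequenceMatcher(SequenceFilter filter) {
        this.filter = filter;
    }

    // Returns "file:line:sequence" for every accepted sequence, where line is
    // the 1-based line on which that sequence starts.
    List<String> matches(File f, CharBuffer cb) {
        List<String> hits = new ArrayList<>();
        Matcher sequenceBreak = SEQUENCE_PATTERN.matcher(cb);
        int lines = 1; // line on which the current sequence starts
        while (sequenceBreak.find()) {
            String sequence = sequenceBreak.group();
            if (filter.accept(sequence)) {
                hits.add(f + ":" + lines + ":" + sequence);
            }
            // Advance past every line break contained in this sequence.
            Matcher lineBreak = LINE_PATTERN.matcher(sequence);
            while (lineBreak.find()) {
                lines++;
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        CharBuffer cb = CharBuffer.wrap(
                "2024-01-01 10:00:00 A B N log message1\n"
              + "2024-01-01 10:00:01 A B N log message2\n"
              + "log message2bis\n"
              + "2024-01-01 10:00:02 A B N log message3\n");
        LogSequenceMatcher m = new LogSequenceMatcher(s -> s.contains("message3"));
        for (String hit : m.matches(new File("app.log"), cb)) {
            System.out.print(hit);
        }
    }
}
```

The sequence pattern always starts with a timestamp, so it can never match an empty string. As a result, `find()` stops on its own at the end of the buffer, and this sketch does not need an explicit `end() == limit()` check.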