Java regex: capture multiline sequences between tokens

Posted by Guillaume on Stack Overflow
Published on 2010-03-25T11:34:01Z Indexed on 2010/03/25 21:03 UTC

I'm struggling with a regex for splitting log files into log sequences, so that I can match patterns inside these sequences. The log format is:

timestamp fieldA fieldB fieldn log message1 
timestamp fieldA fieldB fieldn log message2
log message2bis
timestamp fieldA fieldB fieldn log message3 

The timestamp regex is known.

I want to extract every log sequence (potentially multiline) between timestamps, and I want to keep the timestamp.

At the same time, I want to keep an exact count of lines.

What I need to know is how to decorate the timestamp pattern so that it splits my log file into log sequences. I cannot split the whole file as a String, since the file content is provided in a CharBuffer.
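One possible way to decorate the timestamp pattern is sketched here, as an illustration rather than a definitive answer. It anchors the timestamp at a line start and lazily consumes everything up to a lookahead for the next line that starts with a timestamp (or the end of input). The timestamp format (yyyy-MM-dd HH:mm:ss) and the names LogSequenceSplitter, TIMESTAMP, SEQUENCE and split are assumptions made for the example; the real timestamp regex would be substituted. Since Matcher accepts any CharSequence, the CharBuffer can be passed directly.

```java
import java.nio.CharBuffer;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LogSequenceSplitter {
    // Hypothetical timestamp format (yyyy-MM-dd HH:mm:ss); substitute the real regex.
    static final String TIMESTAMP = "\\d{4}-\\d{2}-\\d{2} \\d{2}:\\d{2}:\\d{2}";

    // (?m): ^ matches at the start of every line.
    // (?s): . also matches line terminators, so a sequence may span lines.
    // The lazy .*? stops just before the next line that begins with a
    // timestamp, or at the very end of the input (\z).
    static final Pattern SEQUENCE = Pattern.compile(
            "(?ms)^" + TIMESTAMP + ".*?(?=^" + TIMESTAMP + "|\\z)");

    // Returns every sequence, timestamp and trailing line terminator included.
    static List<String> split(CharSequence input) {
        List<String> sequences = new ArrayList<>();
        Matcher m = SEQUENCE.matcher(input);
        while (m.find()) {
            sequences.add(m.group());
        }
        return sequences;
    }

    public static void main(String[] args) {
        CharBuffer cb = CharBuffer.wrap(
                "2010-03-25 11:34:01 fieldA fieldB fieldn log message1\n"
              + "2010-03-25 11:34:02 fieldA fieldB fieldn log message2\n"
              + "log message2bis\n"
              + "2010-03-25 11:34:03 fieldA fieldB fieldn log message3\n");
        for (String sequence : split(cb)) {
            System.out.print("[" + sequence + "]");
        }
    }
}
```

Because the lookahead does not consume the next timestamp, each match keeps its own timestamp and its trailing line terminators, which is what makes per-sequence line counting possible. The lazy quantifier re-checks the lookahead at every character, so this may be slow on very large sequences.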

Here is a sample method that will use this log sequence matcher:

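private void matches(File f, CharBuffer cb) {
    Matcher sequenceBreak = sequencePattern.matcher(cb);    // sequence matcher
    int lines = 1;        // 1-based line number where the current sequence starts
    int sequences = 0;    // number of sequences seen so far

    while (sequenceBreak.find()) {
        sequences++;

        String sequence = sequenceBreak.group();
        if (filter.accept(sequence)) {
            System.out.println(f + ":" + lines + ":" + sequence);
        }

        // count the line breaks inside this sequence
        Matcher lineBreak = LINE_PATTERN.matcher(sequence);
        while (lineBreak.find()) {
            lines++;
        }

        // Matcher offsets are relative to the buffer's position, so compare
        // against length() (the remaining chars) rather than limit()
        if (sequenceBreak.end() == cb.length()) {
            break;
        }
    }
}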
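The line count in a method like this stays exact only if every line terminator is counted, including those in any text that find() skips before the first timestamp. A minimal sketch of that bookkeeping follows. The question does not show sequencePattern or LINE_PATTERN, so the definitions here (a yyyy-MM-dd HH:mm:ss timestamp, a lookahead-based sequence pattern, and a pattern matching \r\n, \n or \r) are assumptions, as are the names SequenceLineTracker, countLineBreaks and startLines.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SequenceLineTracker {
    // Assumed definitions; the original sequencePattern and LINE_PATTERN are not shown.
    static final String TIMESTAMP = "\\d{4}-\\d{2}-\\d{2} \\d{2}:\\d{2}:\\d{2}";
    static final Pattern SEQUENCE = Pattern.compile(
            "(?ms)^" + TIMESTAMP + ".*?(?=^" + TIMESTAMP + "|\\z)");
    static final Pattern LINE_PATTERN = Pattern.compile("\\r\\n|[\\n\\r]");

    // Counts the line terminators contained in text.
    static int countLineBreaks(CharSequence text) {
        int count = 0;
        Matcher m = LINE_PATTERN.matcher(text);
        while (m.find()) {
            count++;
        }
        return count;
    }

    // Returns the 1-based starting line of every sequence found in input.
    static List<Integer> startLines(CharSequence input) {
        List<Integer> starts = new ArrayList<>();
        Matcher m = SEQUENCE.matcher(input);
        int line = 1;
        int consumed = 0;   // offset up to which line breaks have been counted
        while (m.find()) {
            // Text skipped before this match (e.g. a header without a
            // timestamp) still contributes to the line count.
            line += countLineBreaks(input.subSequence(consumed, m.start()));
            starts.add(line);
            line += countLineBreaks(m.group());
            consumed = m.end();
        }
        return starts;
    }
}
```

For example, SequenceLineTracker.startLines(cb) could replace the manual lines counter when only the starting line of each sequence is needed. Counting the skipped region separately is a design choice for this sketch; if the log is guaranteed to start with a timestamp, that step does no work.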

© Stack Overflow or respective owner
