.NET RegEx - First N chars of First M lines

Posted by George on Stack Overflow
Published on 2010-12-29T09:01:38Z Indexed on 2010/12/29 14:53 UTC

Hello!

I am looking for 4 general regular expressions for the following 4 basic cases:

  1. Up to A chars, starting after the first B chars of each line, on up to C lines, starting after the first D lines of the file
  2. Up to A chars, starting after the first B chars of each line, on up to C lines, ending before the last D lines of the file
  3. Up to A chars, ending before the last B chars of each line, on up to C lines, starting after the first D lines of the file
  4. Up to A chars, ending before the last B chars of each line, on up to C lines, ending before the last D lines of the file

These would make it possible to select arbitrary text blocks anywhere in the file.
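To pin down what the four cases are meant to return, here is a minimal reference sketch that uses plain slicing instead of regular expressions. It is written in Python rather than .NET purely for illustration. The function name `select_block` and its two flags are hypothetical, and it assumes that "before the last B chars / D lines" means the span sitting directly in front of them.

```python
def select_block(text, a, b, c, d, chars_from_end=False, lines_from_end=False):
    """Return up to `a` chars per line, for up to `c` lines (hypothetical helper).

    chars_from_end=False, lines_from_end=False -> case 1
    chars_from_end=False, lines_from_end=True  -> case 2
    chars_from_end=True,  lines_from_end=False -> case 3
    chars_from_end=True,  lines_from_end=True  -> case 4
    """
    lines = text.splitlines()
    if lines_from_end:
        # Assumption: the block ends right before the last d lines.
        stop = max(len(lines) - d, 0)
        picked = lines[max(stop - c, 0):stop]
    else:
        # Skip the first d lines, then take up to c lines.
        picked = lines[d:d + c]

    result = []
    for line in picked:
        if chars_from_end:
            # Assumption: the span ends right before the last b chars.
            end = max(len(line) - b, 0)
            piece = line[max(end - a, 0):end]
        else:
            # Skip the first b chars, then take up to a chars.
            piece = line[b:b + a]
        if piece:  # lines that are too short yield no match
            result.append(piece)
    return result


sample = "\r\n".join("line%d blah %d" % (i, i) for i in range(1, 7))
print(select_block(sample, a=4, b=2, c=3, d=2))  # ['ne3 ', 'ne4 ', 'ne5 ']
```

Each returned string corresponds to one desired regex match, so a candidate pattern can be checked against this output.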

So far I have only managed to come up with expressions that handle chars and lines separately:

  • (?<=(?m:^[^\r]{N}))[^\r]{1,M} = UP TO M chars OF EVERY LINE, AFTER FIRST N chars
  • [^\r]{1,M}(?=(?m:.{N}\r$)) = UP TO M chars OF EVERY LINE, BEFORE LAST N chars

The above 2 expressions are for chars, and they return MANY matches (one for each line).
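The behaviour of these two character-level expressions can be checked with a short script. The sketch below ports them to Python's `re` module, which happens to accept both because the lookbehind has a fixed width once N is substituted. The function names are hypothetical, and the sample text assumes CRLF line endings, as the expressions themselves do.

```python
import re


def chars_after_first(text, n, m):
    # Port of (?<=(?m:^[^\r]{N}))[^\r]{1,M} with N and M filled in.
    pattern = r'(?<=(?m:^[^\r]{%d}))[^\r]{1,%d}' % (n, m)
    return re.findall(pattern, text)


def chars_before_last(text, n, m):
    # Port of [^\r]{1,M}(?=(?m:.{N}\r$)) with N and M filled in.
    pattern = r'[^\r]{1,%d}(?=(?m:.{%d}\r$))' % (m, n)
    return re.findall(pattern, text)


sample = "line1 blah 1\r\nline2 blah 2\r\n"
print(chars_after_first(sample, n=2, m=4))  # ['ne1 ', 'ne2 ']
print(chars_before_last(sample, n=2, m=4))  # ['blah', 'blah']
```

Both calls return one match per line, which is the "MANY matches" behaviour described above.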

  • (?<=(\A([^\r]*\r\n){N}))(?m:\n*[^\r]*\r$){1,M} = UP TO M lines AFTER FIRST N lines
  • (((?=\r?)\n[^\r]*\r)|((?=\r?)\n[^\r]+\r?)){1,M}(?=((\n[^\r]*\r)|(\n[^\r]+\r?)){N}\Z) = UP TO M lines BEFORE LAST N lines from end

These 2 expressions are the line-level equivalents, but they always return just ONE match.

The task is to combine these expressions to cover scenarios 1-4. Can anyone help?

Note that the case in the title of the question is just a special case of scenario #1, where both B = 0 and D = 0.

EXAMPLE

SOURCE:

line1 blah 1
line2 blah 2
line3 blah 3
line4 blah 4
line5 blah 5
line6 blah 6

DESIRED RESULT: characters 3-6 of lines 3-5, for a total of 3 matches:

<match>ne3 </match>
<match>ne4 </match>
<match>ne5 </match>
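One possible way to reach this result for scenario 1 is to combine the two kinds of expression in two passes: first select the block of lines, then select the chars within it. The sketch below is an illustration in Python, not a single .NET pattern. The function name `scenario_one` is hypothetical, and the patterns accept both LF and CRLF line endings.

```python
import re


def scenario_one(text, a, b, c, d):
    """Up to `a` chars after the first `b` chars, on up to `c` lines after the first `d` lines."""
    # Pass 1: skip exactly d lines, then capture up to c following lines.
    block = re.match(r'(?:[^\n]*\n){%d}((?:[^\n]*(?:\n|\Z)){0,%d})' % (d, c), text)
    if block is None:  # the file has fewer than d complete lines
        return []
    # Pass 2: on every captured line, skip b chars and capture up to a chars.
    per_line = re.compile(r'(?m)^[^\r\n]{%d}([^\r\n]{1,%d})' % (b, a))
    return [m.group(1) for m in per_line.finditer(block.group(1))]


sample = "\r\n".join("line%d blah %d" % (i, i) for i in range(1, 7))
for match in scenario_one(sample, a=4, b=2, c=3, d=2):
    print("<match>%s</match>" % match)
```

In .NET itself, where lookbehind may be variable-length, the two passes could probably be folded into a single pattern along the lines of (?<=\A(?:[^\n]*\n){D,E}[^\r\n]{B})[^\r\n]{1,A}, with E = D + C - 1. That pattern is untested here and should be treated as a starting point only.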

© Stack Overflow or respective owner