UNIX-style RegExp Replace running extremely slowly under Windows. Help? EDIT: Negative lookahead assertion

Posted by John Sullivan on Stack Overflow
Published on 2010-03-30T19:06:40Z

I'm trying to run a UNIX-style regexp on every log file in a 1.12 GB directory and replace the matched pattern with ''. A test run on a 4 MB file took about 10 minutes, but it worked. Obviously something is murdering performance by several orders of magnitude.

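Find: ^(?!.*155[0-2][0-9]{4}\s.*).*$ -- NOTE: this is meant to match any line NOT starting with 155[0-2]NNNN followed by whitespace, where each N is a digit 0-9. Because the lookahead begins with .*, it actually rejects any line that contains that ID anywhere, not only at the start. Replace with: ''.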
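To see which lines the Find pattern blanks, here is a minimal sketch using Python's re module. It assumes MULTILINE semantics (^ and $ match at every line boundary), which is how a line-oriented tool is expected to behave; Python's engine is not grepWin's, so this only illustrates the pattern's logic. FIND_START_ONLY is a hypothetical variant without the leading .* in the lookahead, and the sample lines are made up.

```python
import re

# The Find pattern from the question. MULTILINE makes ^ and $ match at line boundaries.
FIND = re.compile(r"^(?!.*155[0-2][0-9]{4}\s.*).*$", re.MULTILINE)

# Hypothetical variant: the lookahead only inspects the start of the line.
FIND_START_ONLY = re.compile(r"^(?!155[0-2][0-9]{4}\s).*$", re.MULTILINE)

sample = (
    "15501234 keep: ID at the start\n"
    "noise 15521234 ID in the middle\n"
    "19990000 unrelated line\n"
)

def blank_lines(pattern: re.Pattern, text: str) -> str:
    """Replace every matching line with '' (the line's newline is not part of the match, so it stays)."""
    return pattern.sub("", text)

print(repr(blank_lines(FIND, sample)))
# '15501234 keep: ID at the start\nnoise 15521234 ID in the middle\n\n'
print(repr(blank_lines(FIND_START_ONLY, sample)))
# '15501234 keep: ID at the start\n\n\n'
```

The second line survives the original pattern because its ID appears mid-line. Note also that replacing with '' leaves empty lines behind rather than deleting them.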

Is there a legitimate reason for my regexp to take this long to replace the matching text, or is the program I am using (grepWin, on Windows) most likely poorly optimized?
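One way to get a baseline for what the job should cost is a plain line-by-line filter, sketched below in Python. This is a swapped-in technique, not what grepWin does. It keeps only lines that start with a 155[0-2]NNNN ID followed by whitespace and drops every other line entirely (instead of leaving empty lines). It writes results to a separate directory so the originals stay untouched. The function names, the `*.log` glob, and the directory layout are hypothetical. Files are read as bytes, so their encoding and line endings pass through unchanged.

```python
import re
from pathlib import Path

# Hypothetical "keep" rule: an ID 1550NNNN-1552NNNN followed by whitespace.
KEEP = re.compile(rb"155[0-2][0-9]{4}\s")

def filter_file(src: Path, dst: Path) -> tuple[int, int]:
    """Copy src to dst, keeping only lines whose start matches KEEP.

    Returns (lines_kept, lines_dropped). Each line is tested once, so the
    cost is linear in file size.
    """
    kept = dropped = 0
    with src.open("rb") as fin, dst.open("wb") as fout:
        for line in fin:             # iterates over lines split on b"\n"
            if KEEP.match(line):     # match() is anchored at the start of the line
                fout.write(line)
                kept += 1
            else:
                dropped += 1
    return kept, dropped

def filter_directory(src_dir: Path, dst_dir: Path, pattern: str = "*.log") -> None:
    """Run filter_file on every file in src_dir matching the glob pattern."""
    dst_dir.mkdir(parents=True, exist_ok=True)
    for src in sorted(src_dir.glob(pattern)):
        kept, dropped = filter_file(src, dst_dir / src.name)
        print(f"{src.name}: kept {kept}, dropped {dropped}")
```

Usage would look like `filter_directory(Path(r"C:\logs"), Path(r"C:\logs_filtered"))`, where both paths are placeholders.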

Thanks.

UPDATE: I'm noticing that searching for ^(155[0-2]).*$ takes ~7 seconds in a 5.6 MB file with 77 matches. Adding a negative lookahead assertion, ?!, so that the regexp becomes ^(?!155[0-2]).*$, makes it take at least 5-10 minutes; granted, there will be thousands and thousands of matches.
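The sketch below shows how much the match count changes between the two UPDATE patterns. It assumes the intended patterns are ^(155[0-2]).*$ and ^(?!155[0-2]).*$, and it also shows (?=...), which is the positive lookahead and selects the same lines as the plain pattern. The log is hypothetical synthetic data: 77 wanted lines among 10,000 others, with no trailing newline, so that MULTILINE mode does not produce an extra empty match at the end.

```python
import re

POSITIVE = re.compile(r"^(155[0-2]).*$", re.MULTILINE)         # lines starting with 1550-1552
NEGATIVE = re.compile(r"^(?!155[0-2]).*$", re.MULTILINE)       # every other line
POS_LOOKAHEAD = re.compile(r"^(?=155[0-2]).*$", re.MULTILINE)  # (?=...) is the *positive* lookahead

def count_matches(pattern: re.Pattern, text: str) -> int:
    """Number of non-overlapping matches, i.e. how many replacements a replace-all would perform."""
    return sum(1 for _ in pattern.finditer(text))

# Hypothetical synthetic log, joined without a trailing newline.
lines = ["15501234 wanted"] * 77 + ["19990000 other"] * 10_000
log = "\n".join(lines)

for name, pat in [("positive", POSITIVE), ("negative", NEGATIVE), ("(?=...)", POS_LOOKAHEAD)]:
    print(name, count_matches(pat, log))
# positive 77
# negative 10000
# (?=...) 77
```

The anchored lookahead (?!155[0-2]) inspects at most four characters per line. The difference that grows is the number of matches a replace-all has to process.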

Should a negative lookahead assertion, or simply a large number of matches, be this detrimental to performance?
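One way to check the question empirically is to time each pattern's replace-all on the same data. Below is a rough best-of-N timing sketch in Python. Python's re engine differs from grepWin's, so absolute numbers will not carry over; the point is only to compare patterns against each other. The "anchored lookahead" entry is a hypothetical alternative to the original Find pattern, and the log is synthetic.

```python
import re
import time

def best_sub_time(pattern: str, text: str, repeat: int = 3) -> float:
    """Best-of-`repeat` wall-clock seconds for one MULTILINE re.sub(pattern, '', text)."""
    rx = re.compile(pattern, re.MULTILINE)
    best = float("inf")
    for _ in range(repeat):
        start = time.perf_counter()
        rx.sub("", text)
        best = min(best, time.perf_counter() - start)
    return best

# Hypothetical synthetic log: a few wanted lines among many longer, non-matching ones.
log = "\n".join(["15501234 wanted"] * 77 + ["19990000 other line " * 5] * 5_000)

patterns = {
    "original Find": r"^(?!.*155[0-2][0-9]{4}\s.*).*$",  # lookahead scans the whole line
    "anchored lookahead": r"^(?!155[0-2][0-9]{4}\s).*$",  # hypothetical: checks only the line start
    "positive only": r"^(155[0-2]).*$",
}
for name, pat in patterns.items():
    print(f"{name:20s} {best_sub_time(pat, log):.4f} s")
```

Timings vary by machine, so no expected output is given here.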

© Stack Overflow or respective owner
