Python re module becomes 20 times slower when called on more than 100 different regexes
Posted by Wiil on Stack Overflow
Published on 2013-06-26T16:12:56Z
My problem is about parsing log files and removing the variable parts of each line so that the lines can be grouped. For instance:
s = re.sub(r'(?i)User [_0-9A-z]+ is ', r"User .. is ", s)
s = re.sub(r'(?i)Message rejected because : (.*?) \(.+\)', r'Message rejected because : \1 (...)', s)
I have more than 120 matching rules like the ones above.
I found no performance issues when applying 100 different regexes in succession, but there is a huge slowdown as soon as I apply 101.
The exact same behavior occurs when I replace my rule set with:
for a in range(100):
    s = re.sub(r'(?i)caught here'+str(a)+':.+', r'( ... )', s)
It gets about 20 times slower when I use range(101) instead.
# range(100)
% ./dashlog.py file.bz2
== Took 2.1 seconds. ==
# range(101)
% ./dashlog.py file.bz2
== Took 47.6 seconds. ==
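For anyone who wants to try this without my log file, here is a minimal, self-contained sketch of the same loop with a timer around it. The function names (scrub, timed) and the sample input are made up for illustration; dashlog.py itself does more than this, and the absolute timings will depend on the Python version and the input.

```python
import re
import time


def scrub(lines, n_rules):
    """Apply n_rules distinct substitutions to every line, as in the loop above."""
    out = []
    for s in lines:
        for a in range(n_rules):
            # Each value of `a` produces a different pattern string.
            s = re.sub(r'(?i)caught here' + str(a) + ':.+', r'( ... )', s)
        out.append(s)
    return out


def timed(lines, n_rules):
    """Return the scrubbed lines and the elapsed wall-clock time in seconds."""
    start = time.time()
    result = scrub(lines, n_rules)
    return result, time.time() - start


if __name__ == "__main__":
    # Hypothetical input standing in for the real log file.
    sample = ["caught here5: some variable detail"] * 200
    for n in (100, 101):
        _, elapsed = timed(sample, n)
        print("range(%d): %.3f seconds" % (n, elapsed))
```

Comparing the two printed timings on an affected interpreter should show whether the jump between 100 and 101 rules appears there as well.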
Why is this happening? And is there any known workaround?
(Happens on Python 2.6.6/2.7.2 on Linux/Windows.)
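One workaround that might be worth trying is to compile each pattern once with re.compile() and reuse the compiled objects, instead of passing pattern strings to re.sub() every time. This is only a sketch under an assumption: that the slowdown comes from the re module's internal cache of compiled patterns overflowing once more than about 100 distinct patterns are in use, which is not confirmed in this post. RULES, COMPILED_RULES and normalize are hypothetical names, and the table holds only the two example rules from above.

```python
import re

# Hypothetical rule table: (pattern, replacement) pairs from the examples above.
RULES = [
    (r'(?i)User [_0-9A-z]+ is ', r'User .. is '),
    (r'(?i)Message rejected because : (.*?) \(.+\)',
     r'Message rejected because : \1 (...)'),
]

# Compile every pattern once, up front, and keep the compiled objects around
# so that no per-call lookup or recompilation of the pattern string is needed.
COMPILED_RULES = [(re.compile(pattern), repl) for pattern, repl in RULES]


def normalize(line):
    """Apply every precompiled rule, in order, to a single log line."""
    for regex, repl in COMPILED_RULES:
        line = regex.sub(repl, line)
    return line
```

With this layout the number of rules should no longer matter to any caching inside re, since the compiled pattern objects are held directly by the program.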
© Stack Overflow or respective owner