Python text file processing speed issues
Posted by Anonymouslemming on Stack Overflow
Published on 2010-04-22T14:45:23Z
Hi all,
I'm having a problem processing a largeish file in Python. All I'm doing is:
    import gzip

    counter = 0
    f = gzip.open(pathToLog, 'r')
    for line in f:
        counter = counter + 1
        # Print progress every million lines
        if counter % 1000000 == 0:
            print(counter)
    f.close()
This takes around 10m25s just to open the file, read the lines and increment this counter.
In Perl, dealing with the same file and doing quite a bit more (some regular expression stuff), the whole process takes around 1m17s.
Perl code:
    open(LOG, "/bin/zcat $logfile |") or die "Cannot read $logfile: $!\n";
    while (<LOG>) {
        if (m/.*\[svc-\w+\].*login result: Successful\.$/) {
            $_ =~ s/some regex here/$1,$2,$3,$4/;
            push @an_array, $_
        }
    }
    close LOG;
Can anyone advise what I can do to make the Python solution run at a similar speed to the Perl solution? I've tried just uncompressing the file and dealing with it using open instead of gzip.open, but that made a very small difference to the overall time.
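To narrow down where the time goes, it may help to time the bare line count on its own. The following is a rough sketch, not a definitive fix: it compares line-by-line iteration (as in the code above) with reading large decompressed chunks and counting newline bytes. The function names, the sample file, and the 1 MiB chunk size are all made up for illustration, and the timings will vary with the Python version and machine.

```python
import gzip
import os
import tempfile
import time


def count_lines_iter(path):
    """Count lines by iterating over the gzip file object, as in the question."""
    count = 0
    with gzip.open(path, 'rb') as f:
        for _ in f:
            count += 1
    return count


def count_lines_chunked(path, chunk_size=1 << 20):
    """Count newline bytes in large decompressed chunks (illustrative alternative).

    Note: a final line without a trailing newline is not counted.
    """
    count = 0
    with gzip.open(path, 'rb') as f:
        # iter() with a sentinel keeps calling f.read() until it returns b''
        for chunk in iter(lambda: f.read(chunk_size), b''):
            count += chunk.count(b'\n')
    return count


def make_sample_log(n_lines):
    """Write a small gzip file of made-up log lines and return its path."""
    fd, path = tempfile.mkstemp(suffix='.gz')
    os.close(fd)
    with gzip.open(path, 'wb') as f:
        for i in range(n_lines):
            f.write(('line %d login result: Successful.\n' % i).encode('ascii'))
    return path


if __name__ == '__main__':
    sample = make_sample_log(200000)
    try:
        for fn in (count_lines_iter, count_lines_chunked):
            start = time.time()
            n = fn(sample)
            print('%s: %d lines in %.3fs' % (fn.__name__, n, time.time() - start))
    finally:
        os.remove(sample)
```

If the chunked count is much faster than the iteration, the overhead is probably in per-line handling rather than in decompression itself, which would be consistent with the plain `open` test making little difference.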
© Stack Overflow or respective owner