Python text file processing speed issues

Hi all,

I'm having a problem with processing a largeish file in Python. All I'm doing is

import gzip

counter = 0
f = gzip.open(pathToLog, 'r')
for line in f:
    counter = counter + 1
    if counter % 1000000 == 0:  # report progress every million lines
        print counter
f.close()

This takes around 10m25s just to open the file, read the lines, and increment the counter.

In Perl, processing the same file while doing quite a bit more work (some regular expression matching and substitution), the whole thing takes around 1m17s.

Perl Code:

open(LOG, "/bin/zcat $logfile |") or die "Cannot read $logfile: $!\n";
while (<LOG>) {
    if (m/.*\[svc-\w+\].*login result: Successful\.$/) {
        $_ =~ s/some regex here/$1,$2,$3,$4/;
        push @an_array, $_;
    }
}
close LOG;

Can anyone advise what I can do to make the Python solution run at a similar speed to the Perl solution? I've tried uncompressing the file first and reading it with plain open instead of gzip.open, but that made very little difference to the overall time.
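
One thing I'm considering, since the Perl version gets its decompression done by an external zcat: do the same from Python with subprocess, so Python only iterates over already-decompressed lines instead of going through gzip.open. This is just an untested sketch, assuming the same pathToLog variable as above and that /bin/zcat exists:

import subprocess

# Sketch: let /bin/zcat handle decompression (as the Perl version does)
# and have Python read plain text lines from the pipe.
proc = subprocess.Popen(['/bin/zcat', pathToLog], stdout=subprocess.PIPE)
counter = 0
for line in proc.stdout:
    counter += 1
    if counter % 1000000 == 0:  # report progress every million lines
        print counter
proc.stdout.close()
proc.wait()

No idea yet whether this closes the gap, but it would at least make the two tests decompress the data the same way.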
