Perl: efficient parsing of a CSV file

Posted by Mike on Stack Overflow
Published on 2010-06-17T19:49:02Z

I'm working on a project that involves parsing a large CSV-formatted file in Perl, and I'm looking to make the parsing more efficient.

My approach has been to split() the file by lines first, and then split() each line again by commas to get the fields. But this is suboptimal, since at least two passes over the data are required (once to split by lines, then once again for each line). This is a very large file, so cutting processing in half would be a significant improvement to the entire application.
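
One common built-in alternative is to stream the file with `<$fh>` and split each record as it is read, instead of slurping the file and split()ting it into a list of lines first. Here is a minimal sketch, assuming plain comma-separated alphanumeric fields with no quoting. `parse_csv_stream` is a hypothetical helper name and an in-memory filehandle stands in for the real file. Note that this does not literally halve the work: readline still scans for the newline and split still scans for commas. What it avoids is building an intermediate list of every line, so memory stays flat regardless of file size.

```perl
use strict;
use warnings;

# Hypothetical helper: read a CSV stream one record at a time and pass
# each record's fields to a callback. No list of all lines is ever built.
sub parse_csv_stream {
    my ($fh, $on_row) = @_;
    while (my $line = <$fh>) {               # implicit defined() test
        chomp $line;                         # drop the trailing newline
        my @fields = split /,/, $line, -1;   # -1 keeps trailing empty fields
        $on_row->(\@fields);                 # field count may vary per line
    }
}

# Hypothetical in-memory data standing in for the large file. For a real
# file:  open(my $fh, '<', $path) or die "Cannot open $path: $!";
my $data = "a,b,c\n1,2\nx,y,z,w\n";
open(my $fh, '<', \$data) or die "Cannot open in-memory data: $!";

my ($count, $total_fields) = (0, 0);
parse_csv_stream($fh, sub {
    my ($fields) = @_;
    $count++;
    $total_fields += @$fields;
});
close($fh);
print "$count rows, $total_fields fields\n";   # prints "3 rows, 9 fields"
```

Because the callback receives each row as soon as it is parsed, the caller can process and discard rows without holding the whole file in memory.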

My question is: what is the most time-efficient way to parse a large CSV file using only built-in tools?

Note: each line has a varying number of tokens, so we can't just ignore line boundaries and split by commas only. Also, we can assume fields will contain only alphanumeric ASCII data (no special characters or other tricks). Finally, I don't want to get into parallel processing, although that might work effectively.
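
If the goal is a single left-to-right scan that still respects line boundaries, one option is a `\G`-anchored regex with `/g` over the whole buffer. Each match captures a field together with the delimiter that ended it, so a newline closes the current row and varying field counts need no special handling. This is a sketch under two assumptions: the file fits in memory (it is slurped first), and fields contain no quoted commas or newlines. `parse_buffer` is a hypothetical name. It is not guaranteed to beat the per-line split loop, since split is a builtin op, so the two should be measured against each other, for example with the core Benchmark module.

```perl
use strict;
use warnings;

# Hypothetical helper: scan the whole buffer once with a \G-anchored /g
# regex. Each match captures one field ($1) and the delimiter that ended
# it ($2): a comma continues the row, a newline or end of buffer closes it.
# Takes a reference so a large buffer is not copied on the call.
sub parse_buffer {
    my ($text_ref) = @_;
    my (@rows, @row);
    return \@rows unless length $$text_ref;     # empty input: no rows
    pos($$text_ref) = 0;                         # start at the beginning
    while ($$text_ref =~ /\G([^,\n]*)(,|\n|\z)/g) {
        push @row, $1;
        next if $2 eq ',';                       # comma: row continues
        push @rows, [@row];                      # newline/end: close the row
        @row = ();
        last if pos($$text_ref) >= length $$text_ref;   # consumed everything
    }
    return \@rows;
}

# Hypothetical sample data; rows have 3, 2 and 4 fields. For a real file,
# slurp it first:  my $buf = do { local $/; <$fh> };
my $buf = "a,b,c\n1,2\nx,y,z,w\n";
my $rows = parse_buffer(\$buf);
print join(' ', map { scalar(@$_) } @$rows), "\n";   # prints "3 2 4"
```

The explicit `last` at the end of the buffer matters: without it, a trailing newline would be followed by one more zero-length match at `\z` and produce a spurious empty row.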
