PHP: What is an efficient way to parse a text file containing very long lines?

Posted by Shaun on Stack Overflow See other posts from Stack Overflow or by Shaun
Published on 2010-03-31T23:26:06Z Indexed on 2010/03/31 23:33 UTC
Read the original article Hit count: 295

Filed under:
|
|
|

I'm working on a parser in php which is designed to extract MySQL records out of a text file. A particular line might begin with a string corresponding to which table the records (rows) need to be inserted into, followed by the records themselves. The records are delimited by a backslash and the fields (columns) are separated by commas. For the sake of simplicity, let's assume that we have a table representing people in our database, with fields being First Name, Last Name, and Occupation. Thus, one line of the file might be as follows

[People] = "\Han,Solo,Smuggler\Luke,Skywalker,Jedi..."

Where the ellipses (...) could be additional people. One straightforward approach might be to use fgets() to extract a line from the file, and use preg_match() to extract the table name, records, and fields from that line.

However, let's suppose that we have an awful lot of Star Wars characters to track. So many, in fact, that this line ends up being 200,000+ characters/bytes long. In such a case, taking the above approach to extract the database information seems a bit inefficient. You have to first read hundreds of thousands of characters into memory, then read back over those same characters to find regex matches.

Is there a way, similar to the Java String next(String pattern) method of the Scanner class constructed using a file, that allows you to match patterns in-line while scanning through the file?

The idea is that you don't have to scan through the same text twice (to read it from the file into a string, and then to match patterns) or store the text redundantly in memory (in both the file line string and the matched patterns). Would this even yield a significant increase in performance? It's hard to tell exactly what PHP or Java are doing behind the scenes.

© Stack Overflow or respective owner

Related posts about php

Related posts about parsing