PHP: What is an efficient way to parse a text file containing very long lines?

Posted by Shaun on Stack Overflow
Published on 2010-03-31T23:26:06Z Indexed on 2010/03/31 23:33 UTC

I'm working on a parser in PHP that extracts MySQL records from a text file. A line begins with a string naming the table the records (rows) should be inserted into, followed by the records themselves. Records are delimited by a backslash, and the fields (columns) within a record are separated by commas. For simplicity, assume our database has a table representing people, with the fields First Name, Last Name, and Occupation. One line of the file might then look like this:

[People] = "\Han,Solo,Smuggler\Luke,Skywalker,Jedi..."

Here the ellipsis (...) stands for additional people. One straightforward approach is to use fgets() to read a line from the file, then use preg_match() to extract the table name, records, and fields from that line.
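To make that approach concrete, here is a minimal sketch of it. The function names parseLineNaive() and parseFileNaive() are hypothetical, and the sketch assumes that field values never contain a double quote, a backslash, or a comma. It uses preg_match() only to split off the table name and the quoted body, then explode() for records and fields.

```php
<?php
// Hypothetical helper: parses one line of the assumed form
//   [Table] = "\f1,f2,f3\f1,f2,f3"
// Returns null when the line does not match that form.
function parseLineNaive(string $line): ?array
{
    // Capture the table name and everything between the double quotes.
    if (!preg_match('/^\[(\w+)\]\s*=\s*"([^"]*)"/', $line, $m)) {
        return null;
    }
    $records = [];
    // Every record starts with a backslash, so the first piece is empty.
    foreach (explode('\\', $m[2]) as $record) {
        if ($record !== '') {
            $records[] = explode(',', $record);
        }
    }
    return ['table' => $m[1], 'records' => $records];
}

// Hypothetical driver: reads the file line by line with fgets().
function parseFileNaive(string $path): array
{
    $handle = fopen($path, 'r');
    if ($handle === false) {
        throw new RuntimeException("Cannot open $path");
    }
    $tables = [];
    // fgets() without a length argument returns the whole line,
    // so a 200,000-byte line is held in memory in full.
    while (($line = fgets($handle)) !== false) {
        $parsed = parseLineNaive(rtrim($line, "\r\n"));
        if ($parsed !== null) {
            $tables[] = $parsed;
        }
    }
    fclose($handle);
    return $tables;
}
```

With this version each line is held as one string and then copied again into the record and field arrays, which is the duplication I'm worried about.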

However, suppose we have an awful lot of Star Wars characters to track. So many, in fact, that this line ends up being 200,000+ characters/bytes long. In that case, the approach above seems inefficient: you first have to read hundreds of thousands of characters into memory, then read back over those same characters to find regex matches.

Is there a way in PHP, similar to the String next(String pattern) method of Java's Scanner class when the Scanner is constructed from a file, to match patterns in-line while scanning through the file?

The idea is that you don't have to scan through the same text twice (once to read it from the file into a string, and again to match patterns) or store the text redundantly in memory (in both the file line string and the matched patterns). Would this even yield a significant increase in performance? It's hard to tell exactly what PHP or Java is doing behind the scenes.
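For illustration, one candidate that gets close to this is stream_get_line(), which reads up to a given delimiter instead of up to a newline. The sketch below is not a drop-in Scanner equivalent, and I have not measured whether it is faster. The generator name streamRecords() and the 65536-byte cap are assumptions. It also assumes that field values contain no double quotes or backslashes and that every record is shorter than the cap.

```php
<?php
// Hypothetical generator: yields [table, fields] pairs one record at a time,
// so only one record (not one whole line) is held in memory.
function streamRecords($handle, int $maxRecordBytes = 65536): Generator
{
    $table = null; // table of the body being read, null between bodies
    // Read up to the next backslash; the delimiter itself is not returned.
    while (($chunk = stream_get_line($handle, $maxRecordBytes, '\\')) !== false) {
        $tail = $chunk;
        if ($table !== null) {
            // Inside a quoted body: the record ends at the closing quote, if any.
            $quote = strpos($chunk, '"');
            $record = $quote === false ? $chunk : substr($chunk, 0, $quote);
            $tail = $quote === false ? '' : substr($chunk, $quote + 1);
            if ($record !== '') {
                yield [$table, explode(',', $record)];
            }
            if ($quote !== false) {
                $table = null; // this line's body is finished
            }
        }
        // A tail ending in `[Name] = "` opens the next line's body.
        if (preg_match('/\[(\w+)\]\s*=\s*"$/', $tail, $m)) {
            $table = $m[1];
        }
    }
}
```

A caller would open the file with fopen() and loop with foreach (streamRecords($handle) as [$table, $fields]), inserting each record as it arrives. The text is still examined more than once (by the stream layer, strpos(), and explode()), but only in record-sized pieces.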

Filed under: php, parsing, performance, file-io
