Need a strategy for parsing a long (3.5 million line) text file line by line and storing the data
- by Jarrod
This is a question about a particular problem I am struggling with. I am parsing a long list of text data, line by line, for a business app in PHP (a cron script run on the CLI). The file follows this format:
HD: Some text here {text here too}
DC: A description here
DC: the description continues here
DC: and it ends here.
DT: 2012-08-01
HD: Next header here {supplemental text}
... this repeats over and over for a few hundred megs
I have to read each line, find the HD: lines, and grab the text on each one. I then compare this text against data stored in a database. When a match is found, I want to record the DC: lines that follow the matched HD: line.
Pseudocode:
while ( the_file_pointer_isnt_end_of_file ) {
    line = getCurrentLineFromFile
    title = parseTitleFrom(line)
    matched = searchForMatchInDB(title)
    if ( matched ) {
        recordTheDCLines // <- Best way to do this?
    }
}
My problem is this: because I am reading line by line, what is the best way to trigger the script to start saving DC: lines, and then save them to the database once they are finished?
I have a vague idea, but have yet to properly implement it. I would love to hear the community's ideas and suggestions!
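One possible shape for this, as a rough sketch rather than a working solution: remember the matched title, collect DC: lines into a buffer while a match is active, and flush the buffer when the next HD: line or the end of the file is reached. The function name processFile and the two callbacks, $isMatchInDb and $saveDescription, are hypothetical placeholders for the real database lookup and insert, and treating everything after HD: as the title is an assumption.

```php
<?php
// Sketch only. $isMatchInDb and $saveDescription are hypothetical callbacks
// standing in for the real database lookup and insert.
function processFile($handle, callable $isMatchInDb, callable $saveDescription): void
{
    $title = null;  // title of the matched HD: block being collected, or null
    $buffer = [];   // DC: lines collected for that block

    while (($line = fgets($handle)) !== false) {
        $line = rtrim($line, "\r\n");
        $tag = substr($line, 0, 3);

        if ($tag === 'HD:') {
            // A new header ends the previous block, so flush it first.
            if ($title !== null) {
                $saveDescription($title, $buffer);
            }
            // Assumption: the whole text after "HD:" is the title to look up.
            $candidate = trim(substr($line, 3));
            $title = $isMatchInDb($candidate) ? $candidate : null;
            $buffer = [];
        } elseif ($tag === 'DC:' && $title !== null) {
            // Only collect description lines while a matched header is active.
            $buffer[] = trim(substr($line, 3));
        }
        // Other lines (such as DT:) are ignored in this sketch.
    }

    // Flush the last block, which has no following HD: line.
    if ($title !== null) {
        $saveDescription($title, $buffer);
    }
}
```

Because fgets reads one line at a time, only the current block is held in memory, so the size of the file should not matter much. The handle would come from something like fopen on the data file.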
Thank you.