Best practice -- Content Tracking Remote Data (cURL, file_get_contents, cron, et al.)?

Posted by user322787 on Stack Overflow
Published on 2010-04-21T23:45:49Z

I am attempting to build a script that will log data that changes every second. My initial thought was "just run a PHP file from cron that does a cURL request every second", but I have a very strong feeling that this isn't the right way to go about it.
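
Since standard cron only fires once per minute, I assume the "every second" part would have to live inside the script itself rather than in the crontab. Roughly what I had in mind, with a placeholder feed URL:

```php
<?php
// Sketch of the "cURL every second" idea as a single long-running worker.
// The feed URL is a placeholder, not one of the real sites.
$url = 'http://example.com/feed.txt';

while (true) {
    $start = microtime(true);

    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the body as a string
    curl_setopt($ch, CURLOPT_TIMEOUT, 1);           // keep each poll short
    $body = curl_exec($ch);
    curl_close($ch);

    if ($body !== false) {
        $lines = array_filter(explode("\n", $body));
        // ... hand $lines off to the logging code ...
    }

    // sleep away whatever is left of the current second
    $elapsed = microtime(true) - $start;
    if ($elapsed < 1.0) {
        usleep((int) ((1.0 - $elapsed) * 1000000));
    }
}
```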

Here are my specifications: there are currently 10 sites I need to gather data from and log to a database, and that number will only grow over time, so the solution needs to be scalable. Each site publishes its data to a URL every second but keeps only 10 lines on the page, and a site can emit up to 10 new lines at a time, so I have to poll every second to be sure I capture all of the data.
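
To make that concrete, here is roughly how I picture one polling pass, fetching all of the feeds in parallel with curl_multi so a single pass stays well under a second; the URLs and the in-memory dedup are just placeholders:

```php
<?php
// Rough sketch: poll all of the feeds in parallel with curl_multi.
// The URLs and the dedup store are placeholders for illustration.
$urls = array(
    'http://site1.example.com/feed',
    'http://site2.example.com/feed',
    // ... the rest of the ~10 sites ...
);

$mh      = curl_multi_init();
$handles = array();
foreach ($urls as $url) {
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_TIMEOUT, 1);
    curl_multi_add_handle($mh, $ch);
    $handles[$url] = $ch;
}

// drive all of the transfers concurrently
do {
    $status = curl_multi_exec($mh, $running);
    if ($running) {
        curl_multi_select($mh); // wait for activity instead of busy-looping
    }
} while ($running && $status == CURLM_OK);

$seen = array(); // in practice this would persist between polls (DB, memcached, ...)
foreach ($handles as $url => $ch) {
    $body = curl_multi_getcontent($ch);
    foreach (array_filter(explode("\n", (string) $body)) as $line) {
        $key = md5($url . $line);
        if (!isset($seen[$key])) {  // only log lines we haven't captured yet
            $seen[$key] = true;
            // queue ($url, $line) for insertion into the database
        }
    }
    curl_multi_remove_handle($mh, $ch);
    curl_close($ch);
}
curl_multi_close($mh);
```

Because the lines a site keeps on the page overlap from one second to the next, I would have to deduplicate like this before logging anything.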

As I will also be writing this data to my own DB, there's going to be I/O every second of every day for a considerably long time.
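
On the write side, the best I have come up with so far is to batch each second's rows into a single transaction with a prepared statement so the per-second I/O stays as cheap as possible; the DSN, credentials, and feed_log table below are made up for illustration:

```php
<?php
// Sketch only: batch one second's worth of captured lines into a single
// transaction. The DSN, credentials, and `feed_log` table are hypothetical.
$pdo = new PDO('mysql:host=localhost;dbname=tracker', 'user', 'pass', array(
    PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION,
));

function log_lines(PDO $pdo, $siteUrl, array $lines)
{
    if (empty($lines)) {
        return; // nothing new this second
    }
    $stmt = $pdo->prepare(
        'INSERT INTO feed_log (site_url, line, fetched_at) VALUES (?, ?, NOW())'
    );
    $pdo->beginTransaction();  // one commit per batch instead of one per row
    foreach ($lines as $line) {
        $stmt->execute(array($siteUrl, $line));
    }
    $pdo->commit();
}
```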

Barring magic, what is the most efficient way to achieve this?

It might help to know that the data I am getting every second is very small, under 500 bytes.

