Best practice -- Content Tracking Remote Data (cURL, file_get_contents, cron, et al.)?
Posted by user322787 on Stack Overflow
Published on 2010-04-21T23:45:49Z
I am attempting to build a script that will log data that changes every second. My initial thought was "just run a PHP file that does a cURL request every second from cron" -- but I have a very strong feeling that this isn't the right way to go about it.
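For reference, that initial idea boils down to something like the snippet below, kicked off by cron each run. The URL and the log file are placeholders I made up, not my real setup; the log write just stands in for the eventual DB insert.

    <?php
    // Placeholder sketch of the "cron + cURL" idea: fetch one feed and append the raw payload.
    $ch = curl_init('http://example.com/feed');      // hypothetical feed URL
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);  // return the body instead of echoing it
    curl_setopt($ch, CURLOPT_TIMEOUT, 1);            // payloads are tiny, so fail fast
    $body = curl_exec($ch);
    curl_close($ch);

    if ($body !== false) {
        file_put_contents('feed.log', $body, FILE_APPEND); // stand-in for the real DB write
    }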
Here are my specifications:
- There are currently 10 sites I need to gather data from and log to a database. This number will inevitably increase over time, so the solution needs to be scalable.
- Each site spits its data out to a URL every second but only keeps the 10 most recent lines on the page, and it can emit up to 10 new lines at a time, so I need to poll every second to be sure I capture all of it.
As I will also be writing this data to my own DB, there is going to be I/O every second of every day for a considerable length of time.
Barring magic, what is the most efficient way to achieve this?
It might help to know that the data I am getting every second is very small, under 500 bytes.
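For illustration, here is roughly the kind of single long-running worker I am weighing against a cron job (cron only has minute resolution anyway): it polls every configured URL concurrently with curl_multi once per second and inserts any new lines through PDO. The URLs, the DSN, the table name, and the INSERT IGNORE dedup scheme are all placeholders invented for this sketch, not my real setup.

    <?php
    // Sketch of a single long-lived poller: fetch all feeds in parallel each second
    // and store any lines not seen before. Every name below is a placeholder.
    $urls = array(
        'http://site1.example/feed',
        'http://site2.example/feed',
        // ...the remaining sites go here
    );

    $pdo  = new PDO('mysql:host=localhost;dbname=tracker', 'user', 'pass');
    // Assumes a unique key on (source, line) so INSERT IGNORE silently skips repeats.
    $stmt = $pdo->prepare('INSERT IGNORE INTO feed_lines (source, line) VALUES (?, ?)');

    while (true) {
        $start = microtime(true);

        // Issue all requests in parallel so 10+ sites still fit inside one second.
        $mh      = curl_multi_init();
        $handles = array();
        foreach ($urls as $url) {
            $ch = curl_init($url);
            curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
            curl_setopt($ch, CURLOPT_TIMEOUT, 1);
            curl_multi_add_handle($mh, $ch);
            $handles[$url] = $ch;
        }

        // Drive the transfers until they have all finished.
        do {
            curl_multi_exec($mh, $running);
            curl_multi_select($mh, 0.1);
        } while ($running > 0);

        // The pages keep repeating their last 10 lines, so the unique key plus
        // INSERT IGNORE is what prevents storing the same line twice across polls.
        foreach ($handles as $url => $ch) {
            $body = curl_multi_getcontent($ch);
            if (is_string($body) && $body !== '') {
                foreach (array_filter(explode("\n", trim($body))) as $line) {
                    $stmt->execute(array($url, $line));
                }
            }
            curl_multi_remove_handle($mh, $ch);
            curl_close($ch);
        }
        curl_multi_close($mh);

        // Sleep out the remainder of the second so the loop ticks roughly once per second.
        $elapsed = microtime(true) - $start;
        if ($elapsed < 1.0) {
            usleep((int) ((1.0 - $elapsed) * 1000000));
        }
    }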