What's the fastest way to scrape a lot of pages in PHP?

Posted by Yegor on Stack Overflow
Published on 2010-05-20T14:47:56Z

I have a data aggregator that scrapes several sites and indexes their information so that users can search it.

I need to be able to scrape a vast number of pages daily, and I have run into problems using simple curl requests, which are fairly slow when executed in rapid sequence for a long time (the scraper basically runs 24/7).

Running a multi curl request in a simple while loop is fairly slow. I sped it up by doing individual curl requests in background processes, which is faster, but sooner or later the slower requests start piling up, which ends up crashing the server.

Are there more efficient ways of scraping data? Perhaps command-line curl?
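
For reference, one direction that might address the piling-up problem is a multi curl loop that keeps a fixed number of transfers in flight and gives every transfer a hard timeout, instead of spawning one background process per request. The following is only a minimal sketch under stated assumptions: the function name fetchAll, the cap of 10 concurrent transfers, and the 5 and 15 second timeouts are hypothetical choices for illustration, not code from the original scraper.

```php
<?php
// Sketch only: fetchAll() is a hypothetical helper, not the original scraper's code.
// It keeps at most $maxConcurrent transfers in flight and starts the next queued
// URL as soon as any transfer finishes, so one slow page does not stall a batch.
function fetchAll(array $urls, int $maxConcurrent = 10, int $timeoutSeconds = 15): array
{
    $queue    = array_values($urls);
    $results  = [];   // URL => body on success, null on failure
    $inFlight = 0;    // number of transfers currently attached to $mh
    $mh       = curl_multi_init();

    // Attach one more transfer to the multi handle, if any URL is still queued.
    $startNext = function () use (&$queue, &$inFlight, $mh, $timeoutSeconds): void {
        if (count($queue) === 0) {
            return;
        }
        $url = (string) array_shift($queue);
        $ch  = curl_init();
        curl_setopt_array($ch, [
            CURLOPT_URL            => $url,
            CURLOPT_PRIVATE        => $url,   // remember which URL this handle is for
            CURLOPT_RETURNTRANSFER => true,
            CURLOPT_FOLLOWLOCATION => true,
            CURLOPT_CONNECTTIMEOUT => 5,      // assumed value
            CURLOPT_TIMEOUT        => $timeoutSeconds, // bounds how long a slow request holds a slot
        ]);
        curl_multi_add_handle($mh, $ch);
        $inFlight++;
    };

    // Fill the initial window.
    for ($i = 0; $i < max(1, $maxConcurrent); $i++) {
        $startNext();
    }

    while ($inFlight > 0) {
        curl_multi_exec($mh, $stillRunning);

        // Collect every finished transfer and refill its slot from the queue.
        $drained = false;
        while (($info = curl_multi_info_read($mh)) !== false) {
            $ch  = $info['handle'];
            $url = curl_getinfo($ch, CURLINFO_PRIVATE);
            $results[$url] = ($info['result'] === CURLE_OK)
                ? curl_multi_getcontent($ch)
                : null;
            curl_multi_remove_handle($mh, $ch);
            $inFlight--;
            $startNext();
            $drained = true;
        }

        // Nothing finished this round: wait for socket activity instead of spinning.
        if (!$drained && $inFlight > 0 && curl_multi_select($mh, 1.0) === -1) {
            usleep(10000);
        }
    }

    curl_multi_close($mh);
    return $results;
}
```

A call such as fetchAll($urls, 5) would then return an array keyed by URL, with null for any transfer that failed or timed out. Because the number of open transfers is capped and each one is bounded by CURLOPT_TIMEOUT, slow responses cannot accumulate without limit, although the right cap and timeouts would depend on the server and the target sites.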

© Stack Overflow or respective owner
