What's the fastest way to scrape a lot of pages in PHP?
Posted by Yegor on Stack Overflow
Published on 2010-05-20T14:47:56Z
I have a data aggregator that relies on scraping several sites and indexing their information so that users can search it.
I need to scrape a vast number of pages daily, and I have run into problems with simple cURL requests, which are fairly slow when executed one after another over long periods (the scraper basically runs 24/7).
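If each page is fetched with a fresh handle, every request pays for a new connection (plus a TLS handshake for HTTPS). A minimal sketch, assuming only the standard PHP cURL extension, of the sequential approach done more cheaply: one reused handle, so libcurl can keep connections to the same host alive between requests, plus hard timeouts so a single slow page cannot stall the loop. The function name fetch_sequential() and the timeout values are hypothetical, chosen for illustration.

```php
<?php
// Hypothetical helper: fetch URLs one after another with a single reused
// cURL handle. Reusing the handle lets libcurl keep connections alive
// between requests to the same host. Timeout values are illustrative.
function fetch_sequential(array $urls, int $timeoutSeconds = 10): array
{
    $ch = curl_init();
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,            // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true,            // follow redirects
        CURLOPT_CONNECTTIMEOUT => 5,               // give up on hosts that won't accept a connection
        CURLOPT_TIMEOUT => $timeoutSeconds,        // cap the whole transfer so one slow page can't stall the loop
        CURLOPT_ENCODING => '',                    // accept any compression libcurl supports
    ]);

    $results = [];
    foreach ($urls as $url) {
        curl_setopt($ch, CURLOPT_URL, $url);
        $body = curl_exec($ch);
        if ($body === false) {
            // Transport error (DNS failure, timeout, unreadable file, ...): record it and move on.
            $results[$url] = ['status' => 0, 'body' => null, 'error' => curl_error($ch)];
            continue;
        }
        $results[$url] = [
            'status' => curl_getinfo($ch, CURLINFO_RESPONSE_CODE),
            'body' => $body,
            'error' => null,
        ];
    }
    // The handle is released when $ch goes out of scope.
    return $results;
}
```

For example, fetch_sequential($urls) returns an array keyed by URL with 'status', 'body' and 'error' entries; a failed transfer has a null body and an error message.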
Running multi-cURL requests in a simple while loop is also fairly slow. I sped it up by running individual cURL requests in background processes, which is faster, but sooner or later the slower requests start piling up, and that eventually crashes the server.
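The pile-up happens when new transfers are started faster than old ones finish, with no upper bound. A sketch of one common alternative, assuming the PHP cURL extension: a single process driving curl_multi with a fixed-size "rolling window", where at most a set number of transfers are in flight and each finished transfer immediately frees a slot for the next queued URL. The helpers fetch_rolling() and make_handle(), and the default concurrency and timeout values, are hypothetical and not tuned.

```php
<?php
// Hypothetical helper: build one easy handle with hard timeouts.
function make_handle(string $url, int $timeoutSeconds)
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_FOLLOWLOCATION => true,
        CURLOPT_CONNECTTIMEOUT => 5,
        CURLOPT_TIMEOUT => $timeoutSeconds, // slow pages fail fast instead of piling up
        CURLOPT_PRIVATE => $url,            // maps a finished handle back to its URL
    ]);
    return $ch;
}

// Hypothetical rolling-window scraper: never more than $concurrency transfers at once.
function fetch_rolling(array $urls, int $concurrency = 10, int $timeoutSeconds = 15): array
{
    $queue = array_values(array_unique($urls));
    $mh = curl_multi_init();
    $results = [];
    $next = 0;   // index of the next URL to start
    $active = 0; // handles added to $mh but not yet harvested

    // Fill the initial window.
    while ($next < count($queue) && $active < $concurrency) {
        curl_multi_add_handle($mh, make_handle($queue[$next++], $timeoutSeconds));
        $active++;
    }

    while ($active > 0) {
        // Let libcurl make progress on every in-flight transfer.
        do {
            $mrc = curl_multi_exec($mh, $running);
        } while ($mrc === CURLM_CALL_MULTI_PERFORM);
        if ($mrc !== CURLM_OK) {
            break; // the multi handle itself failed; give up on this batch
        }

        // Harvest finished transfers and refill each freed slot.
        while ($info = curl_multi_info_read($mh)) {
            $ch = $info['handle'];
            $url = curl_getinfo($ch, CURLINFO_PRIVATE);
            if ($info['result'] === CURLE_OK) {
                $results[$url] = [
                    'status' => curl_getinfo($ch, CURLINFO_RESPONSE_CODE),
                    'body' => curl_multi_getcontent($ch),
                    'error' => null,
                ];
            } else {
                $results[$url] = ['status' => 0, 'body' => null, 'error' => curl_strerror($info['result'])];
            }
            curl_multi_remove_handle($mh, $ch);
            $active--;

            if ($next < count($queue)) {
                curl_multi_add_handle($mh, make_handle($queue[$next++], $timeoutSeconds));
                $active++;
            }
        }

        // Wait for socket activity (up to 1 s) instead of busy-looping.
        if ($running > 0 && curl_multi_select($mh, 1.0) === -1) {
            usleep(1000);
        }
    }

    curl_multi_close($mh);
    return $results;
}
```

For example, fetch_rolling($urls, 20) keeps at most 20 transfers open. Because the window is fixed, slow hosts can occupy at most that many slots, each for at most $timeoutSeconds, so open sockets and memory stay bounded instead of growing with the backlog. Results are keyed by URL, so duplicate URLs are fetched once.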
Are there more efficient ways to scrape data? Perhaps command-line curl?
© Stack Overflow or respective owner