What's the fastest way to scrape a lot of pages in PHP?
Posted by Yegor on Stack Overflow
Published on 2010-05-20T14:47:56Z
        
I have a data aggregator that relies on scraping several sites and indexing their information so that users can search it.
I need to scrape a vast number of pages daily, and I have run into problems with simple curl requests: they are fairly slow when executed in rapid sequence over a long period (the scraper basically runs 24/7).
Running a multi curl request in a simple while loop is also fairly slow. I sped things up by running individual curl requests in background processes, which is faster, but sooner or later the slower requests start piling up and end up crashing the server.
Are there more efficient ways of scraping data? Perhaps command-line curl?
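
For reference, here is a minimal sketch of one way the multi curl loop could be restructured, so that a fixed number of transfers stay in flight and each one has a hard time limit. This is not the code I am running; the function name `fetch_all`, the window size, and the timeout values are assumptions chosen for illustration. It uses only PHP's standard `curl_multi_*` functions.

```php
<?php
// Hypothetical helper (name and defaults are illustrative assumptions).
// Fetches all $urls while keeping at most $window transfers in flight.
// Returns an array of url => body, or url => null if the transfer failed.
// Note: duplicate URLs share one key, so the last result wins.
function fetch_all(array $urls, int $window = 10, int $timeout = 15): array
{
    $mh = curl_multi_init();
    $results = [];
    $queue = array_values($urls);
    $inFlight = 0;

    // Start one transfer with time limits so a slow host cannot stall the batch.
    $add = function (string $url) use ($mh, $timeout, &$inFlight): void {
        $ch = curl_init($url);
        curl_setopt_array($ch, [
            CURLOPT_RETURNTRANSFER => true,
            CURLOPT_FOLLOWLOCATION => true,
            CURLOPT_CONNECTTIMEOUT => 5,        // seconds to establish a connection
            CURLOPT_TIMEOUT        => $timeout, // seconds for the whole transfer
            CURLOPT_PRIVATE        => $url,     // remember which URL this handle is for
        ]);
        curl_multi_add_handle($mh, $ch);
        $inFlight++;
    };

    // Fill the initial window.
    while ($inFlight < $window && $queue) {
        $add(array_shift($queue));
    }

    while ($inFlight > 0) {
        curl_multi_exec($mh, $running);

        // Wait for activity instead of spinning; some libcurl builds return -1
        // immediately, so sleep briefly in that case.
        if (curl_multi_select($mh, 1.0) === -1) {
            usleep(10000);
        }

        // Collect finished transfers and start a queued URL for each one.
        while ($info = curl_multi_info_read($mh)) {
            $ch = $info['handle'];
            $url = curl_getinfo($ch, CURLINFO_PRIVATE);
            $results[$url] = $info['result'] === CURLE_OK
                ? curl_multi_getcontent($ch)
                : null;
            curl_multi_remove_handle($mh, $ch);
            $inFlight--;
            if ($queue) {
                $add(array_shift($queue));
            }
        }
    }

    curl_multi_close($mh);
    return $results;
}
```

Called as `fetch_all($urls, 20)`, this would keep up to 20 requests running at once. Because a finished transfer is replaced immediately and a slow one is cut off after the timeout, slow requests should not pile up the way they do with unbounded background processes, though the right window size and timeouts depend on the server and the target sites.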
© Stack Overflow or respective owner