C# Network Programming - HttpWebRequest Scraping

Posted by masterguru on Stack Overflow
Published on 2010-05-01T00:10:58Z Indexed on 2010/05/01 0:17 UTC

Hi,

I am building a web scraping application. It needs to scrape a complex website by sending concurrent HttpWebRequests from a single client host to a single target web server.

The application will run on Windows Server 2008.

A single HttpWebRequest for data can take 1 to 4 minutes to complete because of long-running database operations on the server.

I need at least 100 parallel requests to the target web server, but I have noticed that when I use more than 2-3 long-running requests, I get serious performance problems (request timeouts and hanging).
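
One plausible reading of the 2-3 request threshold: in a client (non-ASP.NET) .NET application, ServicePointManager.DefaultConnectionLimit defaults to 2 connections per host, and HttpWebRequest.Timeout defaults to 100 seconds. The Python sketch below is a simplified model of what that combination does to a batch of long-running requests; it is not .NET code, and `simulate` is a hypothetical name. It assumes that all requests are issued at once, that each holds its connection for the same duration, and (an assumption, not confirmed here) that time spent waiting for a free connection counts against the timeout.

```python
def simulate(num_requests: int, connection_limit: int,
             duration_s: float, timeout_s: float) -> list[str]:
    """Model a per-host connection limit; return 'ok' or 'timeout' per request.

    Assumptions (illustrative only): all requests are issued at t=0, every
    request holds its connection for duration_s, and the timeout clock starts
    at issue time, so queueing for a connection counts against timeout_s.
    """
    results = []
    for i in range(num_requests):
        # Request i waits for (i // connection_limit) full "rounds" of
        # earlier requests before a connection frees up.
        queued_s = (i // connection_limit) * duration_s
        finish_s = queued_s + duration_s
        results.append("ok" if finish_s <= timeout_s else "timeout")
    return results


# 100 one-minute requests, 2 connections, 100-second timeout:
outcome = simulate(100, connection_limit=2, duration_s=60, timeout_s=100)
print(outcome.count("ok"), "succeeded,", outcome.count("timeout"), "timed out")
```

Under these assumptions only the first two requests finish; the rest spend at least 60 seconds queued and overrun the 100-second limit. The model also shows that a 2-minute request times out against a 100-second limit even with no queueing at all, so both the connection limit and the timeout would need raising for this workload.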

How many concurrent requests can I have in this scenario from a single host to a single target web server? Can I use thread pools in the application to run parallel HttpWebRequests to the server? Will I run into problems with the default outbound HTTP connection/request limits? What happens to request timeouts when I reach the outbound connection limit? What would be the best setup for my scenario?
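
On the thread-pool question, a pool by itself does not raise throughput if the per-host connection limit stays low: the extra threads simply wait for a free connection. The Python sketch below models this with a `threading.BoundedSemaphore` standing in for the connection limit. `fetch_all` and `fake_fetch` are hypothetical names, and `fake_fetch` sleeps instead of making a real HTTP request.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def fetch_all(urls, worker_threads, connection_limit, fetch):
    """Run fetch(url) for every url on a thread pool, allowing at most
    connection_limit calls in flight at once (a stand-in for a per-host
    connection limit). Returns (results in input order, peak in-flight count)."""
    connections = threading.BoundedSemaphore(connection_limit)
    lock = threading.Lock()
    in_flight = 0
    peak = 0

    def worker(url):
        nonlocal in_flight, peak
        with connections:  # blocks while every "connection" is busy
            with lock:
                in_flight += 1
                peak = max(peak, in_flight)
            try:
                return fetch(url)
            finally:
                with lock:
                    in_flight -= 1

    with ThreadPoolExecutor(max_workers=worker_threads) as pool:
        results = list(pool.map(worker, urls))  # map preserves input order
    return results, peak


def fake_fetch(url):
    # Hypothetical stand-in for a slow HTTP request.
    time.sleep(0.05)
    return url.upper()


urls = [f"http://example.test/page/{i}" for i in range(12)]
results, peak = fetch_all(urls, worker_threads=8, connection_limit=3, fetch=fake_fetch)
print("peak concurrent fetches:", peak)
```

Eight worker threads never produce more than three concurrent fetches here. In .NET terms the semaphore plays roughly the role of ServicePointManager.DefaultConnectionLimit (or ServicePoint.ConnectionLimit for one host), and the worker count plays the role of the thread count; raising only the thread count leaves the extra threads blocked.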

Any help would be appreciated.

Thanks

© Stack Overflow or respective owner