ASP.NET Crawler WebResponse Operation Timed Out

Posted by Leon on Stack Overflow
Published on 2010-05-18T05:46:26Z Indexed on 2010/05/18 5:50 UTC

Hi,

I have built a simple thread-pool-based web crawler within my web application. Its job is to crawl its own application space and build a Lucene index of every valid web page and its meta content.

Here's the problem. When I run the crawler from a debug server instance of Visual Studio Express and give it the IIS URL as the starting point, it works fine. However, when I do not provide the IIS URL and the crawler starts from its own URL (i.e. it crawls its own domain space), I get an "operation timed out" exception on the request.GetResponse() call.

Could someone please guide me on what I should or should not be doing here? Here is my code for fetching a page. It is executed in a multithreaded environment.

private static string GetWebText(string url)
{
    string htmlText = "";

    HttpWebRequest request = (HttpWebRequest)HttpWebRequest.Create(url);
    request.UserAgent = "My Crawler";

    // The "operation timed out" exception is thrown by GetResponse() below.
    using (WebResponse response = request.GetResponse())
    {
        using (Stream stream = response.GetResponseStream())
        {
            using (StreamReader reader = new StreamReader(stream))
            {
                htmlText = reader.ReadToEnd();
            }
        }
    }
    return htmlText;
}

And the following is my stack trace:

   at System.Net.HttpWebRequest.GetResponse()
   at CSharpCrawler.Crawler.GetWebText(String url) in c:\myAppDev\myApp\site\App_Code\CrawlerLibs\Crawler.cs:line 366
   at CSharpCrawler.Crawler.CrawlPage(String url, List`1 threadCityList) in c:\myAppDev\myApp\site\App_Code\CrawlerLibs\Crawler.cs:line 105
   at CSharpCrawler.Crawler.CrawlSiteBuildIndex(String hostUrl, String urlToBeginSearchFrom, List`1 threadCityList) in c:\myAppDev\myApp\site\App_Code\CrawlerLibs\Crawler.cs:line 89
   at crawler_Default.threadedCrawlSiteBuildIndex(Object threadedCrawlerObj) in c:\myAppDev\myApp\site\crawler\Default.aspx.cs:line 108
   at System.Threading.QueueUserWorkItemCallback.WaitCallback_Context(Object state)
   at System.Threading.ExecutionContext.runTryCode(Object userData)
   at System.Runtime.CompilerServices.RuntimeHelpers.ExecuteCodeWithGuaranteedCleanup(TryCode code, CleanupCode backoutCode, Object userData)
   at System.Threading.ExecutionContext.RunInternal(ExecutionContext executionContext, ContextCallback callback, Object state)
   at System.Threading.ExecutionContext.Run(ExecutionContext executionContext, ContextCallback callback, Object state, Boolean ignoreSyncCtx)
   at System.Threading.QueueUserWorkItemCallback.System.Threading.IThreadPoolWorkItem.ExecuteWorkItem()
   at System.Threading.ThreadPoolWorkQueue.Dispatch()
   at System.Threading._ThreadPoolWaitCallback.PerformWaitCallback()

Thanks and cheers, Leon.
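
The post does not establish why the self-crawl times out, so the following is only a diagnostic sketch, not a fix. It is a variant of GetWebText that sets explicit timeouts and reports the WebException status, which may help tell a genuine timeout apart from a refused connection or an HTTP error. The class name CrawlerFetch, the method name TryGetWebText, and the timeoutMs parameter are hypothetical and do not appear in the original code.

```csharp
using System;
using System.IO;
using System.Net;

public static class CrawlerFetch
{
    // Hypothetical helper (not from the original post): fetches a page and,
    // on failure, returns the WebException status instead of throwing.
    public static bool TryGetWebText(string url, int timeoutMs, out string html, out string error)
    {
        html = "";
        error = null;
        try
        {
            HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);
            request.UserAgent = "My Crawler";
            request.Timeout = timeoutMs;          // limit for GetResponse()
            request.ReadWriteTimeout = timeoutMs; // limit for reads on the response stream

            using (WebResponse response = request.GetResponse())
            using (Stream stream = response.GetResponseStream())
            using (StreamReader reader = new StreamReader(stream))
            {
                html = reader.ReadToEnd();
                return true;
            }
        }
        catch (WebException ex)
        {
            // ex.Status separates Timeout from ConnectFailure, ProtocolError, and so on.
            error = ex.Status + ": " + ex.Message;
            return false;
        }
    }
}
```

Called from each crawler thread with, say, a 10-second limit, the returned error string can be logged together with the URL to see which requests fail and with what status.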

Tags: web-crawler, ASP.NET, c#, httpwebresponse
