HttpWebRequest Timeouts After Ten Consecutive Requests

Posted by Bob Mc on Stack Overflow
Published on 2009-07-28T04:21:56Z Indexed on 2010/04/27 2:53 UTC

I'm writing a web crawler for a specific site. The application is a VB.NET Windows Forms application that does not use multiple threads; the web requests are made sequentially, one at a time. However, after ten successful page retrievals, every subsequent request times out.

I have reviewed the similar questions already posted here on SO and have implemented the recommended techniques in my GetPage routine, shown below:

' Requires Imports System.IO and Imports System.Net at the top of the file.
Public Function GetPage(ByVal url As String) As String
    Dim result As String = String.Empty

    ' Raise the connection limit on the service point that handles this URI.
    Dim uri As New Uri(url)
    Dim sp As ServicePoint = ServicePointManager.FindServicePoint(uri)
    sp.ConnectionLimit = 100

    Dim request As HttpWebRequest = DirectCast(WebRequest.Create(uri), HttpWebRequest)
    request.KeepAlive = False
    request.Timeout = 15000 ' milliseconds

    Try
        ' The Using blocks dispose the response, stream, and reader even if an exception is thrown.
        Using response As HttpWebResponse = DirectCast(request.GetResponse(), HttpWebResponse)
            Using dataStream As Stream = response.GetResponseStream()
                Using reader As New StreamReader(dataStream)
                    If response.StatusCode <> HttpStatusCode.OK Then
                        Throw New Exception("Got response status code: " & response.StatusCode.ToString())
                    End If
                    result = reader.ReadToEnd()
                End Using
            End Using
        End Using

    Catch ex As Exception
        Dim msg As String = "Error reading page """ & url & """. " & ex.Message
        Logger.LogMessage(msg, LogOutputLevel.Diagnostics)
    End Try

    Return result

End Function

Have I missed something? Am I failing to close or dispose of an object that I should? It seems strange that the timeouts always begin after exactly ten consecutive requests.

Notes:

  1. In the constructor for the class in which this method resides I have the following:

    ServicePointManager.DefaultConnectionLimit = 100

  2. If I set KeepAlive to true, the timeouts begin after five requests.

  3. All the requests are for pages in the same domain.

EDIT

I added a delay of two to seven seconds between web requests so that I do not appear to be "hammering" the site or attempting a DoS attack. However, the problem still occurs.
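
For reference, a delay like the one described above might look roughly like the following. This is a minimal sketch that assumes the pause runs on the same thread as the requests; the module name CrawlDelay and the routine PauseBetweenRequests are hypothetical names used for illustration, not the crawler's actual code.

```vb
Imports System.Threading

Module CrawlDelay
    ' One shared generator, so that calls made in quick succession
    ' do not each create a new Random with the same time-based seed.
    Private ReadOnly Rng As New Random()

    ' Hypothetical helper: blocks the calling thread for a random
    ' interval between minSeconds and maxSeconds (inclusive).
    Public Sub PauseBetweenRequests(Optional ByVal minSeconds As Integer = 2,
                                    Optional ByVal maxSeconds As Integer = 7)
        ' Random.Next treats the upper bound as exclusive, so add 1 ms
        ' to make the maximum delay reachable.
        Dim delayMs As Integer = Rng.Next(minSeconds * 1000, maxSeconds * 1000 + 1)
        Thread.Sleep(delayMs)
    End Sub
End Module
```

Calling PauseBetweenRequests() before each GetPage call gives a pause of two to seven seconds. Because Thread.Sleep blocks the calling thread, a single-threaded Windows Forms application will be unresponsive for the duration of each pause.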

© Stack Overflow or respective owner
