Link checker: how to avoid false positives
- by Burnzy
I'm working on a link checker/broken link finder and I am getting many false positives. After double-checking, I noticed that many links with error codes were throwing WebExceptions even though they were actually downloadable, and in some other cases the status code is 404 but I can access the page from the browser.
So here is the code. It's pretty ugly, and I'd like to have something more, I'd say, practical. All the status codes in that big if are there to filter out the ones I don't want to add to _brokenlinks, because they are valid links (I tested them all). What I need to fix is the structure (if possible) and how to avoid getting false 404s.
Thank you!
try
{
    HttpWebRequest request = ( HttpWebRequest ) WebRequest.Create ( uri );
    request.Method = "Head";
    request.MaximumResponseHeadersLength = 32; // FOR IE SLOW SPEED
    request.AllowAutoRedirect = true;
    using ( HttpWebResponse response = ( HttpWebResponse ) request.GetResponse() )
    {
        request.Abort();
    }
    /* WebClient wc = new WebClient();
    wc.DownloadString( uri ); */
    _validlinks.Add ( strUri );
}
catch ( WebException wex )
{
    if ( !wex.Message.Contains ( "The remote name could not be resolved:" ) &&
         wex.Status != WebExceptionStatus.ServerProtocolViolation )
    {
        if ( wex.Status != WebExceptionStatus.Timeout )
        {
            HttpStatusCode code = ( ( HttpWebResponse ) wex.Response ).StatusCode;
            if (
                code != HttpStatusCode.OK &&
                code != HttpStatusCode.BadRequest &&
                code != HttpStatusCode.Accepted &&
                code != HttpStatusCode.InternalServerError &&
                code != HttpStatusCode.Forbidden &&
                code != HttpStatusCode.Redirect &&
                code != HttpStatusCode.Found
            )
            {
                _brokenlinks.Add ( new Href ( new Uri ( strUri , UriKind.RelativeOrAbsolute ) , UrlType.External ) );
            }
            else _validlinks.Add ( strUri );
        }
        else _brokenlinks.Add ( new Href ( new Uri ( strUri , UriKind.RelativeOrAbsolute ) , UrlType.External ) );
    }
    else _validlinks.Add ( strUri );
}
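
For the structure part, the big if could probably be reduced to a set lookup. This is only a rough sketch of that idea: StatusFilter and IsTreatedAsValid are made-up names that are not in my code, and the set just repeats the codes I listed above as valid. It does nothing about the false 404s.

```csharp
using System.Collections.Generic;
using System.Net;

// Hypothetical helper (not part of the original code): keeps the
// "treat as valid" status codes in one place instead of a long if.
static class StatusFilter
{
    // Same codes as the big if above. HttpStatusCode.Redirect and
    // HttpStatusCode.Found are both 302, so one entry covers both.
    static readonly HashSet<HttpStatusCode> TreatedAsValid = new HashSet<HttpStatusCode>
    {
        HttpStatusCode.OK,
        HttpStatusCode.BadRequest,
        HttpStatusCode.Accepted,
        HttpStatusCode.InternalServerError,
        HttpStatusCode.Forbidden,
        HttpStatusCode.Found
    };

    // True when the code is one of those the link checker counts as a valid link.
    public static bool IsTreatedAsValid ( HttpStatusCode code )
    {
        return TreatedAsValid.Contains ( code );
    }
}
```

Inside the catch, the inner if would then become a single test on StatusFilter.IsTreatedAsValid ( code ), with _validlinks.Add in the true branch and _brokenlinks.Add in the false branch.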