HTTP responses curl and wget different results

Posted by Fab on Super User See other posts from Super User or by Fab
Published on 2011-06-23T14:32:37Z Indexed on 2011/06/23 16:25 UTC
Read the original article Hit count: 362

Filed under:
|
|

To check HTTP response header for a set of urls I send with curl the following request headers

foreach ( $urls as $url )
{
    // Setup headers - I used the same headers from Firefox version 2.0.0.6
    $header[ ] = "Accept: text/xml,application/xml,application/xhtml+xml,";
    $header[ ] = "text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5";
    $header[ ] = "Cache-Control: max-age=0";
    $header[ ] = "Connection: keep-alive";
    $header[ ] = "Keep-Alive: 300";
    $header[ ] = "Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7";
    $header[ ] = "Accept-Language: en-us,en;q=0.5";
    $header[ ] = "Pragma: "; // browsers keep this blank.

    curl_setopt( $ch, CURLOPT_URL, $url );
    curl_setopt( $ch, CURLOPT_USERAGENT, 'Googlebot/2.1 (+http://www.google.com/bot.html)');
    curl_setopt( $ch, CURLOPT_HTTPHEADER, $header);
    curl_setopt( $ch, CURLOPT_REFERER, 'http://www.google.com');
    curl_setopt( $ch, CURLOPT_HEADER, true );
    curl_setopt( $ch, CURLOPT_NOBODY, true );
    curl_setopt( $ch, CURLOPT_RETURNTRANSFER, true );
    curl_setopt( $ch, CURLOPT_FOLLOWLOCATION, true );
    curl_setopt( $ch, CURLOPT_HTTPAUTH, CURLAUTH_ANY );
    curl_setopt( $ch, CURLOPT_TIMEOUT, 10 ); //timeout 10 seconds
}

Sometimes I receive 200 OK which is good other time 301, 302, 307 which I consider good as well, but other times I receive weird status as 406, 500, 504 which should identify an invalid url but when I open it on the browser they are fine

for example the script returns

http://www.awe.co.uk/ => HTTP/1.1 406 Not Acceptable

and wget returns

wget http://www.awe.co.uk/
--2011-06-23 15:26:26--  http://www.awe.co.uk/
Resolving www.awe.co.uk... 77.73.123.140
Connecting to www.awe.co.uk|77.73.123.140|:80... connected.
HTTP request sent, awaiting response... 200 OK

Does anyone know which request header I am missing or adding in excess?

© Super User or respective owner

Related posts about http

Related posts about wget