What do I have to change in my PHP/CURL code to retrieve data from a https:// URL?
- by Edward Tanguay
I have a PHP file using CURL that accepts a Google Doc URL as a parameter, then returns the plain text of the Google Doc.
It worked well until recently when apparently a redirect was added so that the http:// address redirects to the equivalent https:// address, as in this example:
http://docs.google.com/View?id=dc7gj86r_20dn2csqg3
So I changed my code to access the https:// address, but now it just returns an empty response.
What do I have to change in my CURL code so that I can get the HTML text from the https:// address?
// Read the requested document URL from the query string.
$url = filter_input(INPUT_GET, 'url', FILTER_SANITIZE_STRING);
$validUrlPrefixes[] = "https://docs.google.com";
if (beginsWithOneOfThese($url, $validUrlPrefixes)) {
    $user_agent = 'Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.2.3) Gecko/20100401 Firefox/3.6.3 (.NET CLR 3.5.30729)';
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_COOKIEJAR, "/tmp/cookie");
    curl_setopt($ch, CURLOPT_COOKIEFILE, "/tmp/cookie");
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_FAILONERROR, 1);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 0); // redirects are not followed
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_TIMEOUT, 15);
    curl_setopt($ch, CURLOPT_USERAGENT, $user_agent);
    curl_setopt($ch, CURLOPT_VERBOSE, 0);
    // curl_exec() returns false on failure; that case is not checked here.
    $rawData = curl_exec($ch);
    $rawData = cleanText($rawData);
    if (beginsWith($url, "https://docs.google.com")) {
        echo qstr::convertGoogleDocContentToText($rawData);
        die;
    }
    echo $rawData;
    die;
}
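
For comparison, here is a minimal sketch of the direction I am considering, not a confirmed fix. It assumes the blank output comes from one of two things: the redirect not being followed (CURLOPT_FOLLOWLOCATION is 0 above), or curl_exec() returning false on an HTTPS error that the code never reports. The function names buildFetchOptions and fetchUrl are hypothetical and do not exist in my original code.

```php
<?php
// Hypothetical helper: collects the cURL options for a fetch that may be
// redirected from http:// to https://.
function buildFetchOptions(string $url, string $userAgent): array
{
    return [
        CURLOPT_URL            => $url,
        CURLOPT_RETURNTRANSFER => true,  // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true,  // follow the http -> https redirect
        CURLOPT_MAXREDIRS      => 5,     // assumed limit, to avoid redirect loops
        CURLOPT_SSL_VERIFYPEER => true,  // keep certificate verification on
        CURLOPT_SSL_VERIFYHOST => 2,     // check that the certificate matches the host
        CURLOPT_FAILONERROR    => true,  // treat HTTP status >= 400 as a failure
        CURLOPT_TIMEOUT        => 15,
        CURLOPT_USERAGENT      => $userAgent,
    ];
}

// Hypothetical helper: performs the request and surfaces the cURL error
// message instead of silently returning an empty result.
function fetchUrl(string $url, string $userAgent): string
{
    $ch = curl_init();
    curl_setopt_array($ch, buildFetchOptions($url, $userAgent));
    $body = curl_exec($ch);
    if ($body === false) {
        throw new RuntimeException('cURL request failed: ' . curl_error($ch));
    }
    return $body;
}
```

Calling fetchUrl($url, $user_agent) in place of the curl_setopt() block above would at least show why the response is empty. If the reported error turns out to be a certificate problem, pointing CURLOPT_CAINFO at a CA bundle file is probably a safer route than switching verification off.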