How to work around a site forbidding me from scraping its images with PHP

Posted by Petruza on Stack Overflow
Published on 2012-03-29T06:52:04Z Indexed on 2012/03/29 17:29 UTC


I'm scraping a site, searching for JPGs to download.
Scraping the site's HTML pages works fine.
But when I try getting the JPGs with cURL, copy(), fopen(), etc., I get a 403 Forbidden status.

I know that's because the site owners don't want their images scraped, so I understand that a good answer would be "just don't do it, because they don't want you to."

OK, but let's say it's fine and I try to work around this. How could that be achieved?

If I request the same URL with a browser, the image opens perfectly, so it's not that my IP is banned or anything. I'm also testing the scraper one file at a time, so it's not blocking me for making too many requests too often.

From my understanding, it could be that the site is checking for some cookies that confirm I'm using a browser and browsing their site before I download a JPG.
Or maybe PHP sends some user agent with its requests that the server can detect and filter out.
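
A minimal sketch of how both guesses could be tested with cURL: load the HTML page first so any cookies get set, then request the image on the same handle with a browser-like User-Agent and a Referer pointing at that page. The function names, the User-Agent string, and the example.com URLs are placeholders I made up, and I'm only assuming the server checks these things; it may be looking at something else entirely.

```php
<?php
// Hypothetical helper: headers a browser would typically send with an image request.
function browser_like_headers(string $pageUrl): array
{
    return [
        'Accept: image/webp,image/*,*/*;q=0.8',
        'Referer: ' . $pageUrl,
    ];
}

// Hypothetical helper: visit the page, then fetch the image on the same handle
// so cookies set by the page are sent back. Returns the HTTP status of the
// image request.
function fetch_image(string $pageUrl, string $imageUrl, string $savePath): int
{
    $ch = curl_init();
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_FOLLOWLOCATION => true,
        // Placeholder browser User-Agent (assumption, any real browser string would do).
        CURLOPT_USERAGENT => 'Mozilla/5.0 (X11; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/115.0',
        // An empty string turns on cURL's in-memory cookie engine without reading a file.
        CURLOPT_COOKIEFILE => '',
    ]);

    // Step 1: request the HTML page so the server can set its cookies.
    curl_setopt($ch, CURLOPT_URL, $pageUrl);
    curl_exec($ch);

    // Step 2: request the image with the cookies, User-Agent, and Referer in place.
    curl_setopt($ch, CURLOPT_URL, $imageUrl);
    curl_setopt($ch, CURLOPT_HTTPHEADER, browser_like_headers($pageUrl));
    $body = curl_exec($ch);
    $status = (int) curl_getinfo($ch, CURLINFO_HTTP_CODE);

    // Only write the file when the server actually returned the image.
    if ($status === 200 && $body !== false) {
        file_put_contents($savePath, $body);
    }
    return $status;
}
```

Calling something like `fetch_image('https://example.com/gallery.html', 'https://example.com/img/1.jpg', '1.jpg')` and checking whether it returns 200 or still 403 would show whether the cookies, the user agent, or the referer make the difference. Dropping one option at a time should narrow down which check the server is doing.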

Anyway, any ideas?

© Stack Overflow or respective owner
