How to work around a site forbidding me to scrape their images with PHP
- by Petruza
I'm scraping a site, searching for JPGs to download.
Scraping the site's HTML pages works fine.
But when I try getting the JPGs with CURL, copy(), fopen(), etc., I get a 403 forbiden status.
I know that's because the site owners don't want their images scraped, so I understand a good answer would be just don't do it, because they don't want you to.
Ok, but let's say it's ok and I try to work around this, how could this be achieved?
If I get the same URL with a browser, I can open the image perfectly, it's not that my IP is banned or anything, and I'm testing the scraper one file at a time, so it's not blocking me because I make too many requests too often.
From my understanding, it could be that either the site is checking for some cookies that confirm that I'm using a browser and browsing their site before I download a JPG.
Or that maybe PHP is using some user agent for the requests that the server can detect and filter out.
Anyway, have any idea?