How do I deal with content scrapers? [closed]
- by aem
Possible Duplicate:
How to protect SHTML pages from crawlers/spiders/scrapers?
My Heroku (Bamboo) app has been getting a lot of hits from a scraper identifying itself as GSLFBot. Googling that name turns up reports from people who've concluded that it doesn't respect robots.txt (e.g., http://www.0sw.com/archives/96).
I'm considering updating my app to keep a list of banned user-agents, adding GSLFBot to it, and serving every request from those user-agents a 400 or similar. Is that an effective technique, and if not, what should I do instead?
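Since the question doesn't say what the app is written in, here's a minimal sketch of the user-agent blocklist idea as Python WSGI middleware (the class name, blocklist, and 403 response are illustrative choices, not anything from the question):

```python
# Illustrative sketch: reject requests whose User-Agent matches a blocklist.
# Matching is case-insensitive substring matching.
BANNED_AGENTS = ["gslfbot"]  # extend as you identify other abusive agents


class BlockScrapers:
    """WSGI middleware that short-circuits requests from blocklisted user-agents."""

    def __init__(self, app, banned=BANNED_AGENTS):
        self.app = app
        self.banned = [b.lower() for b in banned]

    def __call__(self, environ, start_response):
        ua = environ.get("HTTP_USER_AGENT", "").lower()
        if any(b in ua for b in self.banned):
            # Refuse the request without touching the wrapped app.
            start_response("403 Forbidden", [("Content-Type", "text/plain")])
            return [b"Forbidden"]
        return self.app(environ, start_response)
```

The caveat, of course, is that this only works while the scraper keeps sending a distinctive User-Agent header; anything it sends can be spoofed.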
(As a side note, it seems weird to have an abusive scraper with a distinctive user-agent.)