How to protect SHTML pages from crawlers/spiders/scrapers?
Posted by Adam Lynch on Pro Webmasters
Published on 2011-05-13T19:40:40Z
Tags: security | scraper-sites
I have A LOT of SHTML pages I want to protect from crawlers, spiders & scrapers.
I understand the limitations of SSIs. Feel free to suggest an implementation of the following idea using whatever technology or combination of technologies you like:
The idea is that if you request too many pages too fast, you're added to a blacklist for 24 hours and shown a captcha instead of content on every page you request. If you enter the captcha correctly, you're removed from the blacklist.
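A minimal sketch of that check, written in Python purely for illustration (the thresholds, function names, and in-memory dictionaries are my own assumptions; on IIS the same logic would sit in an HTTP module/handler or a reverse proxy, with the dictionaries backed by your database):

```python
import time

# Illustrative thresholds -- tune to taste.
MAX_REQUESTS = 120            # requests allowed per window
WINDOW_SECONDS = 60           # sliding window length
BLACKLIST_SECONDS = 24 * 3600 # 24-hour ban

request_log = {}  # ip -> list of recent request timestamps
blacklist = {}    # ip -> time the ban expires


def should_show_captcha(ip: str) -> bool:
    """Return True if this request should get the captcha page instead of content."""
    now = time.time()

    # Still blacklisted?
    if blacklist.get(ip, 0) > now:
        return True

    # Record the hit and drop entries older than the window.
    hits = [t for t in request_log.get(ip, []) if now - t < WINDOW_SECONDS]
    hits.append(now)
    request_log[ip] = hits

    # Too many pages too fast: blacklist for 24 hours.
    if len(hits) > MAX_REQUESTS:
        blacklist[ip] = now + BLACKLIST_SECONDS
        return True
    return False


def captcha_solved(ip: str) -> None:
    """A correct captcha answer removes the IP from the blacklist."""
    blacklist.pop(ip, None)
    request_log.pop(ip, None)
```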
There is a whitelist so GoogleBot, etc. will never get blocked.
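For that whitelist, Google's documented way to verify Googlebot is a forward-confirmed reverse DNS lookup rather than trusting the user-agent string. A sketch, again in Python (the function name is mine):

```python
import socket

def is_verified_googlebot(ip: str) -> bool:
    """Forward-confirmed reverse DNS check for Googlebot."""
    try:
        host = socket.gethostbyaddr(ip)[0]
    except socket.herror:
        return False
    # Genuine Googlebot hosts resolve under googlebot.com or google.com.
    if not (host.endswith(".googlebot.com") or host.endswith(".google.com")):
        return False
    # The forward lookup must map back to the original IP to rule out spoofed reverse DNS.
    try:
        addrs = {info[4][0] for info in socket.getaddrinfo(host, None)}
    except socket.gaierror:
        return False
    return ip in addrs
```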
Which is the best/easiest way to implement this idea?
Server = IIS
Cleaning out the old tuples from a DB every 24 hrs is easily done, so no need to explain that.