How to protect SHTML pages from crawlers/spiders/scrapers?
- by Adam Lynch
I have A LOT of SHTML pages I want to protect from crawlers, spiders & scrapers.
I understand the limitations of SSIs, so feel free to suggest an implementation of the following using any technology or combination of technologies you wish:
The idea is that if you request too many pages too fast, you're added to a blacklist for 24 hrs and shown a captcha instead of content on every page you request. If you enter the captcha correctly, you're removed from the blacklist.
There is a whitelist so GoogleBot, etc. will never get blocked.
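To make the idea concrete, here is a minimal sketch of the logic in Python (chosen only for illustration; nothing in it is IIS-specific). The thresholds, the `CrawlerGuard` class, its method names, the use of the client IP as the key, and the whitelisted address are all assumptions for the sake of the example, and it keeps state in memory rather than in a DB:

```python
import time
from collections import defaultdict, deque

MAX_REQUESTS = 20                  # assumed threshold: requests allowed per window
WINDOW_SECONDS = 10                # assumed length of the sliding window
BLACKLIST_SECONDS = 24 * 60 * 60   # 24 hrs, as described above


class CrawlerGuard:
    """Hypothetical helper: decides whether a request gets content or a captcha."""

    def __init__(self, whitelist=(), now=time.time):
        self.whitelist = set(whitelist)   # clients that are never blocked
        self.now = now                    # injectable clock (makes testing easy)
        self.hits = defaultdict(deque)    # client -> timestamps of recent requests
        self.blacklist = {}               # client -> time at which the block expires

    def check(self, client):
        """Return "content" or "captcha" for one request from `client`."""
        if client in self.whitelist:
            return "content"

        t = self.now()
        expires = self.blacklist.get(client)
        if expires is not None:
            if t < expires:
                return "captcha"
            del self.blacklist[client]    # the 24 hrs have passed

        # Record this request and drop timestamps that fell out of the window.
        recent = self.hits[client]
        recent.append(t)
        while recent and recent[0] <= t - WINDOW_SECONDS:
            recent.popleft()

        if len(recent) > MAX_REQUESTS:
            self.blacklist[client] = t + BLACKLIST_SECONDS
            recent.clear()
            return "captcha"
        return "content"

    def captcha_solved(self, client):
        """Call this after a correct captcha answer to lift the block."""
        self.blacklist.pop(client, None)
        self.hits.pop(client, None)


# Example: the whitelisted address below is a placeholder, not a real crawler IP.
guard = CrawlerGuard(whitelist={"203.0.113.5"})
print(guard.check("198.51.100.7"))
```

In a real deployment the whitelist would presumably need something stronger than a fixed list of addresses, since a User-Agent string alone is easy to fake.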
Which is the best/easiest way to implement this idea?
Server = IIS
Cleaning old tuples out of a DB every 24 hrs is easily done, so there's no need to explain that.