How to protect SHTML pages from crawlers/spiders/scrapers?

Posted by Adam Lynch on Pro Webmasters
Published on 2011-05-13T19:40:40Z

Filed under: security | scraper-sites

I have A LOT of SHTML pages I want to protect from crawlers, spiders & scrapers.

I understand the limitations of SSI. Feel free to suggest an implementation of the following idea using whatever technology (or combination of technologies) you like:

The idea is that if you request too many pages too fast, you're added to a blacklist for 24 hrs and shown a captcha instead of content on every page you request. If you enter the captcha correctly, you're removed from the blacklist.
There is a whitelist so GoogleBot, etc. will never get blocked.
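
To make the idea concrete, here is a minimal sketch of the decision logic in Python. It is not tied to IIS or SHTML; everything in it is an assumption for illustration: the class name `ScraperGuard`, the thresholds (30 requests in 10 seconds), the in-memory storage (a real setup would more likely use the DB mentioned below), and identifying clients by a plain string such as an IP address. Whitelisting crawlers by User-Agent alone is easy to spoof, so the whitelist here is just a set of already-verified client ids.

```python
import time
from collections import defaultdict, deque

# Assumed thresholds, not from the original post: tune to your traffic.
MAX_REQUESTS = 30                  # allowed requests per window
WINDOW_SECONDS = 10                # length of the sliding window
BLACKLIST_SECONDS = 24 * 60 * 60   # 24 hrs, as described above


class ScraperGuard:
    """Hypothetical helper: decides whether to serve content or a captcha."""

    def __init__(self, whitelist=(), clock=time.time):
        self.whitelist = set(whitelist)   # client ids that are never blocked
        self.clock = clock                # injectable so it can be tested
        self.hits = defaultdict(deque)    # client id -> recent request times
        self.blacklisted_until = {}       # client id -> blacklist expiry time

    def check(self, client_id):
        """Record one page request and return 'content' or 'captcha'."""
        if client_id in self.whitelist:
            return "content"

        now = self.clock()
        expiry = self.blacklisted_until.get(client_id)
        if expiry is not None:
            if now < expiry:
                return "captcha"
            del self.blacklisted_until[client_id]  # 24 hrs have passed

        # Sliding window: keep only requests from the last WINDOW_SECONDS.
        window = self.hits[client_id]
        window.append(now)
        while window and window[0] <= now - WINDOW_SECONDS:
            window.popleft()

        if len(window) > MAX_REQUESTS:
            self.blacklisted_until[client_id] = now + BLACKLIST_SECONDS
            window.clear()
            return "captcha"
        return "content"

    def captcha_solved(self, client_id):
        """Call after a correct captcha: removes the client from the blacklist."""
        self.blacklisted_until.pop(client_id, None)
        self.hits.pop(client_id, None)
```

Each page request would call `check()` with the client's id and serve either the page or the captcha form depending on the result; a correct captcha answer calls `captcha_solved()`. How that hook is wired into IIS in front of the SHTML pages is the part this sketch leaves open.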

What is the best (or easiest) way to implement this idea?

Server = IIS

Cleaning out the old tuples from a DB every 24 hrs is easily done so no need to explain that.

© Pro Webmasters or respective owner
