What are the best measures to protect content from being crawled?
Posted by Moak on Stack Overflow, 2011-02-08T07:13:34Z
I've been crawling a lot of websites for content recently, and I'm surprised that no site so far has put up much resistance. Ideally, the site I'm working on shouldn't be this easy to harvest. So I'm wondering: what are the best methods to stop bots from harvesting your web content? Obvious solutions:
- robots.txt (yeah, right)
- IP blacklists
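For completeness, here is a minimal robots.txt sketch. It is purely advisory: well-behaved crawlers honor it, while harvesters typically ignore it, which is the reason for the skepticism above. The paths are illustrative only.

```text
# Advisory only: polite crawlers obey this file, scrapers usually don't.
User-agent: *
Disallow: /private/
# Crawl-delay is nonstandard but honored by some crawlers.
Crawl-delay: 10
```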
What can be done to catch bot activity? What can be done to make data extraction difficult? What can be done to give them crap data?
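One common way to catch bot activity is a per-IP sliding-window rate check: humans rarely fetch dozens of pages per second from one address, so a client that exceeds a request threshold within a short window is a candidate for blacklisting or for being served decoy data. The sketch below is a hypothetical, minimal in-memory version; the class name, thresholds, and usage are all illustrative assumptions, not a specific site's implementation.

```python
import time
from collections import defaultdict, deque

class RateDetector:
    """Flag any client IP exceeding max_requests within window_seconds.

    Illustrative sketch: a production setup would persist counters
    (e.g. in Redis) and pair this with blacklisting or decoy responses.
    """

    def __init__(self, max_requests=30, window_seconds=10.0):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.hits = defaultdict(deque)  # ip -> timestamps of recent requests

    def is_bot(self, ip, now=None):
        now = time.monotonic() if now is None else now
        q = self.hits[ip]
        q.append(now)
        # Drop timestamps that have fallen out of the window.
        while q and now - q[0] > self.window_seconds:
            q.popleft()
        return len(q) > self.max_requests

# Usage: six requests at the same instant against a 5-per-second limit;
# only the sixth crosses the threshold.
detector = RateDetector(max_requests=5, window_seconds=1.0)
flags = [detector.is_bot("10.0.0.1", now=100.0) for _ in range(6)]
print(flags)  # [False, False, False, False, False, True]
```

The same hook can serve the "crap data" idea: once an IP is flagged, return plausible-looking but corrupted content instead of a hard block, so the scraper doesn't immediately notice it has been caught.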
Just looking for ideas; there's no right or wrong answer.
© Stack Overflow or respective owner