How to protect/monitor your site against crawling by a malicious user
Posted by deathy on Stack Overflow
Published on 2008-12-21T22:25:16Z
Situation:
- The site's content is protected by username/password, but not every account is trusted, since some belong to trial/test users.
- A normal search engine can't get at the content because of the username/password restriction.
- A malicious user can still log in and pass the session cookie to "wget -r" or a similar tool.
The question is: what is the best way to monitor for such activity and respond to it, given that the site policy forbids crawling/scraping?
I can think of some options:
- Set up a traffic monitoring solution that limits the number of requests per user/IP.
- Related to the first point: automatically block some user agents.
- (Evil :)) Set up a hidden link that, when accessed, logs the user out and disables their account. (Presumably a normal user would never follow it, since they can't see it to click it, but a bot will crawl all links.)
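As a rough illustration of the second option, a user-agent check could look like the sketch below. The token list and the function name `is_blocked_user_agent` are assumptions made for this example, not taken from any existing tool. Treating a missing header as suspicious is also an assumption. The check only stops lazy scrapers, since the header is trivial to spoof (wget accepts `--user-agent`).

```python
# Assumed list of substrings that identify common download tools.
# wget sends "Wget/<version>" by default; the other entries are illustrative.
BLOCKED_AGENT_TOKENS = ("wget", "curl", "httrack", "python-requests")


def is_blocked_user_agent(user_agent):
    """Return True if the User-Agent header looks like an automated tool."""
    if not user_agent:
        # Assumption: a request with no User-Agent at all is treated as a bot.
        return True
    lowered = user_agent.lower()
    return any(token in lowered for token in BLOCKED_AGENT_TOKENS)
```

The function would be called once per request with the raw header value, before any rate limiting is applied.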
For point 1: do you know of a good existing implementation? Any experience with it? One problem is that very active but human users might show up as false positives.
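To make point 1 concrete, here is a minimal sketch of a per-user sliding-window request counter. It is not an existing product. The class name `SlidingWindowLimiter`, the two thresholds, and the returned labels are all assumptions for this example. The soft limit is one possible way to handle the false-positive worry: a very active human gets flagged for review long before anything is blocked.

```python
import time
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Counts requests per key (user ID or IP) over a sliding time window."""

    def __init__(self, soft_limit, hard_limit, window_seconds):
        self.soft_limit = soft_limit          # above this: flag for review
        self.hard_limit = hard_limit          # above this: block
        self.window_seconds = window_seconds
        self._hits = defaultdict(deque)       # key -> timestamps in the window

    def check(self, key, now=None):
        """Record one request and return 'ok', 'review', or 'block'."""
        now = time.monotonic() if now is None else now
        hits = self._hits[key]
        # Discard timestamps that have fallen out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        hits.append(now)
        if len(hits) > self.hard_limit:
            return "block"
        if len(hits) > self.soft_limit:
            return "review"
        return "ok"
```

Keying on the logged-in user rather than the IP fits the situation described above, because the scraper has to reuse a valid session cookie. This version keeps its counts in process memory, so a site with several web servers would need a shared store instead.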
For point 3: do you think this is really evil? Or do you see any possible problems with it?
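A sketch of how point 3 might be wired up, under some assumptions: the trap URL `/internal/do-not-follow` and the `TrapMonitor` class are hypothetical. This version only flags the account for review instead of disabling it outright, since a hidden link could plausibly also be followed by link-prefetching tools or assistive software rather than only by bots.

```python
# Hypothetical URL of the hidden link; it must not be reachable in any
# visible way from the normal pages.
TRAP_PATH = "/internal/do-not-follow"


class TrapMonitor:
    """Records which logged-in users requested the hidden trap URL."""

    def __init__(self, trap_path=TRAP_PATH):
        self.trap_path = trap_path
        self.flagged = {}  # user_id -> list of IPs that hit the trap

    def observe(self, user_id, path, ip):
        """Inspect one request; return True if it hit the trap."""
        # Ignore any query string when comparing paths.
        if path.split("?", 1)[0] != self.trap_path:
            return False
        self.flagged.setdefault(user_id, []).append(ip)
        return True

    def needs_review(self, user_id):
        """True if this user has requested the trap URL at least once."""
        return user_id in self.flagged
```

Calling `observe` from a request hook and checking `needs_review` before serving further pages would give a manual review step. Switching to an automatic logout and account lock is then a one-line policy change.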
Also accepting other suggestions.
© Stack Overflow or respective owner