Guidelines for good webcrawler 'Etiquette'

Posted by Harry on Stack Overflow
Published on 2009-06-09T13:33:12Z Indexed on 2010/05/10 8:44 UTC

I'm building a search engine (for fun), and it has just struck me that my little project could wreak havoc by clicking on ads and causing all sorts of other problems.

So what are the guidelines for good webcrawler 'Etiquette'?

Things that spring to mind:

  1. Observe robots.txt instructions
  2. Limit the number of simultaneous requests to the same domain
  3. Don't follow ad links?
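A minimal sketch of items 1 and 2, assuming the crawler has already downloaded each host's robots.txt as text. It uses Python's standard-library `urllib.robotparser` to check whether a URL may be fetched and to read the non-standard `Crawl-delay` directive, plus a small per-host gate that enforces a minimum gap between requests to the same host (a stricter take on item 2 than only capping simultaneous requests). The user-agent string `HarrysHobbyBot`, the `PolitenessGate` class and its 1-second default are hypothetical choices for illustration.

```python
from typing import Optional
from urllib.parse import urlsplit
from urllib.robotparser import RobotFileParser

USER_AGENT = "HarrysHobbyBot"  # hypothetical user-agent string for this crawler


def make_robots_parser(robots_txt: str) -> RobotFileParser:
    """Build a parser from robots.txt text that was already fetched for one host."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


class PolitenessGate:
    """Enforces a minimum gap between consecutive requests to the same host."""

    def __init__(self, min_delay: float = 1.0) -> None:
        self.min_delay = min_delay  # hypothetical default, in seconds
        self._last_request = {}  # host -> timestamp of the most recent request

    def wait_time(self, url: str, now: float,
                  robots: Optional[RobotFileParser] = None) -> float:
        """Seconds the caller should wait before requesting `url` at time `now`."""
        host = (urlsplit(url).hostname or "").lower()
        delay = self.min_delay
        if robots is not None:
            # Crawl-delay is non-standard; robotparser returns None when it is absent.
            crawl_delay = robots.crawl_delay(USER_AGENT)
            if crawl_delay is not None:
                delay = max(delay, float(crawl_delay))
        last = self._last_request.get(host)
        if last is None:
            return 0.0  # never contacted this host, no need to wait
        return max(0.0, last + delay - now)

    def record(self, url: str, now: float) -> None:
        """Remember that a request to this URL's host was sent at time `now`."""
        self._last_request[(urlsplit(url).hostname or "").lower()] = now
```

In a crawl loop you would first call `can_fetch(USER_AGENT, url)` on the host's parser, sleep for `wait_time(...)` seconds, fetch the page, then call `record(...)`. Keying on the host rather than the full URL is what stops a queue full of pages from one site being fetched back to back.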

Stopping the crawler from clicking on ads: this one is particularly on my mind at the moment. How do I stop my bot from 'clicking' on ads? If it goes straight to the URL in the ad, is that counted as a click?
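One rough way to keep a crawler away from ad links, sketched in Python under some assumptions: links are pulled from raw HTML with the standard library's `html.parser`, anchors marked `rel="nofollow"` or `rel="sponsored"` are not followed, and neither are links to hosts on a blocklist of ad-serving domains. The domains in `AD_HOST_SUFFIXES` are hypothetical placeholders; a real crawler would need a maintained list. A crawler that does not execute JavaScript will never see ads injected by scripts, so this mainly catches ad and sponsored links written directly into the HTML.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit

# Hypothetical blocklist of ad-serving hosts (placeholders, not real ad networks).
AD_HOST_SUFFIXES = ("ads.example.net", "adserver.example.com")
SKIP_REL = {"nofollow", "sponsored"}  # rel values this crawler chooses not to follow


def is_ad_host(host: str) -> bool:
    """True if `host` is a blocklisted host or one of its subdomains."""
    host = host.lower()
    return any(host == s or host.endswith("." + s) for s in AD_HOST_SUFFIXES)


class LinkExtractor(HTMLParser):
    """Collects followable <a href> links and separately records skipped ones."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.links = []    # URLs the crawler may queue
        self.skipped = []  # URLs rejected as likely ads or sponsored links

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr_map = dict(attrs)  # attribute values may be None
        href = attr_map.get("href")
        if not href:
            return
        url = urljoin(self.base_url, href)  # resolve relative links
        parts = urlsplit(url)
        if parts.scheme not in ("http", "https"):
            return  # ignore mailto:, javascript:, etc.
        rel_values = set((attr_map.get("rel") or "").lower().split())
        if rel_values & SKIP_REL or is_ad_host(parts.hostname or ""):
            self.skipped.append(url)
        else:
            self.links.append(url)


def extract_links(html: str, base_url: str):
    """Return (followable_links, skipped_links) found in `html`."""
    parser = LinkExtractor(base_url)
    parser.feed(html)
    parser.close()
    return parser.links, parser.skipped
```

Skipped URLs are kept in their own list so they can be logged, which makes it easier to tune the blocklist later.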

© Stack Overflow or respective owner
