HTTP 303 redirection and robots.txt

Posted by Ian Dickinson on Pro Webmasters
Published on 2012-05-16T21:28:40Z Indexed on 2012/08/30 21:51 UTC

On a site I'm working on, we're using the HTTP 303 redirect pattern (see this article for background) to distinguish between information and non-information resources. So some URLs under /id get redirected to dynamically created pages under /doc. These dynamic pages are built from a database and contain links to other /doc resources, so in general we don't want them to be crawled. Our robots.txt contains:

Disallow: /doc

However, we do want the non-redirected pages under /id to get indexed by Google et al:

Allow: /id

So my question, which I haven't found an answer to so far, is: if an allowed /id URL 303-redirects to a /doc page, will the crawler still be blocked by robots.txt from fetching that /doc page?

If yes, we're OK. Otherwise I'm going to disallow all /id resources in robots.txt, as having the crawler hammer the database would be worse than losing search indexing for the /id pages.
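
To make the scenario concrete, here is a minimal sketch using Python's standard urllib.robotparser. It only shows how the two rules above evaluate for each URL taken on its own; it does not show what any particular search engine does when it follows a 303. The "User-agent: *" line, the example.org host, the /thing paths and the chain_allowed helper are all assumptions made for illustration, not part of our actual setup.

```python
from urllib.robotparser import RobotFileParser

# The two rules from the post. The "User-agent: *" line is an assumption,
# added because the parser needs a group for the rules to belong to.
ROBOTS_TXT = """\
User-agent: *
Disallow: /doc
Allow: /id
"""


def build_parser(text):
    """Parse robots.txt content from a string instead of fetching it."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


def chain_allowed(parser, urls, agent="*"):
    """Hypothetical helper: True only if every URL in a redirect chain is allowed.

    This models a crawler that re-checks robots.txt for each hop of a redirect.
    """
    return all(parser.can_fetch(agent, url) for url in urls)


parser = build_parser(ROBOTS_TXT)

# Hypothetical URLs: an /id resource and the /doc page it 303-redirects to.
id_url = "http://example.org/id/thing"
doc_url = "http://example.org/doc/thing"

print(parser.can_fetch("*", id_url))               # True: /id is allowed
print(parser.can_fetch("*", doc_url))              # False: /doc is disallowed
print(chain_allowed(parser, [id_url, doc_url]))    # False: the redirect target is blocked
```

If a crawler checks each hop this way, the /id URL can be requested but the /doc target of the redirect is never fetched, which is the behaviour we are hoping for.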
