HTTP 303 redirection and robots.txt
Posted by Ian Dickinson on Pro Webmasters
Published on 2012-05-16T21:28:40Z
Indexed on 2012/08/30 21:51 UTC
On a site I'm working on, we're using the HTTP 303 redirect pattern (see this article for background) to distinguish between information and non-information resources. So: some URLs under /id get redirected to dynamically-created pages under /doc. These dynamic pages are built from a database and contain links to other /doc/ resources, so in general we don't want them to be crawled. Our robots.txt contains:
Disallow: /doc
However, we do want the non-redirected pages under /id to get indexed by Google et al:
Allow: /id
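To make the intent of these two rules concrete, here is a minimal sketch that evaluates them with Python's standard urllib.robotparser. The "User-agent: *" line, the host example.org and the resource name "thing" are assumptions for illustration and are not from our actual setup. It only shows how the rules match individual URLs; real crawlers may resolve Allow/Disallow precedence differently.

```python
from urllib.robotparser import RobotFileParser

# Assumed robots.txt. The "User-agent: *" line is an assumption: the post
# only quotes the two rules, but they need a user-agent group to apply.
ROBOTS_TXT = """\
User-agent: *
Disallow: /doc
Allow: /id
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text without fetching anything over the network."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# example.org and "thing" are placeholders, not real site URLs.
id_allowed = parser.can_fetch("*", "http://example.org/id/thing")
doc_allowed = parser.can_fetch("*", "http://example.org/doc/thing")

print("/id/thing allowed: ", id_allowed)
print("/doc/thing allowed:", doc_allowed)
```

The rules are matched against the path of each URL on its own, so the /id URL passes and the /doc URL does not, regardless of how a client arrived at either one.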
So the question I have, which I can't find an answer to so far, is: if an allowed /id page 303-redirects to a /doc page, will the /doc page still be blocked by robots.txt?
If yes, we're OK, but otherwise I'm going to disallow all /id resources in the robots file, as having the crawler hammer the db would be worse than losing search indexing for the /id pages.
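For reference, the behaviour I'm hoping for can be modelled as below. This is only a sketch under a stated assumption: that a well-behaved crawler re-checks robots.txt for every URL it is about to request, including the target of a 303. robots.txt is advisory, so nothing here proves what any particular crawler does. The function polite_fetch_path, the redirect table, the "User-agent: *" line and the example.org URLs are all hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Assumed rules; the "User-agent: *" line is an assumption for illustration.
ROBOTS_TXT = """\
User-agent: *
Disallow: /doc
Allow: /id
"""

# Hypothetical redirect table standing in for the server's 303 responses.
REDIRECTS = {
    "http://example.org/id/thing": "http://example.org/doc/thing",
}


def polite_fetch_path(start_url, redirects, parser, agent="*", max_hops=5):
    """Return the URLs a crawler would request if it checks robots.txt
    before every request, including each redirect target."""
    requested = []
    url = start_url
    for _ in range(max_hops):
        if not parser.can_fetch(agent, url):
            break  # disallowed: stop before requesting this URL
        requested.append(url)
        if url not in redirects:
            break  # no further redirect, the chain ends here
        url = redirects[url]
    return requested


parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

requested = polite_fetch_path("http://example.org/id/thing", REDIRECTS, parser)
print(requested)
```

Under that assumption the crawler requests the /id URL, receives the 303, and then declines to fetch the /doc target, so the database-backed pages are never hit by it.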
© Pro Webmasters or respective owner