On a site I'm working on, we're using the HTTP 303 redirect pattern (see this article for background) to distinguish between information and non-information resources. So: some URL's under /id get redirected to dynamically-created pages under /doc. These dynamic pages are built from a database, and contain links to other /doc/ resources, so in general we don't want them to be crawled. Our robots.txt contains:
Disallow: /doc
However, we do want the non-redirected pages under /id to get indexed by Google et al:
Allow: /id
So the question I have, which I can't find an answer to so far, is: if an allowed /id page 303-redirects to a /doc page, will it still be blocked by robots.txt?
If yes, we're OK, but otherwise I'm going to disallow all /id resources in the robots file, as having the crawler hammer the db would be worse than losing search indexing for the /id pages.