HTTP 303 redirection and robots.txt

Posted by Ian Dickinson on Pro Webmasters
Published on 2012-05-16T21:28:40Z Indexed on 2012/08/30 21:51 UTC

Filed under: redirects | robots.txt | googlebot

On a site I'm working on, we're using the HTTP 303 redirect pattern (see this article for background) to distinguish between information and non-information resources. So: some URLs under /id get redirected to dynamically created pages under /doc. These dynamic pages are built from a database and contain links to other /doc/ resources, so in general we don't want them to be crawled. Our robots.txt contains:

Disallow: /doc

However, we do want the non-redirected pages under /id to get indexed by Google et al:

Allow: /id

So the question I have, which I can't find an answer to so far, is: if an allowed /id page 303-redirects to a /doc page, will it still be blocked by robots.txt?

If yes, we're OK, but otherwise I'm going to disallow all /id resources in the robots file, as having the crawler hammer the db would be worse than losing search indexing for the /id pages.
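
To make the scenario concrete, here is a minimal sketch of how the two rules above evaluate, using Python's standard urllib.robotparser. It assumes a crawler that checks robots.txt against every URL it requests, including the target of a redirect; whether a particular crawler behaves this way is not something this sketch can confirm. The "User-agent: *" line, the example.org host, and the /id/thing and /doc/thing paths are illustrative assumptions, not part of the actual site.

```python
from urllib.robotparser import RobotFileParser

# Rules from the question. The "User-agent: *" line is an assumption:
# the post shows only the Disallow/Allow lines, but the parser needs a group.
ROBOTS_TXT = """\
User-agent: *
Disallow: /doc
Allow: /id
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def may_fetch(url, agent="*"):
    """Return True if robots.txt permits `agent` to request `url`."""
    return parser.can_fetch(agent, url)


def crawlable_hops(chain, agent="*"):
    """Walk a redirect chain, stopping at the first URL robots.txt blocks."""
    fetched = []
    for url in chain:
        if not may_fetch(url, agent):
            break
        fetched.append(url)
    return fetched


# Hypothetical URLs: /id/thing answers 303 with Location: /doc/thing.
chain = ["http://example.org/id/thing", "http://example.org/doc/thing"]
print(crawlable_hops(chain))
```

Under that assumption, the /id URL is requested and its 303 response is seen, but the /doc target is never fetched, because the rules are matched against each requested URL rather than against the URL where the crawl started.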

© Pro Webmasters or respective owner
