De-index URL parameters

Posted by Doug Firr on Pro Webmasters
Published on 2013-10-20T14:59:06Z Indexed on 2013/10/20 16:11 UTC

On rereading, this question is lengthy, so allow me to provide a one-sentence summary: I need to get Google to de-index URLs that have certain parameters appended.

I have a website example.com with language translations.

There used to be many translations but I deleted them all so that only English (Default) and French options remain.

When one selects a language option, a parameter is added to the URL. For example, the home page:

https://example.com (default)
https://example.com/main?l=fr_FR (French)

I added a robots.txt to stop Google from crawling any of the language translations:

# robots.txt generated at http://www.mcanerin.com
User-agent: *
Disallow: 
Disallow: /cgi-bin/
Disallow: /*?l=

So any pages containing "?l=" should not be crawled. I checked this in Google Webmaster Tools (GWT) using the robots.txt testing tool. It works.
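For illustration, here is a minimal sketch of how a wildcard rule such as `/*?l=` can be checked against the URLs in this question. It assumes Google-style matching (`*` matches any run of characters, a trailing `$` anchors the end, and the rule is compared against the path plus query string). The helper names `rule_to_regex` and `is_disallowed` are hypothetical, and this is not a substitute for the testing tool in GWT.

```python
import re
from urllib.parse import urlsplit

# Disallow values copied from the robots.txt above (the first one is empty).
RULES = ["", "/cgi-bin/", "/*?l="]

def rule_to_regex(rule):
    """Translate one Disallow value into a compiled regex.

    Assumes Google-style semantics: '*' is a wildcard and a
    trailing '$' anchors the match at the end of the URL.
    """
    anchored = rule.endswith("$")
    if anchored:
        rule = rule[:-1]
    # Escape the literal pieces and join them with '.*' for each '*'.
    body = ".*".join(re.escape(part) for part in rule.split("*"))
    return re.compile(body + ("$" if anchored else ""))

def is_disallowed(url, rules=RULES):
    """Return True if any non-empty rule matches the URL's path and query."""
    parts = urlsplit(url)
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    # An empty Disallow value blocks nothing, so it is skipped.
    return any(bool(rule) and rule_to_regex(rule).match(target) is not None
               for rule in rules)

print(is_disallowed("https://example.com/main?l=fr_FR"))
print(is_disallowed("https://example.com/"))
print(is_disallowed("https://example.com/page?x=1&l=fr_FR"))
```

Under these assumptions, the rule matches only when `l` is the first parameter, because the pattern requires the literal sequence `?l=`. A URL such as `/page?x=1&l=fr_FR` (a made-up example) would not be matched.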

But under HTML Improvements, the previously crawled language translation URLs remain indexed. The advice online is to return a 404 status in the header for the removed URLs so that Google knows to de-index them.

I checked to see what my CMS would return if I visited one of the URLs that should no longer exist.

This URL was listed in GWT under duplicate title tags (one of the reasons I want to clean up my URLs):

https://example.com/reports/view/884?l=vi_VN&l=hy_AM

This URL should not exist, since I removed the language translations, yet the page loads when it should not! I experimented further and typed example.com?whatever123

It seems that parameters always load as long as everything before the question mark is a real URL.
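To make that observation concrete, the sketch below splits a URL into its path and query string and lists any `l` values that no longer correspond to a language on the site. It assumes French (`fr_FR`) is the only remaining translation, as described above. The set `SUPPORTED_LANGUAGES` and the function `stale_language_values` are hypothetical names used only for illustration, not part of any CMS.

```python
from urllib.parse import urlsplit, parse_qs

# Assumption: only the French translation remains besides the default.
SUPPORTED_LANGUAGES = {"fr_FR"}

def stale_language_values(url):
    """Return the 'l' parameter values that are not supported any more."""
    query = urlsplit(url).query
    # parse_qs collects repeated parameters into a list per key.
    values = parse_qs(query).get("l", [])
    return [v for v in values if v not in SUPPORTED_LANGUAGES]

url = "https://example.com/reports/view/884?l=vi_VN&l=hy_AM"
print(urlsplit(url).path)          # the part that selects the page
print(stale_language_values(url))  # leftover language codes in the query
```

The path `/reports/view/884` is a real page, which is why the URL loads; the query string only carries the stale language codes. A check like this could be one starting point for deciding which parameterised URLs should stop returning a normal page.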

So if Google has indexed all these URLs with parameters, how do I remove them? I cannot check whether a 404 is being generated, because the page always loads: what needs to be de-indexed is only a parameter, not a page.

© Pro Webmasters or respective owner
