I'm working on optimizing my site for Google's search engine, and lately I've noticed that when doing a "site:www.joemajewski.com" query, I get results for pages that shouldn't be indexed at all.
Let's take a look at this page, for example: http://www.joemajewski.com/wow/profile.php?id=3
I created my own CMS, and this is simply a breakdown of user id #3's statistics, which I noticed is indexed by Google, although it shouldn't be. I understand that it takes some time before Google's results reflect accurately on my site's content, but this has been improperly indexed for nearly six months now.
Here are the precautions that I have taken:
My robots.txt file has a line like this:
Disallow: /wow/profile.php*
When running the url through Google Webmaster Tools, it indicates that I did, indeed, correctly create the disallow command. It did state, however, that a page that doesn't get crawled may still get displayed in the search results if it's being linked to. Thus, I took one more precaution.
In the source code I included the following meta data:
<meta name="robots" content="noindex,follow" />
I am assuming that follow means to use the page when calculating PageRank, etc, and the noindex tells Google to not display the page in the search results.
This page, profile.php, is used to take the $_GET['id'] and find the corresponding registered user. It displays a bit of information about that user, but is in no way relevant enough to warrant a display in the search results, so that is why I am trying to stop Google from indexing it.
This is not the only page Google is indexing that I would like removed. I also have a WordPress blog, and there are many category pages, tag pages, and archive pages that I would like removed, and am doing the same procedures to attempt to remove them.
Can someone explain how to get pages removed from Google's search results, and possibly some criteria that should help determine what types of pages that I don't want indexed. In terms of my WordPress blog, the only pages that I truly want indexed are my articles. Everything else I have tried to block, with little luck from Google.
Can someone also explain why it's bad to have pages indexed that don't provide any new or relevant content, such as pages for WordPress tags or categories, which are clearly never going to receive traffic from Google.
Thanks!