Search Results

Search found 160 results on 7 pages for 'googlebot'.

Page 2/7 | < Previous Page | 1 2 3 4 5 6 7  | Next Page >

  • Weird URLs being accessed by Googlebot

    - by Avishai
    Lately I've been seeing all sorts of strange URLs show up as errors in my Webmaster Tools account, but they're URLs that don't actually exist on my site, nor are they linked from the pages that Google claims they're linked from.

        URL                                           Response Code   Detected
        yR3kna/5RfA4+ndtn/X4zcevudMlXbqbIrnPbH9irw=   404             9/16/12
        OK4iaOVdr6Ocjmz+u1kuR5Q486mhDo/e45nwjl2+y8=   404             9/9/12
        pxGz/oHEA0BS8U3VFBzJcZnnIHMsFXb3/rIxMxh2ws=   404             9/16/12
        Af8tbvQ0HniIpf53I8Txz1hM1/JxxrFQxgqPuErWII=   404             9/9/12
        7Bk7c0LDmm4PHyTjml017EGwNNPCn/p/0xMSWWPDic=   404             9/16/12
        umCwnDvTE8ybpUB19MIb+VRj5xRJncyYGGfAQ2Mxn0=   404             9/1/12
        # etc...

    Do you know how to make these stop? It's not at all clear to me why Googlebot would be requesting these URLs in the first place.

    Read the article
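
    For the token-like URLs above, one stop-gap while the source of the links is tracked down is to answer them with 410 Gone so crawlers drop them quickly. A minimal .htaccess sketch, assuming Apache with mod_rewrite; the pattern is an assumption inferred from the sample URLs and would need checking against real requests:

        RewriteEngine On
        # Hypothetical rule: return 410 Gone for base64-looking token paths like the ones reported
        RewriteRule ^[A-Za-z0-9+/]{30,}={0,2}$ - [G,L]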

  • Googlebot substitutes Rails app links with the subdomain

    - by Victor
    I have a Rails app with the domain name abc.com. I also have a separate subdomain, stats.abc.com, for Piwik stats. Googlebot has somehow listed some of the links under my subdomain too: http://abc.com/login http://stats.abc.com/login http://abc.com/signup http://stats.abc.com/signup The stats URLs point to the same pages in the app but are treated as an entirely different website. I have since put a robots.txt on stats, but I'm wondering if there is a more appropriate way to block this, because I may add new subdomains in the future. Here's the content of the robots.txt on stats: User-agent: * Disallow: / Thanks.

    Read the article
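
    For the subdomain question above, an alternative that also covers future subdomains is to send a noindex header from each non-canonical vhost instead of maintaining a robots.txt per subdomain (note that X-Robots-Tag is only seen if the pages stay crawlable, so it works best without a blanket Disallow). A minimal sketch, assuming the stats subdomain is served by Apache with mod_headers enabled:

        <VirtualHost *:80>
            ServerName stats.abc.com
            # Ask crawlers not to index or follow anything served from this host
            Header set X-Robots-Tag "noindex, nofollow"
            # DocumentRoot / proxy configuration for Piwik as already set up
        </VirtualHost>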

  • Does GoogleBot respect User-agent: *

    - by rkulla
    I blocked a page in robots.txt under User-agent: *, and tried to do a manual removal of that URL from Google's cache in Webmaster Tools. Google said it wasn't being blocked in my robots.txt, so I then blocked it specifically under User-agent: GoogleBot and tried removing it again, and this time it worked. Does that mean Google doesn't respect User-agent: * or what?

    Read the article
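
    A point worth illustrating for the question above: crawlers follow only the most specific user-agent group that matches them, and groups are not merged, so a Googlebot-specific group completely replaces the * group for Googlebot. A minimal robots.txt sketch (paths are hypothetical):

        User-agent: *
        Disallow: /private/

        # Googlebot obeys only this group once it matches; rules from the * group
        # are not inherited, so anything shared must be repeated here.
        User-agent: Googlebot
        Disallow: /private/
        Disallow: /drafts/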

  • Will Google crawl a session-based website?

    - by DonShwep
    I have a website that is split into 3 categories, but using PHP it's an all-in-one kind of setup. When a user chooses a category on the home page a session is set, and this is then used to set the style and contents of the website. Would Googlebot and other bots still be able to crawl my website? If a page is accessed and no session is set, the user is sent back to the home page. I have created special links that set a session but go straight to the contact page; even this page doesn't seem to be showing up. Any ideas whether a sitemap with specially crafted links (to set the session) will help Google?

    Read the article
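
    Relevant to the question above: crawlers do not keep cookies between requests, so content reachable only once a session has been set is typically invisible to them. A common workaround is to let the category travel in the URL and fall back to it when no session exists, so crafted links work for bots too. A minimal HTML sketch; the parameter name and paths are hypothetical:

        <!-- Crawlable links: the category is carried in the URL, so no
             session or cookie is needed for a bot to reach the content -->
        <ul>
          <li><a href="/contact.php?category=one">Contact (category one)</a></li>
          <li><a href="/contact.php?category=two">Contact (category two)</a></li>
          <li><a href="/contact.php?category=three">Contact (category three)</a></li>
        </ul>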

  • 404 code/header for search engines, on removed user content?

    - by mowgli
    I just got an email from a former user of my website. He was complaining that Google still shows the contact page he created on my site, even though he deleted it a month ago. This is the first time in many years that anyone has requested this. I told him that it's almost entirely up to Google what content it wants to keep/show and for how long; if it's deleted on the site, I can't do much other than request a re-visit from Googlebot. The user page already says something like "Not found. The user has removed the content". TL;DR: should I generally send a 404 header (or another status) for dynamic user content that has been removed from the site? Or could this hurt the site (SEO)?

    Read the article

  • Does the Instant Preview in Google Webmaster Tools take robots.txt into account?

    - by rockyraw
    Is that the way to go if I want to see visually what Googlebot sees? I'm trying to check a folder which I have just blocked in my robots.txt. If I fetch the folder as Googlebot, it fetches OK, so that doesn't tell me anything about whether the block is working. I know there's a tool to check for blocking, but it depends on the robots.txt you feed it. Therefore I've tried the Instant Preview, and I don't get a preview for what the bot sees (the "pre-render"), so I think that means the robots.txt blocks it; however, I don't see that the bot tried to access my updated robots.txt beforehand, so I'm not sure how it knows that this folder is blocked. (It does preview another new folder that is not blocked.)

    Read the article

  • How do I get Google to crawl my content when it's only displayed when you fill in a form?

    - by Sarang Patil
    I have a webpage with a form, and its "results" section is initially blank. When the user searches for items, a list pops up; he/she chooses one option from the list, and the corresponding results are displayed in the results section. I once decided to log the IP, URL and time of every visit to my page. One IP was 66.249.73.26, and a Google search told me it is a Googlebot IP. When I looked at the links this IP visited, it was like this: search?id=100 search?id=110 ... search?id=200 ... and afterwards it incremented in steps of 1, like 400, 401, and so on. But people search for strings, not numbers. Because Googlebot requests numbers like this, I think the corresponding content is never displayed, so my page content is never indexed even though it is rich content. So my question is: in order to show Googlebot all the content the webpage has, should I list all the results on the index page and ask users to enter a string to filter them?

    Read the article
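
    For the form question above, Googlebot generally discovers content by following links rather than by filling in search boxes, which is consistent with it only hitting enumerated search?id=N URLs. A minimal sketch of making each result reachable through a plain crawlable link; the URL pattern is taken from the question, the anchor texts are hypothetical:

        <!-- A plain, crawlable index of results; each item links to the same
             URLs the form produces, so bots can reach them without the form -->
        <ul>
          <li><a href="/search?id=100">Item 100</a></li>
          <li><a href="/search?id=101">Item 101</a></li>
          <li><a href="/search?id=102">Item 102</a></li>
        </ul>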

  • Somehow Google treats a properly 301'd URL as 200 and is still indexing the new content under the old URL?

    - by user2178914
    We redirected all the old URLs to new ones properly using .htaccess. The problem is that Google somehow is still finding content in the old page (which it shouldn't) and stores it in the cache under the old URL rather than the new one. For example: Old page: http://www.natures-energies.com/iching.htm New page: http://www.natures-energies.com/index.php?option=com_content&view=article&id=760 If you type the old URL into the browser, it redirects. If you fetch the old URL as Googlebot in Webmaster Tools, the header says 301/permanently redirected. If I try to crawl as any other bot it still says 301 redirected. Even if you click the old link in Google it redirects to the new URL. Only in its cache does it show the old URL, and moreover it shows the new content in it! I am stumped as to how Google manages to grab the new content and put it under the old URL instead of the new one. One more interesting thing: if I request a cache of the new page, it shows the cache of the new content with the old URL! Any help would be appreciated; I am at my wits' end and I think I have tried almost everything. Is there anything I'm missing? You can use this search to find the old URLs; maybe you'll see some patterns that I missed. site:www.natures-energies.com inurl:htm -inurl:https|index

    Read the article
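
    One thing sometimes added alongside a 301 in situations like the one above is a rel=canonical on the destination page, so the new URL is explicitly declared as the preferred version while the stale cache entry ages out. A minimal sketch for the head of the new article, using the URL from the question (this is a suggestion, not something the original poster had in place):

        <!-- In the <head> of the new article page -->
        <link rel="canonical"
              href="http://www.natures-energies.com/index.php?option=com_content&amp;view=article&amp;id=760" />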

  • 503 server response for Googlebot

    - by Hallik
    I put a .htaccess file in my webroot with the following contents:

        RewriteBase /
        RewriteCond %{HTTP_USER_AGENT} ^.*(Googlebot|Googlebot|Mediapartners|Adsbot|Feedfetcher)-?(Google|Image)? [NC]
        RewriteRule .* /var/www/503.html

    This website is in maintenance mode and I don't want anything indexed yet. I tested the code with a Firefox user-agent switcher plugin; the access log shows this at the end of each log entry, but watching in TamperData or Firebug it still returns a 200 server response instead of a 503. What am I doing wrong?

        "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

    Contents of /var/www/503.html:

        <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 3.2//EN">
        <html>
        <head>
        <title>503 - Service temporary unavailable</title>
        </head>
        <body>
        <h1>503 - Service temporary unavailable</h1>
        <p>Sorry, this website is currently down for maintainance please retry later</p>
        </body>
        </html>

    I get this in my error log (with LogLevel debug; would that go into the vhost in a specific place? Every answer I see on Google is something different):

        Request exceeded the limit of 10 internal redirects due to probable configuration error. Use 'LimitInternalRecursion' to increase the limit if necessary. Use 'LogLevel debug' to get a backtrace.

    Read the article
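
    A common cause of the loop-and-200 behaviour above is rewriting every request to a path outside the document root without ever sending an actual 503 status. A minimal corrected sketch, assuming Apache 2.4 (where the R flag accepts non-3xx codes) and that the maintenance page has been copied into the webroot as /503.html:

        RewriteEngine On
        ErrorDocument 503 /503.html
        # Send 503 to the listed crawlers, but never for the error page itself,
        # otherwise the ErrorDocument subrequest gets a 503 too and loops
        RewriteCond %{REQUEST_URI} !^/503\.html$
        RewriteCond %{HTTP_USER_AGENT} (Googlebot|Mediapartners|Adsbot|Feedfetcher)-?(Google|Image)? [NC]
        RewriteRule .* - [R=503,L]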

  • Google-bot sees “Sorry, we have no imagery here” on pages with Google Maps

    - by friism
    I have a site with Google Maps on most of the pages. When inspecting content keywords in Google Webmaster Tools, the content keywords identified by Googlebot for the site include "imagery", "sorry" and "here". These turn out to be part of an error message returned by Google Maps: "Sorry, we have no imagery here". I cannot reproduce this error with normal clients, nor does "Fetch as Google" show it. The problem is presumably that Googlebot tries to execute some of the Google Maps JavaScript but then shoots itself in the foot and records the error message. A Google search for "Sorry, we have no imagery here" shows that this problem is endemic to sites across the internet, including Yelp and many others. I'd like to convince Google that my site is not about imagery and being sorry, but I'd also like to keep the maps in place. I guess one option would be to transition to static maps, but that's not a great alternative. There's some related discussion on Webmaster World, with no resolution.

    Read the article

  • How do I block a user-agent from Apache

    - by rubo77
    How do I implement a user-agent string block by regular expression in the config files of my Apache webserver? For example, I would like to block all bots on my Debian server whose user-agent matches the regular expression /\b\w+[Bb]ot\b/ or /Spider/. Those bots should not be able to see any page on my server, and they should appear neither in the access logs nor in the error logs. http://global-security.blogspot.de/2009/06/how-to-block-robots-before-they-hit.html suggests using mod_security for that, but isn't there a simple directive for httpd.conf?

    Read the article
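
    For the question above, a minimal sketch without mod_security, assuming Apache 2.4 with mod_setenvif and mod_authz_core; the regex comes from the question, the log path is illustrative. Keeping the denied hits out of the error log is harder, but the conditional CustomLog keeps them out of the access log:

        # httpd.conf (or a vhost): mark matching user-agents, then deny them
        BrowserMatchNoCase "\b\w+bot\b|spider" bad_bot
        <Location "/">
            <RequireAll>
                Require all granted
                Require not env bad_bot
            </RequireAll>
        </Location>
        # Log only requests that are not from the marked bots
        CustomLog /var/log/apache2/access.log combined env=!bad_bot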

  • Can robots cause considerable performance issues?

    - by Anicho
    The question in the title is exactly what I am trying to find out. My case: at work we are in a discussion with team members who seem to think bots will cause performance problems on our services website. Our setup: let's say I have site www.mysite.co.uk; this is a shop window to our online services, which sit on www.mysiteonline.co.uk. When people search Google for mysite they see mysiteonline.co.uk as well as mysite.co.uk. Cases against stopping bots from crawling:
    - We don't store GBs of publicly available data on the web.
    - Most friendly bots, if they were going to cause issues, would have done so already.
    - In our instance the bots can't crawl the site because it requires a username and password.
    - Stopping bots with robots.txt causes an issue with SEO (ref. 1).
    - If it were a malicious bot, it would ignore robots.txt or meta tags anyway.
    Ref 1: if we were to block robots from crawling mysiteonline.co.uk, this would affect SEO rankings and make it inconvenient for users who actively search for mysite to find mysiteonline, which we can prove is the case for a good portion of our users.

    Read the article

  • What bots are really worth letting onto a site?

    - by blunders
    Having written a number of bots, and seen the massive number of random bots that happen to crawl a site, I am wondering: if the goal of allowing bots onto a site is the potential for them to send real traffic back, is there any reason to allow bots that are not known to send real traffic back? And how do you spot these "good" bots, based on how they identify themselves, the IPs they come from, their behavior, etc.?

    Read the article

  • Google bots are severely affecting site performance

    - by Lynn
    I have an aggregate site on a Linux server that pulls in feeds from a universe of about 2,000 blogs. It's on WordPress 3.4.2, and I have a cron job, staggered to run five times an hour on another server, that pulls in the stories and then publishes them to the front page of this site. This is so I don't put too much pressure on one server. However, the Google bots, which visit a few times every hour, bring the server to its knees in the mornings and evenings when there is an increase in traffic on the site. The bots have something like 30,000 links to follow at this point. How do I throttle the bots to simply grab the new stories off the front page and stop there? EDIT - Details of my server configuration: the server that handles all the publishing is an unmanaged instance on AWS. It mounts the NFS server and connects to the RDS to update content, etc. You get to this publishing instance via a plugin that detects the wp-admin link and then redirects you there. The front-end app server also mounts the NFS and requests data from the RDS; it is the only one that has WP Super Cache on it. The OS is Ubuntu on the app server and the NFS runs CentOS. The front end is Nginx and the publishing server is Apache.

    Read the article
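
    For the throttling question above, two levers are commonly combined: the crawl-rate setting in Webmaster Tools for Googlebot (which ignores Crawl-delay), and a robots.txt that keeps crawlers away from low-value WordPress URLs so the 30,000 links shrink. A minimal sketch; the disallowed paths are typical WordPress patterns and are assumptions, not taken from the site:

        User-agent: *
        # Skip feeds, tag archives, pagination and comment-reply permutations that multiply URLs
        Disallow: /feed/
        Disallow: /*/feed/
        Disallow: /*?replytocom=
        Disallow: /tag/
        Disallow: /page/

        # Honored by Bing/Yandex, ignored by Googlebot (use the GWT crawl-rate setting instead)
        Crawl-delay: 10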

  • Bingbot requests from Google IP address

    - by JITHIN JOSE
    We have some suspicious requests to our server:

        74.125.186.46 - - [24/Aug/2014:23:24:11 -0500] "GET <url> HTTP/1.1" 200 16912 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"
        74.125.187.193 - - [24/Aug/2014:23:24:12 -0500] "GET <url> HTTP/1.1" 200 20119 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"

    As shown, the user-agent says it is bingbot, but whois data for the IP addresses (74.125.186.46 and 74.125.187.193) shows they belong to Google's servers. So is it Google, Bing, or some other content scraper?

    Read the article
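
    For the question above, both Google and Bing document verifying their crawlers with a reverse DNS lookup followed by a confirming forward lookup: genuine hosts resolve to googlebot.com/google.com or search.msn.com names. A minimal Python sketch; the suffix list and the single-IP equality check are simplifications:

        import socket

        GOOD_SUFFIXES = (".googlebot.com", ".google.com", ".search.msn.com")

        def verify_crawler(ip):
            """Reverse-resolve the IP, check the hostname suffix, then forward-confirm it."""
            try:
                host = socket.gethostbyaddr(ip)[0]       # reverse DNS lookup
            except socket.herror:
                return False
            if not host.endswith(GOOD_SUFFIXES):
                return False
            try:
                return socket.gethostbyname(host) == ip  # forward confirmation
            except socket.gaierror:
                return False

        print(verify_crawler("74.125.186.46"))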

  • GWT reporting crawl errors for non-existent links

    - by pixeline
    Google Webmaster Tools is reporting crawl errors for links that never existed, and if I check the "Linked from" tab for a given error link, it shows another link that never existed. They all mention joomla/, which is not the CMS used on this domain (it's WordPress, FYI). Example: http://example.com/joomla/index.php/component/user/register Linked from: http://example.com/joomla/component/user/login?return=L2###### What is going on? UPDATE 1: I tried something. I provided one of the faulty URLs to the "Fetch as Google" functionality. Instead of returning a 404, it returns a 301 to another Joomla page.

        HTTP/1.1 301 Moved Permanently
        Server: Apache/2.4.3
        X-Powered-By: PHP/5.4.4-10
        X-Pingback: http://example.com/xmlrpc.php
        Expires: Wed, 11 Jan 1984 05:00:00 GMT
        Cache-Control: no-cache, must-revalidate, max-age=0
        Pragma: no-cache
        Set-Cookie: PHPSESSID=1fgr5v2oip39miibuptd51s8h0; path=/
        Set-Cookie: woocommerce_items_in_cart=0; expires=Sat, 12-Jan-2013 11:44:01 GMT; path=/
        Location: http://example.com/joomla/component/user/register
        Content-Type: text/html; charset=iso-8859-1
        Content-Length: 387
        Date: Sat, 12 Jan 2013 12:44:01 GMT
        Via: 1.1 varnish
        Connection: keep-alive
        Accept-Ranges: bytes
        Age: 0

        <!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
        <html><head>
        <title>301 Moved Permanently</title>
        </head><body>
        <h1>Moved Permanently</h1>
        <p>The document has moved <a href="http://example.com/joomla/component/user/register">here</a>.</p>
        <p>Additionally, a 301 Moved Permanently error was encountered while trying to use an ErrorDocument to handle the request.</p>
        </body></html>

    Read the article
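
    Whatever is generating the phantom /joomla/ redirects above (a leftover rewrite rule or a misbehaving plugin is a common suspect), one way to stop crawlers from chasing those URLs while the cause is investigated is to answer the whole /joomla/ prefix with 410 Gone. A purely illustrative .htaccess sketch for the WordPress site, placed before other rewrite rules:

        RewriteEngine On
        # Hypothetical stop-gap: tell crawlers the /joomla/ paths are gone for good
        RewriteRule ^joomla/ - [G,L]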

  • Using rel=canonical and noindex in a 1-n partners environment

    - by Telemako Mako
    We sell a whole site (domain, etc.) to partners that create content which is shown together on the main site. What we want to achieve is that the main site copy is the original, but the one that is indexed is the partner's copy. We want it this way so the search results point to the partner sites but never to the main site, while the main site gets all the credit for the links. We are trying a setup where the main site article has a noindex, follow and a link to the partner article, and the partner article has a rel=canonical pointing to the main site article. Are we correct, or will the noindex at the main site break the canonical reference?

    Read the article
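
    In the setup above the two signals pull in opposite directions: the partner's canonical nominates the main-site article, while that article is noindexed. One arrangement consistent with the stated goal (partner copy indexed, main copy not) is to reverse the canonical so the main-site copy nominates the partner copy and the partner copy stays indexable. A minimal sketch, with hypothetical URLs:

        <!-- In the <head> of the main-site copy: nominate the partner copy as canonical -->
        <link rel="canonical" href="http://partner.example.com/article-slug">

        <!-- The partner copy stays indexable: no noindex and no canonical back to the main site -->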

  • My blog not even ranking for exact title match [on hold]

    - by Akshay Hallur
    I have original, in-depth blog posts related to blogging and SEO. This domain had been dropped (expired) twice before my acquisition; I am the 3rd owner of the domain and have had it for 143 days. Blog posts are not ranking even for exact titles; Google+ or LinkedIn shares show up instead of my content, and some blog posts are not even indexed. I am hardly getting around 7 organic visits a day. Example 1: http://www.infoflame.com/offer-pdf-of-blog-posts-for-likes-and-shares/ Title: Offer Readers PDF of Blog Posts for Their Likes and Shares; not indexed at all. Example 2: http://www.infoflame.com/anchor-text-for-seo/ is indexed but not coming up for the exact title. Suspect: dropped domain, less likely used for spam (the Wayback Machine shows 2 drops and 3 captures since 2004; I don't know whether there was email spam), but there are no manual actions in WMT, so no reconsideration request. What's the reason for this? Should I wait? How can I tell Google that ownership has changed and the domain is now spam-free? Or should I de-index it and start a new blog? Thank you for any advice.

    Read the article

  • Why do old (301) links stay on Google when breaking a site down into multiple domains?

    - by Sampo Sarrala
    Some background: we used to have a single site and a single domain (let's call it mainsite.com) with product information; however, things have changed and the product database has grown fast, so we decided to move some major products/manufacturers under their own domains (let's call one of them subsite.com) while still using our main database/codebase. What we've done:
    - Added the subsite.com domain for product 1 by Great Products Co.
    - Built some new, nice-looking front pages, info pages, etc., plus detail pages that use information from the original DB.
    - Redirected product/group links from mainsite.com using 301 redirects.
    - Verified that the redirects work as expected.
    - Waited some time for Google reindexing (over 30 days, which I've heard should be more than enough).
    Results: if I search for our moved products on Google, it finds and lists them, but with old links to our main page like mainsite.com/group/product1, when it should show links to the new site, subsite.com/product1. Links from Google redirect as they should, since the redirects are verified [301]. Main question: any reasons why Google would not follow the 301 redirects and update the links so that they point to our new mfg/product site subsite.com?

    Read the article

  • Anyone heard of 'tank tracking'?

    - by Heather Walters
    Someone I know told me about some seriously cutting-edge blackhat SEO that he called 'tank tracking'. He said that it is some sort of code (he believes written in Python) that 'sits' around the outside perimeter of your visible webpage and listens for an incoming search spider. When a spider enters the page, it traps it in this weird wormhole, making it loop through, I don't know, certain keywords or something. The result is that a search engine like Google would consequently give the page a full 100 rating (this person told me Google bestows some sort of scoring app once you've passed a certain number of their exams). A quick Google search on 'tank tracking seo', 'tank tracking blackhat seo' and 'tank tracking google' yielded zero results. Let me backtrack a bit and say that I am not interested in utilizing blackhat techniques. I'm just astonished that something like this might be out in the world. Anyone heard of this?

    Read the article

  • Server overhead caused by bots?

    - by giuseppe
    I have one customer website causing overhead (http://www.modacalcio.it/en/by-kind/football-boots.html). With htop open, I am trying to navigate the website, and much of the load is caused by the Ajax links placed on the left side of the site. The website is hosted on a VPS with 3 processors and 2 GB RAM, with plenty of disk space. The real problem is that this website is new and not visited much. From the http-status module I am seeing that the overhead is caused by bots (Google bots, Bing bots, hrefs checker and so on), so I think it's probably due to those spiders trying to crawl all those links at once - could this be causing the overhead? I have also put rel="nofollow" on those links, but this doesn't keep the bots away. Is there any way, through code or Plesk, to block those links for those bots?

    Read the article
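
    On the question above, rel="nofollow" is only a hint and does not stop crawling, so the usual lever is robots.txt rules for the Ajax/filter URLs. A minimal sketch; the URL patterns are placeholders and would need to match the real filter links on the site:

        User-agent: *
        # Keep crawlers out of the Ajax-generated filter/sort permutations
        Disallow: /*?dir=
        Disallow: /*?order=
        Disallow: /*?limit=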

  • How to resolve "Google can't find your site's robots.txt" error?

    - by Manivasagam
    I've recently found "Google can't find your site's robots.txt" in crawl errors. When I tried Fetch as Google, I got the result "SUCCESS"; then I looked at crawl errors again and it still shows "Google can't find your site's robots.txt". What can I do to resolve this issue? Before this issue arose, my site was indexed within a few minutes, but now I find that it takes time to be indexed in Google's search. When I access http://mydomain.com/robots.txt, it shows the data below: User-agent: *Disallow: /wp-admin/ Disallow: /wp-includes/ I found Blocked URLs = 0 and no other errors. Is there anything else I need to change, or what could be the solution for this? Any help would be appreciated.

    Read the article
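
    One thing worth checking in the question above: if the file really is served exactly as quoted, with Disallow run onto the User-agent line, parsers will read "*Disallow: /wp-admin/" as the user-agent value and the group will not apply; each directive belongs on its own line. A minimal well-formed sketch of the same rules:

        User-agent: *
        Disallow: /wp-admin/
        Disallow: /wp-includes/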

  • HTTP 303 redirection and robots.txt

    - by Ian Dickinson
    On a site I'm working on, we're using the HTTP 303 redirect pattern (see this article for background) to distinguish between information and non-information resources. So: some URLs under /id get redirected to dynamically created pages under /doc. These dynamic pages are built from a database and contain links to other /doc/ resources, so in general we don't want them to be crawled. Our robots.txt contains: Disallow: /doc However, we do want the non-redirected pages under /id to get indexed by Google et al.: Allow: /id So the question I have, which I can't find an answer to so far, is: if an allowed /id page 303-redirects to a /doc page, will it still be blocked by robots.txt? If yes, we're OK, but otherwise I'm going to disallow all /id resources in the robots file, as having the crawler hammer the DB would be worse than losing search indexing for the /id pages.

    Read the article
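
    For the setup above, a belt-and-braces variant sometimes used is to keep the robots.txt rules and also send a noindex header on the /doc responses themselves; a crawler that honors the Disallow never fetches /doc and so never sees the header, so this mainly covers crawlers that ignore robots.txt or reach /doc some other way. A sketch assuming Apache with mod_headers, using the paths from the question:

        # robots.txt (as described in the question)
        User-agent: *
        Disallow: /doc
        Allow: /id

        # Apache configuration for the dynamic /doc pages
        <LocationMatch "^/doc">
            Header set X-Robots-Tag "noindex"
        </LocationMatch>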

  • Blog not even ranking for exact title match, after domain has been dropped twice [on hold]

    - by Akshay Hallur
    Consider a blog related to blogging and SEO. The domain has been dropped (expired) twice before acquisition. The current owner is the 3rd owner of the domain and has had it for 5 months. Blog posts are not ranking, even for exact titles; Google+ or other shares show up instead of the content, and some blog posts are not even indexed. Let us say it gets around 7 organic visits a day. It is a dropped domain, less likely to have been used for spam (the Wayback Machine shows 2 reframed drops and 3 captures since 2004; whether there was email spam is unknown), but there are no manual actions in WMT, so no reconsideration request. What could be the reason for this? How can Google be told that ownership has changed and the domain is now spam-free? Would this domain be salvageable, or does this only change after relocating to another domain?

    Read the article

  • Page load speed's effect on crawl rate

    - by Sam Pegler
    We've noticed a big drop in the total pages crawled per day on our site. We have no control over the crawl rate in Google Webmaster Tools, so it's possible this has been changed by Google. However, it's a fairly large site and I wouldn't have thought that the crawl rate would be decreased. What we have noticed, though, is a sizeable increase in page load times, and in my mind this would be the cause. Can anyone confirm whether the crawl rate is directly correlated with page load time? It seems logical: longer page load time, fewer pages crawled. Any decent documentation on this would be appreciated; I don't normally have any input on SEO, so this is new to me.

    Read the article

< Previous Page | 1 2 3 4 5 6 7  | Next Page >