Search Results

Search found 12829 results on 514 pages for 'crawl errors'.


Problem with homepage's SEO when using subfolders in a multi language website

- by Antonio

After reading hundreds of threads about multi-language websites I haven't found an answer to my specific problem, so I think it's not a common issue and I must have done something terribly wrong ;-) We have a brand.com website with DE as the main language and the following subfolders: /de/ (canonical of / plus a redirect to /), /it/ and /en/. When I search google.com for EN keywords or google.it for IT keywords, the top result is the homepage in German (both title and description), with no trace of the /it/ or the /en/ homepage. Is this because /it/ and /en/ each need a separate link building strategy? I've already configured Google Webmaster Tools as follows: brand.com, no language preference; brand.com/de/, de language; brand.com/it/, it language; brand.com/en/, en language. Perhaps having "/" as the DE main page is wrong and I should use a different approach, e.g. making "/" a 301 to /de/ instead? Thanks in advance.

Does a "nofollow" attribute on a link prevent URL discovery by search engines?

- by Stephen Ostermiller

I know that nofollow prevents link juice from being passed across a link. But if search engine robots discover a link with a nofollow on it, will they add that link to their crawl queue? In other words, if I create a link to a brand new page and put a rel=nofollow attribute on that link, will it prevent search engine bots (particularly Googlebot) from crawling the page (assuming that this link remains the only link into that page)? I've read conflicting reports about this over the years and I'm looking for authoritative references about the current state of affairs. Official statements from Google or published results of independent testing would be ideal.

Creating Google sitemap.xml , is it okay for the images to be wrapped in url tags?

- by AzizAG

I'm using a tool to generate the sitemap.xml file for me. It crawled my website and got the pages and all images, but when I reviewed the exported XML (to make sure nothing was wrong) I noticed that the images on my website are wrapped in url tags (I think they should be in image tags). See this: <url><loc>http://mywebsite.com/images/12.jpg</loc><lastmod>2012-05-23T13:39:02+00:00</lastmod><changefreq>weekly</changefreq><priority>0.50</priority></url> Shouldn't it be wrapped in an image tag (just like videos are wrapped in a video tag)? Thanks.
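For illustration, here is a minimal sketch of how image entries are usually nested under the page that displays them, using the image sitemap extension namespace. The page URL `gallery.html` and the helper name `build_sitemap` are assumptions made up for the example; only the image URL comes from the question.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
IMAGE_NS = "http://www.google.com/schemas/sitemap-image/1.1"


def build_sitemap(pages):
    """Build sitemap XML from a {page_url: [image_url, ...]} mapping.

    Each page gets its own <url> element; the images shown on that page
    are nested inside it as <image:image><image:loc> children.
    """
    ET.register_namespace("", SITEMAP_NS)
    ET.register_namespace("image", IMAGE_NS)
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for page_url, image_urls in pages.items():
        url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = page_url
        for image_url in image_urls:
            image = ET.SubElement(url, f"{{{IMAGE_NS}}}image")
            ET.SubElement(image, f"{{{IMAGE_NS}}}loc").text = image_url
    return ET.tostring(urlset, encoding="unicode")


# Hypothetical page URL; the image URL is the one quoted in the question.
print(build_sitemap({
    "http://mywebsite.com/gallery.html": ["http://mywebsite.com/images/12.jpg"],
}))
```

With this layout the `<url>` entries describe HTML pages, and each image appears as a child of the page it belongs to rather than as a `<url>` of its own.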

webmaster tools - Network Unreachable

- by Jayapal Chandran

Hi, Webmaster Tools reports robots.txt as unreachable for my site, and for every link in the sitemap it says "network unreachable". sitemap.xml is unreachable too. These errors appear on the crawl stats page. I discussed this with my hosting provider's support team and they said: "Hi, I have verified the Apache logs and cannot see any issues on your website/webserver. Possible issues: there may be a routing issue from Google's servers to our server. When Googlebot hits go high, the IP is automatically blacklisted by our firewall to avoid server load and downtime. As we do not have access to their services, we are not able to give details of their logs etc." The sitemaps link shows an exclamation mark, which means the file was not reachable. What could be the problem and how do I solve it?

Google is not indexing my entire site despite having a sitemap

- by Anusha

I have an e-commerce website www.beyondtime.in. I have been constantly monitoring Googlebot crawling on my website and my webmaster account. Lately, I have found two issues that I have not been able to understand. 1.) The Google Bots have been only crawling www.beyondtime.in/telecom.php when the URL is not even valid. What needs to be done to let Google crawl other pages of the website as well? 2.) The second question is about the Google Webmaster account, where I've submitted my sitemap with 227 URLs. Out of that, only 156 have been indexed. None of the images of my website have been indexed by Google.

Sample size and statistical significance in Google Analytics

- by colmcq

I have been asked to compile a report on dropout rates during checkout for a global webstore. I used one month of data as my sample because: (1) Google Analytics slows to a crawl over larger sample sizes and makes much of the analysis agonisingly slow; (2) I believe it to be statistically significant and a representative sample. My client has asked me why I didn't use yearly figures and wants proof that one month of data is 'statistically significant'. Am I right in thinking that I need to compare the standard deviation of my monthly sample to the yearly sample and ensure that the deviation is under a certain percentage? Question: how do I prove one month of Google Analytics data is representative of one year's worth of data? Stats: 90k unique views/month, ~1.1m per year.
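One way to put a number on the precision of a one-month sample is the margin of error of an observed proportion. The sketch below uses the normal approximation for a binomial proportion. The 30% dropout rate is a made-up figure, and treating the 90k monthly views as independent checkout sessions is an assumption.

```python
import math


def margin_of_error(p, n, z=1.96):
    """Half-width of the normal-approximation confidence interval.

    p: observed proportion (e.g. checkout dropout rate)
    n: number of observations in the sample
    z: z-score for the confidence level (1.96 is roughly 95%)
    """
    return z * math.sqrt(p * (1 - p) / n)


# Hypothetical dropout rate of 30% over the 90k monthly sessions from the question.
moe = margin_of_error(0.30, 90_000)
print(f"dropout rate 30% +/- {moe * 100:.2f} percentage points")
```

Under these assumptions the interval is about ±0.3 percentage points, so sample size alone is unlikely to be the weak point. What this calculation cannot show is whether the chosen month is typical of the year: seasonality would have to be checked separately, for instance by comparing the same metric across several months.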

Server overhead caused by bots?

- by giuseppe

I have one customer website causing overhead (http://www.modacalcio.it/en/by-kind/football-boots.html). With htop open, I navigate the website and see that most of the load comes from the AJAX links on the left side of the page. The website is hosted on a VPS with 3 processors and 2GB RAM, with enough hard disk space. The real problem is that this website is new and not visited much. From the http-status module I can see that the overhead is caused by bots (Google bots, Bing bots, hrefs checker and so on). So I thought that's probably due to those spiders trying to crawl all those links at once. Could this be causing the overhead? I have also put rel="nofollow" on those links, but this doesn't keep the bots away. Is there any way, through code or Plesk, to block those links for those bots?

Google webmaster Verification failed.

- by KMC

I have a site built with Ruby on Rails. I verified it with Google Webmaster Tools some months ago, which was successful. One day Webmaster Tools started reporting that re-verification failed. I tried again to verify my site using meta tags and HTML files, but I kept getting "Verification failed. The connection to your server timed out." Since then, Google has stopped crawling my site's content, though somehow Google still crawls the PDF content on my site.

My domain PageRank shows as unavailable, why is that?

- by Emerson

My domain, http://www.anovaordemmundial.com , was snatched by some opportunist when I failed to renew it. I know, it's all my fault :/ . After being ripped off and buying my domain back, everything is configured and working, but the PageRank for that domain shows as unavailable. Also, searches for "nova ordem mundial" (in Portuguese), which used to show my domain as the first result in any language, don't show it anymore. Do you think this is temporary and the domain will recover its PageRank after a full crawl by Google? There are hundreds of sites pointing to my domain, which is why I had the previous relevance in searches. The domain has been back for more than 5 days already. In reality, bing already Is there anything I can do to help get my domain back to its PageRank? Thanks for the help!

Directing crawlers to content in language per language sub-domain

- by Noam

I have a multilingual website with many pages (40M). The site has UGC, and each translation is actually for the titles. Each sub-domain points to the same content with different titles per language. As far as I understand, each sub-domain should be indexed by search engines, meaning they will actually need to crawl 40M x supported-languages. So I thought it might be best to direct each sub-domain's crawler to pages that are fully in that language (titles + UGC). Is there a way to do this? Should search engines understand this on their own?

Google Bot trying to access my web app's sitemap

- by geekrutherford

Interesting find today... I was perusing the event log on our web server today for any unexpected ASP.NET exceptions/errors. Found the following:

Exception information:
    Exception type: HttpException
    Exception message: Path '/builder/builder.sitemap' is forbidden.
Request information:
    Request URL: https://www.bondwave.com:443/builder/builder.sitemap
    Request path: /builder/builder.sitemap
    User host address: 66.249.71.247
    User:
    Is authenticated: False
    Authentication Type:
    Thread account name: NT AUTHORITY\NETWORK SERVICE

At first I thought this was maybe an attempt by a hacker to mess with the sitemap. Using a handy web site (www.network-tools.com) I did a lookup on the IP address and found it was a Google bot trying to crawl the application. In this case, I would expect an exception or 403 since the site requires authentication anyway.

My First robots.txt

- by Whitechapel

I'm creating my first robots.txt and wanted to get a second opinion on it. Basically I have an FTP setup on my board for some special users to transfer files between each other and I do NOT want that included in the search by the bots. I also want to point to my sitemap, which gets auto-generated by a PHP page. So here is what I have. What else should I include, and do I need to fix anything in it? Also, it's linking to xmlsitemap.php because that generates the sitemap when called. My goal is to allow any search bot to crawl the forums to grab meta data.

User-agent: *
Disallow: /admin/
Disallow: /ali/
Disallow: /benny/
Disallow: /cgi-bin/
Disallow: /ders/
Disallow: /empire/
Disallow: /komodo_117/
Disallow: /xanxan/
Disallow: /zeroordie/
Disallow: /tmp/
Sitemap: http://www.vivalanation.com/forums/xmlsitemap.php

Edit: I'm not sure how to handle all the users' folders under /public_html/ since the robots.txt will be going in /public_html.
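A quick way to sanity-check rules like these before deploying them is Python's standard `urllib.robotparser`. The sketch below parses the rules quoted in the question from a string instead of fetching them. The helper name `allowed` and the two sample paths are assumptions chosen for the example.

```python
from urllib.robotparser import RobotFileParser

# The rules from the question, one directive per line.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
Disallow: /ali/
Disallow: /benny/
Disallow: /cgi-bin/
Disallow: /ders/
Disallow: /empire/
Disallow: /komodo_117/
Disallow: /xanxan/
Disallow: /zeroordie/
Disallow: /tmp/
Sitemap: http://www.vivalanation.com/forums/xmlsitemap.php
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def allowed(path, user_agent="*"):
    """Return True if the given path may be crawled under the parsed rules."""
    return parser.can_fetch(user_agent, "http://www.vivalanation.com" + path)


# Hypothetical paths, used only to exercise the rules.
print(allowed("/forums/index.php"))   # forum pages are not listed, so crawlable
print(allowed("/ali/files.zip"))      # falls under "Disallow: /ali/"
```

Since paths in `Disallow` lines are matched from the site root, a robots.txt placed in /public_html (served as /robots.txt) can cover each user folder with its own `Disallow` line, as the draft already does.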

Panda 4: Reducing #indexed pages. How much is enough?

- by Noam

I've been hit by Panda 4 (40% decrease). I didn't see any change during Panda 1-3. From what I've read, and comparing it to my site, the change is probably due to the fact that I have over 30M pages indexed on Google, and they've started seeing that as some sort of bad indication. Although I feel all of the pages have a unique value that Google should crawl, it seems I should make some tough calls and reduce the indexed pages according to some prioritization I will conduct. The question is what my target should be, or what factors should help me figure out a relevant target. How many pages should I try to reduce to?
- 25M
- 15M
- 1M
- 2000
Is it enough to add noindex to low priority pages or should I also remove all internal linking to them?

Games display across both monitors when opened

- by Mitch

I am not sure what setting to change for this, but when I open games like Dungeon Crawl or a Steam game, the game wants to take up both screens. Is there a way to have the game open on just one screen? xrandr shows this, so they are both on Screen 0:

Screen 0: minimum 320 x 200, current 2966 x 900, maximum 8192 x 8192
LVDS1 connected 1366x768+1600+75 (normal left inverted right x axis y axis) 344mm x 193mm
VGA1 connected 1600x900+0+0 (normal left inverted right x axis y axis) 443mm x 249mm

If you need any more info or can point me to a place where you may have already found an answer, please let me know.

Google doesn't index a subdomain. What can be the problem and what can be done?

- by fudge

Hi! I have a domain, let's call it example.com, which has a subdomain, games.example.com. I maintain a games forum using phpbbseo which is located at games.example.com/forum. The problem is that the forum is not being crawled. I used Google's webmaster tools and tested that the page is seen by google. P.S. There is a link from games.example.com to games.example.com/forum. What can I do? How can I make google crawl my forum?

Google Authorship: can I display:none for link to profile?

- by RubenGeert

I'd like to have my 'mugshot' in Google's SERPs but I couldn't care less about Google+. I don't really want to link my website to Google+ either. Can I use CSS display:none; on the link leading to my profile and still have authorship, which looks like <a href='https://plus.google.com/111823012258578917399?rel=author' rel='nofollow'>Google</a>? Will the nofollow attribute here spoil things? I don't want to lose 'link juice' on Google+ if I don't have to. Now Google should crawl only the HTML, but I'm sure they'll figure out the link is not visible (perhaps it's technically even cloaking). Does anybody have experience with this situation? And do I really have to become (reasonably) active on Google+ in order for authorship to show? This answer suggests I do but I didn't read anything on that in Google's guidelines.

Does SEO optimisation count on the responsive side of a site?

- by Rick Donohoe

I'm looking at making some SEO optimisation fixes, and at this point I'm sorting out the heading structure and keywords - H1's, H2's etc We have a site where there are a number of similar blocks, and one is always visible, and one is hidden depending on the screen size. This is our method of making a single site responsive. Firstly, how does this technique affect the SEO, and in general does the responsive side of a site matter at all to search engines? What I mean by this is if the site has different content depending on screen sizes, then which content would the search spider crawl?

robots.txt, how effective is it and how long does it take?

- by Stefan

We recently updated the site to a single page site using jQuery to slide between "pages". So we now have only index.php. When you search for the company on engines such as Google, you get the site and a listing of its sub pages, which now lead to outdated pages. Our plan doesn't allow us to edit the .htaccess and the old pages are .html docs, so I cannot use PHP redirects either. So if I put in place a robots.txt telling the engines not to crawl beyond index.php, how effective will this be in preventing/removing crawled sub pages? And as a rough guess, how long before the search engines would update?

I've changed my URL schema. How do I tell Google to index the new schema and forget the old one?

- by growse

I had a site where the URLs were constructed like this: /index.php/Topic and /index.php/AnotherTopic. These were indexed in Google, and search results pointed to them. However, I've recently replatformed that site and reconfigured it so the above URLs would be /index.php?title=Topic and /index.php?title=AnotherTopic. The original URLs are returning 404s. The site is linking to the correct URL schema internally, but Google is retaining the original schema in its search results. I've updated and resubmitted the sitemap, which only contains the new schema. Also, Google's Webmaster Tools is going slightly bananas at the fact that there's now a spike in 404 errors in its crawl results. What would be the best approach to get Google to 'forget' about the old schema, and instead index the new schema? Should I try blocking /index.php/ in robots.txt? Should I be returning 301 codes instead of 404 for the original URLs?
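A 301 is the usual way to signal that a URL has moved permanently. If the old URLs were redirected that way, the mapping between the two schemas is mechanical. The sketch below shows only that mapping as a plain function. The name `redirect_target` is hypothetical, and wiring it into the web server or application is left out because the question does not say what the new platform is.

```python
from urllib.parse import quote

OLD_PREFIX = "/index.php/"


def redirect_target(path):
    """Map an old-style path to its new-style URL.

    Returns the URL to send in the Location header of a 301 response,
    or None if the path does not use the old schema.
    """
    if path.startswith(OLD_PREFIX) and len(path) > len(OLD_PREFIX):
        topic = path[len(OLD_PREFIX):]
        # Percent-encode the topic so it is safe as a query-string value.
        return "/index.php?title=" + quote(topic, safe="")
    return None


print(redirect_target("/index.php/Topic"))         # /index.php?title=Topic
print(redirect_target("/index.php/AnotherTopic"))  # /index.php?title=AnotherTopic
print(redirect_target("/about.html"))              # None
```

Blocking /index.php/ in robots.txt would stop crawlers from requesting the old URLs at all, so they would never see such redirects. That makes the two approaches alternatives rather than complements.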

Webmaster tools showing 404 for non existent folder pages

- by Jody

Google Webmaster Tools is reporting many 404 URLs that don't exist on my site. The links are things such as domain.com/xyz/. That URL doesn't exist, but domain.com/xyz/index.html does. The "linked from" pages all show proper links to "/xyz/index.html". The page without index.html DOES 404, but why is Google even trying these URLs if they are not linked to? My real question: is there a way to have Google stop attempting to load these pages, and ultimately remove them from the crawl errors report? Thanks.

Crawling for geotagged data

- by abe3

I have no experience with web crawlers -- but I know that Apache maintains an open source web crawler called "Lucene." How would I go about writing such a crawler to search the web for geo tagged data close to a particular location? What would a general road map look like? How do I pick which slice of the web to crawl? Do I use regular expressions to find things that look like longitudes and latitudes? What does a general sketch of that solution look like?
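As a note on terminology, Lucene itself is an indexing and search library; the Apache crawler project is Nutch. On the regular-expression part of the question, the sketch below shows one possible shape for the extraction step: find decimal "lat, lon" pairs in already-fetched text, keep those in valid ranges, and measure their distance from a target point. The pattern, the function names and the sample sentence are all assumptions. Real pages also carry coordinates in other forms (meta tags, microformats, degrees/minutes/seconds) that this does not cover.

```python
import math
import re

# Decimal "lat, lon" pairs such as "40.7128, -74.0060". Requiring a decimal
# point avoids matching ordinary number lists like "2012, 13".
COORD_RE = re.compile(r"(?<![\d.])(-?\d{1,2}\.\d+)\s*,\s*(-?\d{1,3}\.\d+)")


def find_coordinates(text):
    """Return (lat, lon) tuples found in text that fall in valid ranges."""
    found = []
    for match in COORD_RE.finditer(text):
        lat, lon = float(match.group(1)), float(match.group(2))
        if -90 <= lat <= 90 and -180 <= lon <= 180:
            found.append((lat, lon))
    return found


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


# Made-up sample text standing in for a fetched page.
page = "Cafe at 40.7128, -74.0060 near the park; version 2012, 13 is unrelated."
target = (40.73, -73.99)
nearby = [c for c in find_coordinates(page) if haversine_km(c, target) < 5]
print(nearby)
```

Choosing which slice of the web to crawl is a separate problem. A filter like this only decides which fetched pages are worth keeping.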

Did Google Delete my index?

- by Sei

I have a website that I haven't had much time to take care of. I have updated it 4 times since last October; the contents are all original and informative. I can see cached copies in the Wayback Machine and the site was indexed in Yahoo, but not in Google. Did my index on Google get deleted because I did not update the site often? Normally Google crawls often and indexes fast, so I am really worried that I did something wrong. Is it possible that I set something wrong in the hosting or something? Please help me.

SEO: Joomla Category Page Optimization + Canonical Linking

- by Huberis

I'm wondering how best to optimize my Joomla site's SEO. I have pages with multiple articles on each page. Either via category-type pages, or via modules. In each case, I'm not wanting users to access the articles separately from the forward facing, menu-linked pages. I understand however that Joomla still generates a url for those articles, and Google can still crawl and display these articles separate from the pages. My question is what is the best way to control this so that my users get directed only to the front-facing pages? By using the canonical element for each article to point to the front-facing page it's on? Or is there a better method? Thanks for your help!

Data architecture for event log metrics?

- by elliot42

My service has a large ongoing number of user events, and we would like to do things like "count occurrence of event type T since date D." We are trying to make two basic decisions:

What to store? Storing every event vs. only storing aggregates:
- (Event log style) log every event and count them later, vs.
- (Time-series style) store a single aggregated "count of event E for date D" for every day

Where to store the data?
- In a relational database (particularly MySQL)
- In a non-relational (NoSQL) database
- In flat log files (collected centrally over the network via syslog-ng)

What is standard practice / where can I read more about comparing the different types of systems?

Additional details:
- The total event stream is large, potentially hundreds of thousands of entries per day
- But our current need is only to count certain types of events within it
- We don't necessarily need real-time access to the raw data or aggregation results

IMHO, "log all events to files, crawl them at a later time to filter and aggregate the stream" is a pretty standard UNIX Way, but my Rails-y compatriots seem to think that nothing is real unless it's in MySQL.
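To make the two storage styles concrete, here is a small sketch that works on flat log lines. The line format (`<ISO timestamp> <event type> <free text>`) and both function names are assumptions for illustration; the question does not specify a format.

```python
from collections import Counter
from datetime import date, datetime


def parse_line(line):
    """Split a log line into (day, event_type); return None if malformed."""
    parts = line.split()
    if len(parts) < 2:
        return None
    day = datetime.strptime(parts[0], "%Y-%m-%dT%H:%M:%S").date()
    return day, parts[1]


def count_events(lines, event_type, since):
    """Event-log style: scan raw events and count type T since date D."""
    parsed = (parse_line(line) for line in lines)
    return sum(1 for p in parsed if p and p[1] == event_type and p[0] >= since)


def daily_aggregates(lines):
    """Time-series style: collapse raw events into per-day, per-type counts."""
    parsed = (parse_line(line) for line in lines)
    return Counter(p for p in parsed if p)


# Hypothetical sample log.
log = [
    "2013-05-01T09:15:00 signup user=1",
    "2013-05-01T10:02:11 login user=1",
    "2013-05-02T08:30:45 signup user=2",
    "2013-05-03T17:45:09 signup user=3",
]
print(count_events(log, "signup", date(2013, 5, 2)))
print(daily_aggregates(log)[(date(2013, 5, 1), "signup")])
```

The trade-off shows up in the return values. The raw scan can answer any later question about the events, while the aggregate can only answer questions at the granularity it was built with, in exchange for being far smaller.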

Google is still crawling and indexing my old, dummy, test pages which now are 404 not found

- by Ace

I set up my site with sample pages and data (lorem ipsum, etc.) and Google crawled these pages. I have deleted all these pages and added real content, but in Webmaster Tools I still get a lot of 404 errors from Google trying to crawl these pages. I have set them to "mark as resolved" but some pages still come back as 404. Furthermore, a lot of these sample pages are still listed when I search for my site on Google. How do I remove them? I think these irrelevant pages are hurting my rating. I actually wanted to erase all these pages and have my site indexed as a new one, but I read it's not possible? (I have submitted a sitemap and used "Fetch as Google.")

