Search Results

Search found 446 results on 18 pages for 'crawl'.

Page 9/18

  • My domain's PageRank shows as unavailable; why is that?

    - by Emerson
    My domain, http://www.anovaordemmundial.com, was snatched by an opportunist when I failed to renew it. I know, it's all my fault :/ . After being ripped off buying my domain back, and with everything configured and working again, the PageRank for the domain shows as unavailable. Also, searches for "nova ordem mundial" (in Portuguese), which used to show my domain as the first result in any language, no longer show it at all. Do you think this is temporary and the domain will recover its PageRank after a full crawl by Google? There are hundreds of sites pointing to my domain, which is why it had its previous relevance in searches. The domain has been back for more than 5 days already. In reality, Bing already Is there anything I can do to help my domain get its PageRank back? Thanks for the help!

    Read the article

  • Server overhead caused by bots?

    - by giuseppe
    I have one customer website that is causing heavy load (http://www.modacalcio.it/en/by-kind/football-boots.html). Watching htop while navigating the website, most of the load comes from the AJAX links placed on the left side of the site. The website is hosted on a VPS with 3 processors and 2 GB RAM, with plenty of disk space. The real problem is that this website is new and not visited much. From the server-status module I can see that the load is caused by bots (Googlebot, Bingbot, an Ahrefs checker, and so on). So I thought it's probably those spiders trying to crawl all those links at once; could this be causing the overhead? I have also put rel="nofollow" on those links, but that doesn't keep the bots away. Is there any way, through code or Plesk, to hide those links from the bots?
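
    A hedged sketch of one common mitigation (not from the original post): rel="nofollow" only affects how link equity flows, so a robots.txt rule is the usual way to keep well-behaved bots off auto-generated filter links, and Crawl-delay throttles Bing and Yandex (Google ignores it; its rate is set in Webmaster Tools instead). The filter parameter below is hypothetical:

        User-agent: *
        # hypothetical: block the auto-generated filter links, keep the main pages
        Disallow: /*?filter=
        # seconds between requests; honored by Bing/Yandex, ignored by Google
        Crawl-delay: 10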

    Read the article

  • Google Bot trying to access my web app's sitemap

    - by geekrutherford
    Interesting find today... I was perusing the event log on our web server for any unexpected ASP.NET exceptions/errors and found the following:

        Exception information:
        Exception type: HttpException
        Exception message: Path '/builder/builder.sitemap' is forbidden.

        Request information:
        Request URL: https://www.bondwave.com:443/builder/builder.sitemap
        Request path: /builder/builder.sitemap
        User host address: 66.249.71.247
        User:
        Is authenticated: False
        Authentication Type:
        Thread account name: NT AUTHORITY\NETWORK SERVICE

    At first I thought this was maybe an attempt by a hacker to mess with the sitemap. Using a handy web site (www.network-tools.com) I did a lookup on the IP address and found it was a Google bot trying to crawl the application. In this case, I would expect an exception or a 403 anyway, since the site requires authentication.
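
    If the goal is simply to stop the bot from requesting that path (and keep the noise out of the event log), a robots.txt exclusion at the site root is the conventional fix; a minimal sketch:

        User-agent: *
        Disallow: /builder/builder.sitemap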

    Read the article

  • Google Webmaster Tools verification failed

    - by KMC
    I have a site created with Ruby on Rails. I verified it against Google Webmaster Tools some months ago, successfully. Then one day Webmaster Tools started failing re-verification. I tried again to verify my site using both meta tags and HTML files, but I kept getting "Verification failed. The connection to your server timed out." Since then, Google has stopped crawling my site's content, though somehow it still crawls the PDF content on my site.
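
    For reference, the meta-tag verification method places a tag like this inside the <head> of the home page (the content token is issued per site by Google; the value below is a placeholder):

        <meta name="google-site-verification" content="YOUR-TOKEN-HERE" />

    The timeout message, though, suggests Google could not reach the server at all during re-verification, so the tag itself may not be the problem.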

    Read the article

  • Google Authorship: can I display:none for link to profile?

    - by RubenGeert
    I'd like to have my 'mugshot' in Google's SERPs, but I couldn't care less about Google+. I don't really want to link my website to Google+ either. Can I use CSS display:none; on the link leading to my profile and still have authorship? The link looks like <a href='https://plus.google.com/111823012258578917399?rel=author' rel='nofollow'>Google</a>. Will the nofollow attribute here spoil things? I don't want to lose 'link juice' to Google+ if I don't have to. Now, Google should crawl only the HTML, but I'm sure they'll figure out that the link is not visible (technically it may even count as cloaking). Does anybody have experience with this situation? And do I really have to become (reasonably) active on Google+ for authorship to show? This answer suggests I do, but I didn't read anything about that in Google's guidelines.
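
    For comparison, the commonly documented authorship markup at the time used a rel="author" attribute on the link itself rather than a ?rel=author query parameter (a sketch reusing the asker's own profile ID); note that nofollow on the same link is generally believed to break authorship, since it tells Google not to follow the relationship:

        <a href="https://plus.google.com/111823012258578917399" rel="author">Google</a>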

    Read the article

  • Panda 4: Reducing #indexed pages. How much is enough?

    - by Noam
    I've been hit by Panda 4 (a 40% decrease). I didn't see any change during Panda 1-3. From what I've read, and comparing it to my site, the change is probably due to the fact that I have over 30M pages indexed on Google, and they've started seeing that as some sort of bad signal. Although I feel all of the pages have unique value that Google should crawl, it seems I should make some tough calls and reduce the indexed pages according to some prioritization I will conduct. The question is what my target should be, or what factors should help me figure out a relevant target. How many pages should I try to reduce to: 25M, 15M, 1M, or 2,000? Is it enough to add noindex to low-priority pages, or should I also remove all internal linking to them?
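
    For the noindex option mentioned above, the usual per-page tag is this (a sketch; "follow" keeps link equity flowing through the page even as it drops out of the index):

        <meta name="robots" content="noindex, follow">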

    Read the article

  • My First robots.txt

    - by Whitechapel
    I'm creating my first robots.txt and wanted to get a second opinion on it. Basically, I have an FTP setup on my board for some special users to transfer files between each other, and I do NOT want that included in searches by the bots. I also want to point to my sitemap, which gets auto-generated by a PHP page. So here is what I have. What else should I include, and do I need to fix anything? (It links to xmlsitemap.php because that generates the sitemap when called.) My goal is to allow any search bot to crawl the forums to grab metadata.

        User-agent: *
        Disallow: /admin/
        Disallow: /ali/
        Disallow: /benny/
        Disallow: /cgi-bin/
        Disallow: /ders/
        Disallow: /empire/
        Disallow: /komodo_117/
        Disallow: /xanxan/
        Disallow: /zeroordie/
        Disallow: /tmp/
        Sitemap: http://www.vivalanation.com/forums/xmlsitemap.php

    Edit: I'm not sure how to handle all the users' folders under /public_html/, since the robots.txt will be going in /public_html.

    Read the article

  • Games display across both monitors when opened

    - by Mitch
    I am not sure which setting to change for this, but when I open games like Dungeon Crawl or a Steam game, the game wants to take up both screens. Is there a way to have the game open on just one screen? xrandr shows this (both outputs are on Screen 0):

        Screen 0: minimum 320 x 200, current 2966 x 900, maximum 8192 x 8192
        LVDS1 connected 1366x768+1600+75 (normal left inverted right x axis y axis) 344mm x 193mm
        VGA1 connected 1600x900+0+0 (normal left inverted right x axis y axis) 443mm x 249mm

    If you need any more info, or can point me to a place where this may already have been answered, please let me know.
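
    One hedged workaround, assuming the game itself can't be configured: temporarily turn off one output with xrandr before launching, then restore it afterwards:

        xrandr --output LVDS1 --off    # the game now sees only the 1600x900 VGA1 screen
        xrandr --output LVDS1 --auto   # restore the laptop panel afterwards

    Some SDL-based games also respect the SDL_VIDEO_FULLSCREEN_HEAD=0 environment variable to pick a single head.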

    Read the article

  • Webmaster Tools showing 404s for non-existent folder pages

    - by Jody
    Google Webmaster Tools is reporting many 404 URLs that don't exist on my site, such as domain.com/xyz/. That URL doesn't exist, but domain.com/xyz/index.html does. The "linked from" pages all show proper links to /xyz/index.html. The page without index.html DOES 404, but why is Google even trying these URLs if nothing links to them? My real question: is there a way to have Google stop attempting to load these pages, and ultimately remove them from the crawl errors report? Thanks.
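
    As an aside, if the server is Apache, making the trailing-slash URLs resolve instead of 404 is often simpler than suppressing the reports; a one-line .htaccess sketch, assuming no DirectoryIndex is currently set:

        DirectoryIndex index.html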

    Read the article

  • Directing crawlers to in-language content per language sub-domain

    - by Noam
    I have a multilingual website with many pages (40M). The site has UGC, and each translation actually covers only the titles: each sub-domain points to the same content with different titles per language. As far as I understand, each sub-domain will be indexed by search engines, meaning they will actually need to crawl 40M x supported-languages pages. So I thought it might be best to direct each sub-domain's crawler to pages that are fully in that language (titles + UGC). Is there a way to do this? Or should search engines understand this on their own?
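
    One mechanism built for exactly this case is the hreflang annotation, which declares each page's per-language alternates so engines can pick the right sub-domain per locale (a sketch; the sub-domain names are hypothetical):

        <link rel="alternate" hreflang="en" href="http://en.example.com/page-1" />
        <link rel="alternate" hreflang="fr" href="http://fr.example.com/page-1" />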

    Read the article

  • Google doesn't index a subdomain. What can be the problem and what can be done?

    - by fudge
    Hi! I have a domain, let's call it example.com, which has a subdomain, games.example.com. I maintain a games forum using phpBB SEO, which is located at games.example.com/forum. The problem is that the forum is not being crawled. I used Google's Webmaster Tools and confirmed that the page is seen by Google. P.S. There is a link from games.example.com to games.example.com/forum. What can I do? How can I make Google crawl my forum?
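
    A hedged first step: give the sub-domain its own robots.txt with a Sitemap line, so crawlers discover the forum URLs directly (the sitemap path below is hypothetical):

        User-agent: *
        Disallow:
        Sitemap: http://games.example.com/forum/sitemap.xml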

    Read the article

  • Does SEO optimisation apply to the responsive side of a site?

    - by Rick Donohoe
    I'm looking at making some SEO optimisation fixes, and at this point I'm sorting out the heading structure and keywords (H1s, H2s, etc.). We have a site with a number of similar blocks; one is always visible, and one is hidden, depending on the screen size. This is our method of making a single site responsive. Firstly, how does this technique affect SEO, and in general does the responsive side of a site matter at all to search engines? What I mean is: if the site shows different content depending on screen size, which content will the search spider crawl?
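
    For context, the technique described keeps both blocks in the HTML and toggles them with CSS, something like this sketch (class names are hypothetical); a crawler that fetches the HTML without applying CSS therefore sees the content of both blocks:

        @media (max-width: 768px) {
          .desktop-block { display: none; }
        }
        @media (min-width: 769px) {
          .mobile-block { display: none; }
        }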

    Read the article

  • Crawling for geotagged data

    - by abe3
    I have no experience with web crawlers, but I know that Apache maintains an open-source web crawler called "Nutch" (built on the Lucene search library). How would I go about writing such a crawler to search the web for geotagged data close to a particular location? What would a general road map look like? How do I pick which slice of the web to crawl? Do I use regular expressions to find things that look like longitudes and latitudes? What would a general sketch of that solution look like?
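
    On the regular-expression idea: a naive coordinate matcher is easy to sketch in PHP (illustrative only; it will match plenty of non-coordinate number pairs too, which is why real crawlers lean on structured markup such as geo meta tags; assumes allow_url_fopen is enabled):

        <?php
        // Match "lat, long" pairs such as "40.7128, -74.0060" in fetched HTML.
        $html = file_get_contents('http://example.com/some-page');
        $pattern = '/(-?\d{1,2}\.\d+)\s*,\s*(-?\d{1,3}\.\d+)/';
        preg_match_all($pattern, $html, $matches, PREG_SET_ORDER);
        foreach ($matches as $m) {
            $lat = (float) $m[1];
            $lon = (float) $m[2];
            // Keep only pairs that fall inside valid coordinate ranges.
            if ($lat >= -90 && $lat <= 90 && $lon >= -180 && $lon <= 180) {
                echo "candidate: $lat, $lon\n";
            }
        }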

    Read the article

  • robots.txt, how effective is it and how long does it take?

    - by Stefan
    We recently converted the site to a single-page site using jQuery to slide between "pages", so we now have only index.php. When you search for the company on engines such as Google, you get the site plus a listing of its sub-pages, which now lead to outdated pages. Our hosting plan doesn't allow us to edit the .htaccess, and the old pages are .html docs, so I cannot use PHP redirects either. So if I put in place a robots.txt telling the engines not to crawl beyond index.php, how effective will this be in preventing/removing the crawled sub-pages? And, rough guess, how long before the search engines would update?
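
    Worth noting: robots.txt only stops future crawling; it doesn't remove pages that are already indexed. Since the old pages are static .html files that can still be edited, a hedged alternative is to drop a noindex-plus-redirect snippet into each one's <head> (a sketch):

        <meta name="robots" content="noindex">
        <meta http-equiv="refresh" content="0; url=/index.php">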

    Read the article

  • Did Google delete my index?

    - by Sei
    I have a website that I haven't had much time to take care of; I have updated it only 4 times since last October, and the contents are all original and informative. I can see caches of it in the Wayback Machine, and the site is indexed in Yahoo, but not in Google. Did my pages get deleted from Google's index because I did not update the site often? Normally Google crawls often and indexes fast; I just got really worried that I did something wrong. Is it possible that I set something wrong in the hosting? Please help me.

    Read the article

  • I've changed my URL schema. How do I tell Google to index the new schema and forget the old one?

    - by growse
    I had a site where the URLs were constructed like this: /index.php/Topic, /index.php/AnotherTopic. These were indexed in Google, and search results were returned that pointed to them. However, I recently replatformed the site and reconfigured it so the above URLs became /index.php?title=Topic and /index.php?title=AnotherTopic. The original URLs now return 404s. The site links to the correct URL schema internally, but Google is retaining the original schema in its search results. I've updated and resubmitted the sitemap, which contains only the new schema. Also, Google's Webmaster Tools is going slightly bananas at the fact that there's now a spike in 404 errors in its crawl results. What would be the best approach to get Google to 'forget' the old schema and index the new one instead? Should I try blocking /index.php/ in robots.txt? Should I be returning 301 codes instead of 404s for the original URLs?
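
    On the last question: 301s are the standard answer here, since they transfer the old URLs' standing to the new ones, whereas blocking in robots.txt would just freeze the stale entries in place. A sketch for an Apache .htaccess at the site root (assuming mod_rewrite is available):

        RewriteEngine On
        # 301 /index.php/Topic -> /index.php?title=Topic
        RewriteRule ^index\.php/(.+)$ /index.php?title=$1 [R=301,L]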

    Read the article

  • SEO: Joomla Category Page Optimization + Canonical Linking

    - by Huberis
    I'm wondering how best to optimize my Joomla site's SEO. I have pages with multiple articles on each page, either via category-type pages or via modules. In each case, I don't want users to access the articles separately from the forward-facing, menu-linked pages. I understand, however, that Joomla still generates a URL for those articles, and Google can still crawl and display the articles separately from the pages. My question is: what is the best way to control this so that my users get directed only to the front-facing pages? By using the canonical element on each article to point to the front-facing page it's on? Or is there a better method? Thanks for your help!
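
    The canonical element mentioned above goes in each article page's <head> and points at the preferred, menu-linked page; a sketch with a placeholder URL:

        <link rel="canonical" href="http://example.com/category-page" />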

    Read the article

  • Problem getting my website to show correctly in search engines

    - by dinbrca
    Hello guys, I have a website which I indexed on Google about 15 days ago. Some of my pages pass arguments, like this: http://www.bla.com/products.php?pro=bla&page=view. I then read that passing arguments like this isn't good for SEO purposes, so I started using .htaccess rewriting and changed the URLs to look like this: http://www.bla.com/products/bla/view/. But my site on Google still shows up as in link number 1. What should I do? I thought I should wait for the search engine to crawl my site again, but nothing happened. Thanks in advance, Din
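
    What usually fixes this is 301-redirecting the old query-string URLs to the new pretty URLs, so Google swaps the entries over on its next crawl. A hedged .htaccess sketch (matching THE_REQUEST avoids a loop with the existing rewrite that maps pretty URLs back to products.php internally; parameter names follow the example URL above):

        RewriteEngine On
        RewriteCond %{THE_REQUEST} \s/products\.php\?pro=([^&\s]+)&page=([^&\s]+)
        RewriteRule ^products\.php$ /products/%1/%2/? [R=301,L]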

    Read the article

  • Data architecture for event log metrics?

    - by elliot42
    My service has a large ongoing number of user events, and we would like to do things like "count occurrences of event type T since date D." We are trying to make two basic decisions:

    1. What to store: every event, or only aggregates? (Event-log style) log every event and count them later, vs. (time-series style) store a single aggregated "count of event E for date D" for every day.

    2. Where to store the data: in a relational database (particularly MySQL), in a non-relational (NoSQL) database, or in flat log files (collected centrally over the network via syslog-ng)?

    What is standard practice, and where can I read more about comparing the different types of systems?

    Additional details: the total event stream is large, potentially hundreds of thousands of entries per day, but our current need is only to count certain types of events within it. We don't necessarily need real-time access to the raw data or the aggregation results.

    IMHO, "log all events to files, crawl them at a later time to filter and aggregate the stream" is a pretty standard UNIX way, but my Rails-y compatriots seem to think that nothing is real unless it's in MySQL.
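
    For the time-series option, the usual shape is a daily rollup table alongside (or instead of) the raw log; a sketch in MySQL with illustrative table and column names:

        -- raw log (optional if flat files are kept instead)
        CREATE TABLE events (
          id BIGINT AUTO_INCREMENT PRIMARY KEY,
          event_type VARCHAR(64) NOT NULL,
          created_at DATETIME NOT NULL,
          KEY idx_type_date (event_type, created_at)
        );

        -- daily aggregate: "count of event E for date D"
        CREATE TABLE event_counts_daily (
          day DATE NOT NULL,
          event_type VARCHAR(64) NOT NULL,
          cnt INT NOT NULL,
          PRIMARY KEY (day, event_type)
        );

        -- "occurrences of type T since date D" straight off the aggregate
        SELECT SUM(cnt) FROM event_counts_daily
        WHERE event_type = 'T' AND day >= '2012-01-01';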

    Read the article

  • My website's Google index count suddenly increased, then suddenly dropped

    - by Jeg Bagus
    Yesterday before I went to sleep, I checked my site's index count and had about 50 pages indexed on Google. This morning when I woke up, I had 250 pages indexed, and my page was ranking better on several keywords. Then I added 1 page and 2 canonical links, added a 404 page header, and resubmitted the sitemap. After 2 hours, it went back down to 50 indexed pages, and my rankings rolled back to the previous day's. What actually happened? Is it because I resubmitted the sitemap? Even now, Google is still crawling my website. Are they trying to refresh the index?

    Read the article

  • Google is still crawling and indexing my old dummy test pages, which now return 404 Not Found

    - by Ace
    I set up my site with sample pages and data (lorem ipsum, etc.) and Google crawled those pages. I then deleted them all and added real content, but in Webmaster Tools I still get a lot of 404 errors from Google trying to crawl the old pages. I have set them to "mark as resolved", but some pages still come back as 404. Furthermore, a lot of these sample pages are still listed when I do a search for my site on Google. How do I remove them? I think these irrelevant pages are hurting my rating. I actually wanted to erase all these pages and have my site indexed as a new one, but I read that's not possible? (I have submitted a sitemap and used "Fetch as Google.")
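
    One hedged way to speed this up on an Apache host is to answer 410 Gone instead of 404 for the deleted test pages; Google treats 410 as a stronger removal signal. A sketch for .htaccess (the paths are illustrative):

        Redirect gone /lorem-ipsum-page.html
        Redirect gone /sample-category/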

    Read the article

  • Spaces in an img's "alt" attribute: good or bad for search engines?

    - by Camran
    I am trying to make it easier for search engines to crawl my website, as it is almost 100% dynamic. I have a couple of transparent images which are actually links to sections of my page. I wonder: if I add an "alt" attribute containing space characters to explain the target, will this improve SE rankings, etc.? For example: <img src="blabla.png" alt="post new classified">. Or will this just result in errors? And what should I put in the alt attribute if I can't use spaces? PS: another, shorter question: will JavaScript-rich content make a page less important to crawlers? Thanks

    Read the article

  • Issue with sitemap in GWT

    - by Anusha
    I have an e-commerce website, www.beyondtime.in, and I have been constantly monitoring Googlebot's crawling of my website and my Webmaster Tools account. Lately, I have found two issues that I have not been able to understand and hence want your help with.

    1.) Googlebot has been crawling only one URL of my website, www.beyondtime.in/telecom.php, when that URL is not even valid. So kindly help me understand what needs to be done to let Google crawl the other pages of the website as well.

    2.) The second question is about my Google Webmaster Tools account, where I've submitted my sitemap with 227 URLs, but only 156 of those have been indexed. Also, none of the images on my website have been indexed by Google. So kindly help me with this as well. Thanks

    Read the article

  • Should I include everything in the sitemap or only new content?

    - by Mee
    For a website with dynamic content (new content is constantly being added), should I include only the newest content in the sitemap, or should I include everything (with a sitemap index)? What are the best practices for sitemaps, especially for large sites? Also, is there any way to make Google (and other search engines) crawl only the pages in the sitemap? Thanks. Update: also, any idea how Stack Overflow handles this? I'd like to know, but unfortunately (and understandably) they have blocked access to their sitemap.
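
    For the "include everything" option, a sitemap index is just a parent file listing child sitemaps (each child is capped at 50,000 URLs); a sketch with placeholder file names:

        <?xml version="1.0" encoding="UTF-8"?>
        <sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
          <sitemap><loc>http://example.com/sitemap-articles-1.xml</loc></sitemap>
          <sitemap><loc>http://example.com/sitemap-articles-2.xml</loc></sitemap>
        </sitemapindex>

    Note that a sitemap does not restrict crawling: engines treat it as a hint about what exists, not a whitelist, so pages discovered via links will still be crawled.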

    Read the article
