Search Results

Search found 160 results on 7 pages for 'googlebot'.

Page 5/7 | < Previous Page | 1 2 3 4 5 6 7 | Next Page >

Access denied 403 errors after migrating my site

- by AgA

I've recently migrated my Joomla site from one shared hosting to another with Hostgator. GWT notified me about many 403 access denied pages. I've checked with Firebug too, and even though browser is displaying full page correctly but http return is 403. I've checked the home page but it's correctly returing 200 response. The same is shown by Fetch as Google in GWT(pasted this in the bottom). The site is 3 years old and I regularly do such migrations. I've copied the files and database "AS IS". I've even cleared all the caches but no luck. There is only one change: previously the site was primary domain but now it's add-on one. What could be the issue? This is how Googlebot fetched the page. Fetch as Google URL: http://MYSITE.COM/-----------------REMOVED.html Date: Thursday, June 20, 2013 at 10:32:14 PM PDT Googlebot Type: Web Download Time (in milliseconds): 3899 HTTP/1.1 403 Forbidden Date: Fri, 21 Jun 2013 05:32:15 GMT Server: Apache P3P: CP="NOI ADM DEV PSAi COM NAV OUR OTRo STP IND DEM" Expires: Mon, 1 Jan 2001 00:00:00 GMT Cache-Control: post-check=0, pre-check=0 Pragma: no-cache Set-Cookie: 0e4f6b53991c80cf39d57a6db58bb58d=ee2d880e8db0f1fc03c5612ea5a77004; path=/ Last-Modified: Fri, 21 Jun 2013 05:32:19 GMT Keep-Alive: timeout=5, max=75 Connection: Keep-Alive Transfer-Encoding: chunked Content-Type: text/html; charset=utf-8 <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"> <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en-gb" lang="en-gb" > <head> <base href="http://www.mysite.com/-----------------rajiv-yuva-shakthi-programme-finance-planning.html" /> <meta http-equiv="content-type" content="text/html; charset=utf-8" /> <meta name="robots" content="index, follow" /> <meta name="keywords" content="" /> <<<<<<TRIMMED>>>>>>>>>>>>>>

Read the article
weird POST request in IIS logs

- by MIrrorMirror

I noticed weird log entries (unless there's something i don't understand) in my IIS (7.5) logs. it's an online dictionary with requests ( user friendly url rewriting ) and most of them are GET. However I noticed weird POST requests which are taking place by a person who is trying to crawl our content ( tens of thousands of such requests ) 2013-11-09 20:39:27 GET /dict/mylang/word1 - y.y.y.y Mozilla/5.0+(compatible;+Googlebot/2.1;++http://www.google.com/bot.html) - 200 296 2013-11-09 20:39:29 GET /dict/mylang/word2 - z.z.z.z Mozilla/5.0+(iPhone;+CPU+iPhone+OS+6_0+like+Mac+OS+X)+AppleWebKit/536.26+(KHTML,+like+Gecko)+Version/6.0+Mobile/10A5376e+Safari/8536.25+(compatible;+Googlebot-Mobile/2.1;++http://www.google.com/bot.html) - 200 468 2013-11-09 20:39:29 POST /dict/mylang/word3 - x.x.x.x - - 200 2593 The two first requests are legal. Now for the third request, I don't think I have allowed cross domain POST. if that what the third log line means. all those POST requests take that much time for unknown reasons to me. I would like to know how are those POST requests possible and how can I stop them. p.s. I have masked the IPs on purpose. any help would be appreciated! thank you in advance.

Read the article
Apache process consuming all memory on the server

- by jemmille

I have an apache process that suddenly appears on a particular server. When it shows up it starts consuming memory at a very rapid rate, then moves on to all the swap. In all it consumes about 11GB (including swap) of memory and the server eventually becomes unresponsive. The load on the server is under 1 at all other times. The process runs as nobody and I am having a hard time tracking down the source. If i run an strace on the process and all it did was continuously dump out mprotect over and over again If i run lsof -p <pid>, I get this, but only sometimes: httpd 19229 nobody 152u IPv4 175050 crawl-66-249-67-216.googlebot.com:62336 (CLOSE_WAIT) httpd 19229 nobody 153u IPv4 179104 crawl-66-249-71-167.googlebot.com:58012 (ESTABLISHED) As long as I catch it, I can kill the process and the server almost immediately stabilizes. I have on site on the server that is getting a few thousand hits a a day that I think might be the source, but I still can't find the exact reason. Also, this is a cPanel server and I have upcp'd the server, rebuilt apache with easy apache, and rebuilt httpd.conf. It is not spawing any related processes, meaning I can find any php, mysql, cgi, etc. processes that relate to this process. It's just a loner process that balloons fast and consumes ever last MB of memory. This is on a XenServer 5.6 based VM. No other servers in the cluster are having this issue.

Read the article
Is there any reason this cronjob would fail in cron, but not on the command line?

- by Treffynnon

I have written a little one liner that will email me when a list of files changes - I used sha512 to generate a list of hashes and then periodically check that those hashes still match. */5 * * * * /usr/bin/sha512sum --status -c /sha512.sumlist && echo "Success" > /dev/null || echo "Check robots.txt and index.html in /var/www as staging sites are now potentially exposed to the world and the damned googlebot" | /usr/bin/mail -s "Default staging server files have changed" [email protected] It works fine on the command line with: /usr/bin/sha512sum --status -c /sha512.sumlist && echo "Success" > /dev/null || echo "Check robots.txt and index.html in /var/www as staging sites are now potentially exposed to the world and the damned googlebot" | /usr/bin/mail -s "Default staging server files have changed" [email protected] As soon as I run it as a cronjob though it emails every time it runs with the failure message instead of only when the sha512sum check should fail. Is there something silly I have missed in a rush? I forgot to mention that I am running an Ubuntu machine.

Read the article
PHP & cUrl - POST problems in while loop.

- by Max Hoff

I have a while loop, with cUrl inside. (I can't use curl_multi for various reasons.) The problem is that cUrl seems to save the data it POSTS after each loop traversal. For instance, if parameter X is One the first loop through, and if it's Two the second loop through, cUrl posts: "One,Two". It should just POST "Two".(This is despite closing and unsetting the curl handle.) Here's a simplified version of the code (with unecessary info stripped out): while(true){ // code to form the $url. this code is local to the loop. so the variables should be "erased" and made new for each loop through. $ch = curl_init(); $userAgent = 'Googlebot/2.1 (http://www.googlebot.com/bot.html)'; curl_setopt($ch,CURLOPT_USERAGENT, $userAgent); curl_setopt($ch,CURLOPT_URL,$url); curl_setopt($ch,CURLOPT_POST,count($fields)); curl_setopt($ch,CURLOPT_POSTFIELDS,$fields_string); curl_setopt($ch, CURLOPT_FAILONERROR, true); curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); curl_setopt($ch, CURLOPT_AUTOREFERER, true); curl_setopt($ch, CURLOPT_RETURNTRANSFER,true); curl_setopt($ch, CURLOPT_TIMEOUT, 10); $html = curl_exec($ch); curl_close($ch); unset($ch); $dom = new DOMDocument(); @$dom->loadHTML($html); $xpath = new DOMXPath($dom); $resultTable = $xpath->evaluate("/html/body//table"); // $resultTable is 20 the first time through the loop, and 0 everytime thereafter becauset he POSTing doesn't work right with the "saved" parameters. What am I doing wrong here?

Read the article
Rails Browser Detection Methods

- by alvincrespo

Hey Everyone, I was wondering what methods are standard within the industry to do browser detection in Rails? Is there a gem, library or sample code somewhere that can help determine the browser and apply a class or id to the body element of the (X)HTML? Thanks, I'm just wondering what everyone uses and whether there is accepted method of doing this? I know that we can get the user.agent and parse that string, but I'm not sure if that is that is an acceptable way to do browser detection. Also, I'm not trying to debate feature detection here, I've read multiple answers for that on StackOverflow, all I'm asking for is what you guys have done. [UPDATE] So thanks to faunzy on GitHub, I've sort of understand a bit about checking the user agent in Rails, but still not sure if this is the best way to go about it in Rails 3. But here is what I've gotten so far: def users_browser user_agent = request.env['HTTP_USER_AGENT'].downcase @users_browser ||= begin if user_agent.index('msie') && !user_agent.index('opera') && !user_agent.index('webtv') 'ie'+user_agent[user_agent.index('msie')+5].chr elsif user_agent.index('gecko/') 'gecko' elsif user_agent.index('opera') 'opera' elsif user_agent.index('konqueror') 'konqueror' elsif user_agent.index('ipod') 'ipod' elsif user_agent.index('ipad') 'ipad' elsif user_agent.index('iphone') 'iphone' elsif user_agent.index('chrome/') 'chrome' elsif user_agent.index('applewebkit/') 'safari' elsif user_agent.index('googlebot/') 'googlebot' elsif user_agent.index('msnbot') 'msnbot' elsif user_agent.index('yahoo! slurp') 'yahoobot' #Everything thinks it's mozilla, so this goes last elsif user_agent.index('mozilla/') 'gecko' else 'unknown' end end return @users_browser end

Read the article
How can I set up a load balancer to direct all Search Engine Bot traffic to one server?

- by Ryan

We have a simple load balancer set up on Rackspace to 3 web server nodes. After reviewing our traffic and expenses, the largest bandwidth hog is Google Bot. Since on Rackspace we pay for bandwidth by the byte, we'd like to direct all traffic from GoogleBot to another host (MediaTemple) with unlimited bandwidth. We think this would cut our hosting bill several thousand dollars a month. Is this possible? Advisable?

Read the article
Espeak SAPI/dll usage on Windows ?

- by Quandary

Question: I am trying to use the espeak text-to-speech engine. So for I got it working wounderfully on linux (code below). Now I wanted to port this basic program to windows, too, but it's nearly impossible... Part of the problem is that the windows dll only allows for AUDIO_OUTPUT_SYNCHRONOUS, which means it requires a callback, but I can't figure out how to play the audio from the callback... First it crashed, then I realized, I need a callback function, now I get the data in the callback function, but I don't know how to play it... as it is neither a wav file nor plays automatically as on Linux. The sourceforge site is rather useless, because it basically says use the SAPI version, but then there is no example on how to use the sapi espeak dll... Anyway, here's my code, can anybody help? #ifdef __cplusplus #include <cstdio> #include <cstdlib> #include <cstring> else #include <stdio.h> #include <stdlib.h> #include <string.h> endif include include //#include "speak_lib.h" include "espeak/speak_lib.h" // libespeak-dev: /usr/include/espeak/speak_lib.h // apt-get install libespeak-dev // apt-get install libportaudio-dev // g++ -o mine mine.cpp -lespeak // g++ -o mine mine.cpp -I/usr/include/espeak/ -lespeak // gcc -o mine mine.cpp -I/usr/include/espeak/ -lespeak char voicename[40]; int samplerate; int quiet = 0; static char genders[4] = {' ','M','F',' '}; //const char *data_path = "/usr/share/"; // /usr/share/espeak-data/ const char *data_path = NULL; // use default path for espeak-data int strrcmp(const char *s, const char *sub) { int slen = strlen(s); int sublen = strlen(sub); return memcmp(s + slen - sublen, sub, sublen); } char * strrcpy(char *dest, const char *source) { // Pre assertions assert(dest != NULL); assert(source != NULL); assert(dest != source); // tk: parentheses while((*dest++ = *source++)) ; return(--dest); } const char* GetLanguageVoiceName(const char* pszShortSign) { #define LANGUAGE_LENGTH 30 static char szReturnValue[LANGUAGE_LENGTH] ; memset(szReturnValue, 0, LANGUAGE_LENGTH); for (int i = 0; pszShortSign[i] != '\0'; ++i) szReturnValue[i] = (char) tolower(pszShortSign[i]); const espeak_VOICE **voices; espeak_VOICE voice_select; voices = espeak_ListVoices(NULL); const espeak_VOICE *v; for(int ix=0; (v = voices[ix]) != NULL; ix++) { if( !strrcmp( v->languages, szReturnValue) ) { strcpy(szReturnValue, v->name); return szReturnValue; } } // End for strcpy(szReturnValue, "default"); return szReturnValue; } // End function getvoicename void ListVoices() { const espeak_VOICE **voices; espeak_VOICE voice_select; voices = espeak_ListVoices(NULL); const espeak_VOICE *v; for(int ix=0; (v = voices[ix]) != NULL; ix++) { printf("Shortsign: %s\n", v->languages); printf("age: %d\n", v->age); printf("gender: %c\n", genders[v->gender]); printf("name: %s\n", v->name); printf("\n\n"); } // End for } // End function getvoicename int main() { printf("Hello World!\n"); const char* szVersionInfo = espeak_Info(NULL); printf("Espeak version: %s\n", szVersionInfo); samplerate = espeak_Initialize(AUDIO_OUTPUT_PLAYBACK,0,data_path,0); strcpy(voicename, "default"); // espeak --voices strcpy(voicename, "german"); strcpy(voicename, GetLanguageVoiceName("DE")); if(espeak_SetVoiceByName(voicename) != EE_OK) { printf("Espeak setvoice error...\n"); } static char word[200] = "Hello World" ; strcpy(word, "TV-fäns aufgepasst, es ist 20 Uhr 15. Zeit für Rambo 3"); strcpy(word, "Unnamed Player wurde zum Opfer von GSG9"); int speed = 220; int volume = 500; // volume in range 0-100 0=silence int pitch = 50; // base pitch, range 0-100. 50=normal // espeak.cpp 625 espeak_SetParameter(espeakRATE, speed, 0); espeak_SetParameter(espeakVOLUME,volume,0); espeak_SetParameter(espeakPITCH,pitch,0); // espeakRANGE: pitch range, range 0-100. 0-monotone, 50=normal // espeakPUNCTUATION: which punctuation characters to announce: // value in espeak_PUNCT_TYPE (none, all, some), espeak_VOICE *voice_spec = espeak_GetCurrentVoice(); voice_spec->gender=2; // 0=none 1=male, 2=female, //voice_spec->age = age; espeak_SetVoiceByProperties(voice_spec); espeak_Synth( (char*) word, strlen(word)+1, 0, POS_CHARACTER, 0, espeakCHARS_AUTO, NULL, NULL); espeak_Synchronize(); strcpy(voicename, GetLanguageVoiceName("EN")); espeak_SetVoiceByName(voicename); strcpy(word, "Geany was fragged by GSG9 Googlebot"); strcpy(word, "Googlebot"); espeak_Synth( (char*) word, strlen(word)+1, 0, POS_CHARACTER, 0, espeakCHARS_AUTO, NULL, NULL); espeak_Synchronize(); espeak_Terminate(); printf("Espeak terminated\n"); return EXIT_SUCCESS; } /* if(espeak_SetVoiceByName(voicename) != EE_OK) { memset(&voice_select,0,sizeof(voice_select)); voice_select.languages = voicename; if(espeak_SetVoiceByProperties(&voice_select) != EE_OK) { fprintf(stderr,"%svoice '%s'\n",err_load,voicename); exit(2); } } */ The above code is for Linux. The below code is about as far as I got on Vista x64 (32 bit emu): #ifdef __cplusplus #include <cstdio> #include <cstdlib> #include <cstring> else #include <stdio.h> #include <stdlib.h> #include <string.h> endif include include include "speak_lib.h" //#include "espeak/speak_lib.h" // libespeak-dev: /usr/include/espeak/speak_lib.h // apt-get install libespeak-dev // apt-get install libportaudio-dev // g++ -o mine mine.cpp -lespeak // g++ -o mine mine.cpp -I/usr/include/espeak/ -lespeak // gcc -o mine mine.cpp -I/usr/include/espeak/ -lespeak char voicename[40]; int iSampleRate; int quiet = 0; static char genders[4] = {' ','M','F',' '}; //const char *data_path = "/usr/share/"; // /usr/share/espeak-data/ //const char *data_path = NULL; // use default path for espeak-data const char *data_path = "C:\Users\Username\Desktop\espeak-1.43-source\espeak-1.43-source\"; int strrcmp(const char *s, const char *sub) { int slen = strlen(s); int sublen = strlen(sub); return memcmp(s + slen - sublen, sub, sublen); } char * strrcpy(char *dest, const char *source) { // Pre assertions assert(dest != NULL); assert(source != NULL); assert(dest != source); // tk: parentheses while((*dest++ = *source++)) ; return(--dest); } const char* GetLanguageVoiceName(const char* pszShortSign) { #define LANGUAGE_LENGTH 30 static char szReturnValue[LANGUAGE_LENGTH] ; memset(szReturnValue, 0, LANGUAGE_LENGTH); for (int i = 0; pszShortSign[i] != '\0'; ++i) szReturnValue[i] = (char) tolower(pszShortSign[i]); const espeak_VOICE **voices; espeak_VOICE voice_select; voices = espeak_ListVoices(NULL); const espeak_VOICE *v; for(int ix=0; (v = voices[ix]) != NULL; ix++) { if( !strrcmp( v->languages, szReturnValue) ) { strcpy(szReturnValue, v->name); return szReturnValue; } } // End for strcpy(szReturnValue, "default"); return szReturnValue; } // End function getvoicename void ListVoices() { const espeak_VOICE **voices; espeak_VOICE voice_select; voices = espeak_ListVoices(NULL); const espeak_VOICE *v; for(int ix=0; (v = voices[ix]) != NULL; ix++) { printf("Shortsign: %s\n", v->languages); printf("age: %d\n", v->age); printf("gender: %c\n", genders[v->gender]); printf("name: %s\n", v->name); printf("\n\n"); } // End for } // End function getvoicename /* Callback from espeak. Directly speaks using AudioTrack. */ define LOGI(x) printf("%s\n", x) static int AndroidEspeakDirectSpeechCallback(short *wav, int numsamples, espeak_EVENT *events) { char buf[100]; sprintf(buf, "AndroidEspeakDirectSpeechCallback: %d samples", numsamples); LOGI(buf); if (wav == NULL) { LOGI("Null: speech has completed"); } if (numsamples > 0) { //audout->write(wav, sizeof(short) * numsamples); sprintf(buf, "AudioTrack wrote: %d bytes", sizeof(short) * numsamples); LOGI(buf); } return 0; // continue synthesis (1 is to abort) } static int AndroidEspeakSynthToFileCallback(short *wav, int numsamples,espeak_EVENT *events) { char buf[100]; sprintf(buf, "AndroidEspeakSynthToFileCallback: %d samples", numsamples); LOGI(buf); if (wav == NULL) { LOGI("Null: speech has completed"); } // The user data should contain the file pointer of the file to write to //void* user_data = events->user_data; FILE* user_data = fopen ( "myfile1.wav" , "ab" ); FILE* fp = static_cast<FILE *>(user_data); // Write all of the samples fwrite(wav, sizeof(short), numsamples, fp); return 0; // continue synthesis (1 is to abort) } int main() { printf("Hello World!\n"); const char* szVersionInfo = espeak_Info(NULL); printf("Espeak version: %s\n", szVersionInfo); iSampleRate = espeak_Initialize(AUDIO_OUTPUT_SYNCHRONOUS, 4096, data_path, 0); if (iSampleRate <= 0) { printf("Unable to initialize espeak"); return EXIT_FAILURE; } //samplerate = espeak_Initialize(AUDIO_OUTPUT_PLAYBACK,0,data_path,0); //ListVoices(); strcpy(voicename, "default"); // espeak --voices //strcpy(voicename, "german"); //strcpy(voicename, GetLanguageVoiceName("DE")); if(espeak_SetVoiceByName(voicename) != EE_OK) { printf("Espeak setvoice error...\n"); } static char word[200] = "Hello World" ; strcpy(word, "TV-fäns aufgepasst, es ist 20 Uhr 15. Zeit für Rambo 3"); strcpy(word, "Unnamed Player wurde zum Opfer von GSG9"); int speed = 220; int volume = 500; // volume in range 0-100 0=silence int pitch = 50; // base pitch, range 0-100. 50=normal // espeak.cpp 625 espeak_SetParameter(espeakRATE, speed, 0); espeak_SetParameter(espeakVOLUME,volume,0); espeak_SetParameter(espeakPITCH,pitch,0); // espeakRANGE: pitch range, range 0-100. 0-monotone, 50=normal // espeakPUNCTUATION: which punctuation characters to announce: // value in espeak_PUNCT_TYPE (none, all, some), //espeak_VOICE *voice_spec = espeak_GetCurrentVoice(); //voice_spec->gender=2; // 0=none 1=male, 2=female, //voice_spec->age = age; //espeak_SetVoiceByProperties(voice_spec); //espeak_SetSynthCallback(AndroidEspeakDirectSpeechCallback); espeak_SetSynthCallback(AndroidEspeakSynthToFileCallback); unsigned int unique_identifier; espeak_ERROR err = espeak_Synth( (char*) word, strlen(word)+1, 0, POS_CHARACTER, 0, espeakCHARS_AUTO, &unique_identifier, NULL); err = espeak_Synchronize(); /* strcpy(voicename, GetLanguageVoiceName("EN")); espeak_SetVoiceByName(voicename); strcpy(word, "Geany was fragged by GSG9 Googlebot"); strcpy(word, "Googlebot"); espeak_Synth( (char*) word, strlen(word)+1, 0, POS_CHARACTER, 0, espeakCHARS_AUTO, NULL, NULL); espeak_Synchronize(); */ // espeak_Cancel(); espeak_Terminate(); printf("Espeak terminated\n"); system("pause"); return EXIT_SUCCESS; }

Read the article
Google Webmaster Tools, DNS Errors & HostPapa

- by Gravy

Received a message from Google Webmaster Tools: Over the last 24 hours, Googlebot encountered 2 errors while attempting to retrieve DNS information for your site. The overall error rate for DNS queries for your site is 40.0%. You can see more details about these errors in Webmaster Tools. Recommended action Contacted HostPapa and they deny that there is any issue with the site / server!!! Support in terms of what I can do to actually resolve this issue is non-existent!!!! The site is currently online. And I don't know much about DNS... so any advice about what I can do to resolve this problem would be much appreciated. Basically, the message from Google says that it is my webhosts fault, the message from my webhost (HostPapa) is... "Just tell google to crawl your site as there are no errors."

Read the article
Does redirect popup window affect SEO?

- by Joseph

We have multiple websites, each site servers number of countries, and we used to have Geo-Ip Auto redirect system (no one likes auto-redirect), so we implemented another redirect system also uses Geo-IP database, but showing a pop-up window (HTML layer pop-up, so it can't be rejected), this window asks the visitor if he would like to continue with this page or go to the correct website of his country. We also added a test line before showing the pop-up, so if the visitor is Googlebot, the popup will not show up :). I was wondering if this effects our websites SEO?

Read the article
How to solve "Login Only" rejection?

- by Renan

Recently, a site of mine was rejected due to "Login Only": "Login Only: During our review of your website, we found that the majority of pages on your site are behind a login, or there is restricted access. Please note that we will not approve applications for login-protected pages, as we are not able to review their content for acceptance into the program." Although the site does require login to send content, it doesn't require any to view any page. How do I tell the Googlebot or whatever is used to crawl pages to adsense that all the content is publicly available but registration is needed to post?

Read the article
Does iframe affect SEO of its parent page?

- by Xu Jiawan

I would like to know that, does iframe affect the SEO of its parent page (the page contains iframe)? I've done some searching, such as Do we still need to avoid using frame/iframe for good SEO? and Using iFrame: SEO and Accessibility Points, which tell me that: The content in an iframe is not considered part of the parent page. The page within an iframe may be spidered and indexed (or it may be not) but no PR is definitely passed. But these are the content in the iframe, what about the parent page? Does the PageRank of the parent page will decrease because the iframe? Or maybe Googlebot wouldn't crawl the parent page? Or is the parent page not affected at all?

Read the article
How to protect SHTML pages from crawlers/spiders/scrapers?

- by Adam Lynch

I have A LOT of SHTML pages I want to protect from crawlers, spiders & scrapers. I understand the limitations of SSIs. An implementation of the following can be suggested in conjunction with any technology/technologies you wish: The idea is that if you request too many pages too fast you're added to a blacklist for 24 hrs and shown a captcha instead of content, upon every page you request. If you enter the captcha correctly you've removed from the blacklist. There is a whitelist so GoogleBot, etc. will never get blocked. Which is the best/easiest way to implement this idea? Server = IIS Cleaning out the old tuples from a DB every 24 hrs is easily done so no need to explain that.

Read the article
How can I allow robots access to my sitemap, but prevent casual users from accessing it?

- by morpheous

I am storing my sitemaps in my web folder. I want web crawlers (Googlebot etc) to be able to access the file, but I dont necessarily want all and sundry to have access to it. For example, this site (superuser.com), has a site index - as specified by its robots.txt file (http://superuser.com/robots.txt). However, when you type http://superuser.com/sitemap.xml, you are directed to a 404 page. How can I implement the same thing on my website? I am running a LAMP website, also I am using a sitemap index file (so I have multiple site maps for the site). I would like to use the same mechanism to make them unavailable via a browser, as described above.

Read the article
Does a "nofollow" attribute on a link prevent URL discovery by search engines?

- by Stephen Ostermiller

I know that nofollow prevents link juice from being passed across a link. But if search engine robots discover a link with a nofollow on it, will they add that link to their crawl queue? In other words, if I create a link to a brand new page and put a rel=nofollow attribute on that link, will it prevent search engine bots (particularly Googlebot) from crawling the page. (Assuming that this link remains the only link into that page.) I've read conflicting reports about this over the years and I'm looking for authoritative references about the current state of affairs. Official statements from Google or published results of independent testing would be ideal.

Read the article
How to remove old robots.txt from google as old file block the whole site

- by KnowledgeSeeker

I have a website which still shows old robots.txt in the google webmaster tools. User-agent: * Disallow: / Which is blocking Googlebot. I have removed old file updated new robots.txt file with almost full access & uploaded it yesterday but it is still showing me the old version of robots.txt Latest updated copy contents are below User-agent: * Disallow: /flipbook/ Disallow: /SliderImage/ Disallow: /UserControls/ Disallow: /Scripts/ Disallow: /PDF/ Disallow: /dropdown/ I submitted request to remove this file using Google webmaster tools but my request was denied I would appreciate if someone can tell me how i can clear it from the google cache and make google read the latest version of robots.txt file.

Read the article
Google is not indexing my entire site despite having a sitemap

- by Anusha

I have an e-commerce website www.beyondtime.in. I have been constantly monitoring Googlebot crawling on my website and my webmaster account. Lately, I have found two issues that I have not been able to understand. 1.) The Google Bots have been only crawling www.beyondtime.in/telecom.php when the URL is not even valid. What needs to be done to let Google crawl other pages of the website as well? 2.) The second question is about the Google Webmaster account, where I've submitted my sitemap with 227 URLs. Out of that, only 156 have been indexed. None of the images of my website have been indexed by Google.

Read the article
Does Bing support anything like Google's First Click Free program?

- by Dan Fabulich

Google has a program for webmasters called First Click Free. To implement First Click Free, you need to allow all users who find a document on your site via Google search to see the full text of that document, even if they have not registered or subscribed to see that content. The user's first click to your content area is free. However, once that user clicks a link on the original page, you can require them to sign in or register to read further. The user must be able to see the full content of a multi-page article. You can allow this by displaying all content on a single page to both Googlebot and users. Alternatively, you can use cookies to make sure that a user can visit each page of a multi-page article before being asked for registration or payment. Does Bing support anything like this?

Read the article
How to secure robots.txt file?

- by CompilingCyborg

I would like for User-agents to index my relative pages only without accessing any directory on my server. As initial thought, i had this version in mind: User-agent: * Disallow: */* Sitemap: http://www.mydomain.com/sitemap.xml My Questions: Is it correct to block all directories like that - Disallow: */*? Would still search engines be able to see and index my sitemap if i disallowed all directories? What are the best practices for securing the robots.txt file? For Reference: Here is a good tutorial for robots.txt #Add this if you want to stop Alexa from indexing your site. User-agent: ia_archiver Disallow: / #Add this to stop duggmirror User-agent: duggmirror Disallow: / #Add this to allow specific agents User-agent: Googlebot Disallow: #Add this to allow all agents while blocking specific directories User-agent: * Disallow: /cgi-bin/ Disallow: /*?*

Read the article
Using disqus for a website (question on SERP and backlink)

- by Homunculus Reticulli

I am building a website and am trying to decide between writing my own commenting system and using disquss. One of the main deciding factors is that (obviously), I want comments left on my page to be show on SERPS. However, I remeber reading somewhere that disqus loads comments into a page using AJAX - and therefore the comments are "invisible" as far as Googlebot and other SE crawlers are concerned. Could someone confirm or refute this? The other question I have is about whether (as a commenter), When I place a comment on another website using disqus (including any links I may add to my comment), do the lionks in my comment count as a backlink (in other words are they "dofollow" or "nofollow" links)?

Read the article
How URL Redirection affects SEO?

- by Costa

The following paragraph is from SEO Google Guide Google is good at crawling all types of URL structures, even if they're quite complex, but spending the time to make your URLs as simple as possible for both users and search engines can help. Some webmasters try to achieve this by rewriting their dynamic URLs to static ones; while Google is fine with this, we'd like to note that this is an advanced procedure and if done incorrectly, could cause crawling issues with your site. What makes URL re-writing implementation incorrect for GoogleBot? I am using Asp.net 3.5 framework. Thanks

Read the article
Redirect error in Google Webmaster Tools report

- by Aurelio De Rosa

I built a CMS and I used it to create the following website http://www.tkdmontecatini.com . After some days, Google Webmaster Tools started to give me several "Redirect error" on some pages like the follows: http://www.tkdmontecatini.com/it/photogallery http://www.tkdmontecatini.com/it/pagina/9/Informazioni/Corsi/Chi-Siamo http://www.tkdmontecatini.com/it/pagina/2/Informazioni/Eventi/Eventi The funny things are: If I access those links from a browser, it's all right and I've not redirect loops or other similar issues If I use the "Fetch as Googlebot" function, I get a great "Success" result Question: Any idea of why this happens and how can I fix it?

Read the article
How does URL Rewriting affect SEO?

- by Costa

The following paragraph is from SEO Google Guide Google is good at crawling all types of URL structures, even if they're quite complex, but spending the time to make your URLs as simple as possible for both users and search engines can help. Some webmasters try to achieve this by rewriting their dynamic URLs to static ones; while Google is fine with this, we'd like to note that this is an advanced procedure and if done incorrectly, could cause crawling issues with your site. What makes URL re-writing implementation incorrect for GoogleBot? I am using Asp.net 3.5 framework.

Read the article
Can a client dictate whether or not HttpContext is created?

- by Keivan

We are getting a lot of hits from Googlebot and BingBot and it appears that none of these requests have an HttpContext. I originally thought that every http request will get a context which obviously is not the case so I'm trying to understand how does an HttpContext gets constructed, is it part of the negotiation between client and server?

Read the article
Can I tell sitecrawlers to visit a certain page?

- by Ace

Hi there! I have this drupal website that revolves around a document database. By design you can only find these documents by searching the site. But I want all the results to be indexed by Googlebot and other crawlers, so I was thinking, what if I make a page that lists all the documents, and then tell the robots to visit the page to index all my documents..? Is this possible, or is there a better way to do it?

Read the article

< Previous Page | 1 2 3 4 5 6 7 | Next Page >