Search Results

Search found 4783 results on 192 pages for 'txt'.

Page 6/192 | < Previous Page | 2 3 4 5 6 7 8 9 10 11 12 13  | Next Page >

  • 600 visitors per day, 20 backlinks but still not referenced by Google

    - by Tristan
    Hello, i've launch a website on wednesday (9 August) There's already in 4 days 12,000 viewed pages 1,400 visits 20 to 25 backlincks a sitemap.xml (130 URL) english language / french language - url like that "/en/" "/fr/" BUT, i'm still not referrenced by google In the google webmaster i have 0 backlinks 130 in sitemap, 0 URL referrenced on google For a smaller website, i remember that i took me less to appear on google with less visits. My url is: http://www.seek-team.com/en/ for english and juste replace /en/ by /fr/ to access it in french. What's causing this ? Is there an explanation ? Thanks for your help (ps, i've already checked robots.txt)

    Read the article

  • Hiding php includes from search spiders?

    - by 21stcn
    Quick and simple question. I have 80+ html files which I want to be crawled. They are individual product pages. Each of these pages calls its content using php includes. These php include files are in a separate folder on the server and contain the core content for the individual product pages. I just wanted to ask, if I use robots.txt or .htaccess to prevent crawling of the directory that holds the php content files, will there be no issue crawling the html pages which include these files? What I want to achieve is have the html files indexed with the php content included in them, but I don't want visitors landing on the php content pages, nor have these php files indexed as duplicate content. Just clarification needed as to whether it is safe to block spiders from accessing the php folder, without this affecting the html files being indexed with the included content. Is this the best way to do things? Or should I just leave the content php files to be crawled?

    Read the article

  • Best way to prevent Google from indexing a directory [duplicate]

    - by Gkhan14
    This question already has an answer here: Stopping Google index some web pages I have 5 answers I've researched many methods on how to prevent Google/other search engines from crawling a specific directory. The two most popular ones I've seen are: Adding it into the robots.txt file: Disallow: /directory/ Adding a meta tag: <meta name="robots" content="noindex, nofollow"> Which method would work the best? I want this directory to remain "invisible" from search engines so it does not affect any of my site's ranking. In other words, I want this directory to be neutral/invisible and "just there." I don't want it to affect any ranking. Which method would be the best to achieve this?

    Read the article

  • Stop bots from crawling old links with extensions

    - by Jared
    I've recently switched to MVC3 which is extension-less for the URL's, but Google and Bing have a wealth of links that they are crawling which no longer exist. So I'm trying to find out if there is a way to format robots.txt (or by some other method) to tell google/bing that any link that ends in an extension isn't a valid link... Is this possible? On pages that I'm concerned about a User having saved as a fav I'm displaying a 404 page that lists the links to take once they are redirected to the new page (I decided to not just redirect them as I don't want to maintain these forever). For Google/Bing sake I do have the canonical tag in the header. User-agent: * Allow: / Disallow: /*.* EDIT: I just added the 3rd line (in text above) and it APPEARS to do what I'm wanting. Allow a path, but disallow a file. Can anyone confirm this?

    Read the article

  • How to block FeedReader from fetching my content to their site?

    - by Wei Kai
    As you all can see from the picture below, my site's content is duplicated by FeedReader and indexed at Google. When I clicked at the FeedReader link, it uses some sort of iFrame to draw content from my site live. This forms some sort of content duplication to me, and I believe it does harm to my site. (Stackoverflow doesn't allow me to post image due to new account, pls click at the link above to see the picture, million thanks to you.) What can I do to prevent Feedreader to fetch my content to their site? I know robots.txt can perform such function, but I don't know how to do it. Any help would be much appreciated. I have also highlighted this issue to FeedReader 2 days ago, but yet to get any reply from them.

    Read the article

  • Clicks counting and crawler bots

    - by Dennis
    I am currently running a small affiliate-program for Facebook users. We use an auto-poster to publish links to fan pages. Every hit is stored in our database and we have included a 24 hour reload block for the IP-addresses. My problem right now is that the PHP script also stores every hit from all the bots that crawls my website. Now I was thinking to block those bots with the robots.txt of my website but I am afraid that this will have a negative effect on my AdSense ads. Does anybody have an idea for me how to work this out?

    Read the article

  • Bad Bot blocking Revisited

    - by Tom
    I've read a lot about bad bot blocking, php scripts, .htaccess techniques, etc... Is this a valid method? Since .htacces can rewrite and send a bad bot a 403 deny or forward to something like spam poison, is it possible to Disallow a folder, then through .htaccess in that specific folder redirect to spampoison? Since Apache reads each .htaccess independently and follows specific instructions, then a bad bot not following robots.txt would just be redirected. Or anyone trying to access, /badbot/ or whatever I choose to call my trap folder. Thanks Tom

    Read the article

  • How do I deal with content scrapers? [closed]

    - by aem
    Possible Duplicate: How to protect SHTML pages from crawlers/spiders/scrapers? My Heroku (Bamboo) app has been getting a bunch of hits from a scraper identifying itself as GSLFBot. Googling for that name produces various results of people who've concluded that it doesn't respect robots.txt (eg, http://www.0sw.com/archives/96). I'm considering updating my app to have a list of banned user-agents, and serving all requests from those user-agents a 400 or similar and adding GSLFBot to that list. Is that an effective technique, and if not what should I do instead? (As a side note, it seems weird to have an abusive scraper with a distinctive user-agent.)

    Read the article

  • How to disallow indexing but allow crawling?

    - by John Doe
    In the front page of my website, I have some previews to articles (with a small introduction to them) that link to the full articles. I want to disallow the front page to prevent duplicate content. But if I do this (in robots.txt), would it still be crawled? I mean, the full articles would be still reached by the crawler even though I disallowed the only page that links to them? I don't want the webcrawler not to access the page and enter the links in them, but I just don't want it to save the information (that will be repeated in the full articles).

    Read the article

  • CDN virtual subdomain causes duplicated content

    - by user3474818
    I have created a subdomain and a CNAME record which points to the domain root. The subdomain www.static.example.com is actually a copy of the entire website www.example.com and it is supposed to act as an CDN and serve static content in order to improve speed. However, all of my content can be accessed via subdomain aswell, so Google has indexed it all and now I am dealing with duplicated content. How could I deny access to crawlers for the subdomain baring in mind that I do not have different subfolder for subdomain, so I can't create a separate robots.txt file?

    Read the article

  • SEO: disallowing Google from indexing forms in iframes or not?

    - by Marco Demaio
    I usually place forms in iframes (i.e. order form, request assistance form, contact forms, ect.). Just the forms, I never place other contents or pages in iframes. From a SEO point of view, would you exclude forms from being indexed/crawled by Google or not? I mean my forms hardly ever contains keyword/keyphrases, moreover I obviously place empty title/meta description tags in pages shown in iframe to display forms, cause those titles are never displaied in browser title bar. So I'm wondering what's the point of letting Google index them? Moreover I think these form pages might suck out PR from all other pages that are more valuable for SEO. If your answer is "yes I would exclude them form indexing" would you simply use robots.txt to exclude them? Thanks!

    Read the article

  • Can I include a robots meta tag outside of the head in HTML snippets indeded to be SSIed?

    - by Dan
    I have a number of files in my site which are not intended for independent viewing, but rather to be AJAXed into content within the site. They obviously don't meet HTML standards (no body, head, etc.) as independent entities. I would like to prevent search engines from indexing these pages, but do not have access to /robots.txt (which would be much more ideal). My question is, could I include the following at the top of these partial HTML files and get the desired results? <meta name="robots" content="noindex, noarchive"> I guess there are two parts to this question. Will this cause any rendering issues in any browsers? Will search engines (at least Google & Bing) interpret this as intended?

    Read the article

  • Googlebot can't access my site when crawling from rootdomain

    - by PéCé
    I can't explain why I get this message for my rootdomain result in Google : trocmalin.com/ A description for this result is not available because of this site's robots.txt – learn more. Here is my site specifics : vide-greniers.trocmalin.com is the site address www.trocmalin.com redirects (301) to vide-greniers.trocmalin.com trocmalin.com redirects (301) to vide-greniers.trocmalin.com too... User-agent: * Disallow: /orga/ User-agent: * Disallow: /sitemap-update Google results for vide-greniers.trocmalin.com are well rendered, as well as sub pages allowed for bots. But the result for my rootdomain (trocmalin.com) gives this message... Can you help me ?

    Read the article

  • Will duplicate international (i18n) content hinder SEO rankings?

    - by Rhys
    Google clearly states that duplicate content within a single, or multiple, domains is not advised. This is understood, but I am not sure of any exceptions for sites with region-specific content that is often replicated across locales. For example, a site's /en-us/about page could be identical to /en-uk/about, whereas most likely /en-ja/about is unique. Are GYM smart enough to understand that the initial URL depth is a locale specifier? Is there any robots.txt or header, etc, trickery that I should include to outline the site's international structure?

    Read the article

  • Htaccess/robots.txt to allow search bots to explore main domain but not directory on other domain

    - by gX
    Ok, I understand the Title didn't make any sense so here I've tried to explain it in detail. I'm using a hosting that gives me space for my domain and lets me "add on" other domains on it. So lets say I have a domain A, and I add on a domain B. Basically my hosting gives me a public_html where I can put stuff that shows when someone visits website A. But, when I add the domain B, it lets me put the content of B, INSIDE of that public_html so that website B.com can also be visited by going to A.com/siteB... Thats all good, except that Google has started indexing B.com as well as A.com/siteB, I'm ok with it indexing B.com, but I somehow want to prevent it from indexing A.com/siteB so that when people search for B, it doesn't end up showing A.com/siteB. Any ideas? Let me know if the question is still unclear.

    Read the article

  • Need to sanity-check my .htaccess, especially Limit GET POST line for Google repellent

    - by jose
    I need a sanity check on this .htaccess (from a WordPress site) I inherited from a 5 month+ old site. What's the symptom? Google + Bing crawl, but don't index any of the pages. Let me be clear: I'm not mad about "not ranking high." I think something is (accidentally) rejecting search engine indexing. I am not an expert on .htaccess, but one part especially looked funny, the Limit GET POST line. Is it not weird to have both Allow and Deny all, with no parameters? Also, I've ruled out robots.txt, but if I were you I'd want to see it, so here it is: User-agent: * Crawl-delay: 30 And here's the more suspect .htaccess: # temp redirect wordpress content feeds to feedburner <IfModule mod_rewrite.c> RewriteEngine on RewriteCond %{HTTP_USER_AGENT} !FeedBurner [NC] RewriteCond %{HTTP_USER_AGENT} !FeedValidator [NC] RewriteRule ^feed/?([_0-9a-z-]+)?/?$ http://feeds.feedburner.com/anonymousblog [R=302,NC,L] </IfModule> # temp redirect wordpress comment feeds to feedburner <IfModule mod_rewrite.c> RewriteEngine on RewriteCond %{HTTP_USER_AGENT} !FeedBurner [NC] RewriteCond %{HTTP_USER_AGENT} !FeedValidator [NC] RewriteRule ^comments/feed/?([_0-9a-z-]+)?/?$ http://feeds.feedburner.com/anonymous_comments [R=302,NC,L] </IfModule> <IfModule mod_rewrite.c> RewriteEngine On RewriteBase / RewriteCond %{REQUEST_FILENAME} !-f RewriteCond %{REQUEST_FILENAME} !-d RewriteRule . /index.php [L] </IfModule> IndexIgnore .htaccess */.??* *~ *# */HEADER* */README* */_vti* <Limit GET POST> order deny,allow deny from all allow from all </Limit> <Limit PUT DELETE> order deny,allow deny from all </Limit> php_value memory_limit 32M Adding header by request: <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" /> <meta name="robots" content="noindex,nofollow" /> <meta name="description" content="buncha junk i've deleted." /> <meta name="keywords" content="keywords i've deleted" /> <meta name="viewport" content="width=device-width" />

    Read the article

  • Handling SEO for Infinite pages that cause external slow API calls

    - by Noam
    I have an 'infinite' amount of pages in my site which rely on an external API. Generating each page takes time (1 minute). Links in the site point to such pages, and when a users clicks them they are generated and he waits. Considering I cannot pre-create them all, I am trying to figure out the best SEO approach to handle these pages. Options: Create really simple pages for the web spiders and only real users will fetch the data and generate the page. A little bit 'afraid' google will see this as low quality content, which might also feel duplicated. Put them under a directory in my site (e.g. /non-generated/) and put a disallow in robots.txt. Problem here is I don't want users to have to deal with a different URL when wanting to share this page or make sense of it. Thought about maybe redirecting real users from this URL back to the regular hierarchy and that way 'fooling' google not to get to them. Again not sure he will like me for that. Letting him crawl these pages. Main problem is I can't control to rate of the API calls and also my site seems slower than it should from a spider's perspective (if he only crawled the generated pages, he'd think it's much faster). Which approach would you suggest?

    Read the article

  • What kind of redirect (301 or 302) for an email links tracker?

    - by MaxiWheat
    We are developing an email sending application ("à la" Mailchimp). Hyperlinks inserted by our users, in the emails they want to send, are replaced by a tracking URL on our application (https://ourdomain.com/trackingurl?blablabla) which then redirects the email reader to the original URL our users included in their emails. This allows us to record statistics about link clicks. Until now, we used 301 for those redirections, but we noticed that Google began indexing pages on our application which are in fact redirects to other domains. (The title and snippet in Google results are from the other domain, but the link in green is from our application). We took action by adding those urls to our robots.txt, but Google seems to take forever (months!) before removing them for its index and removing them by hand in Webmaster Tools would take a lot of time since there are lot. I would like to know which kind of HTTP redirect (301 or 302) is best suited for this kind of opreation ? Do you think switching to 302 redirects could improve this situation since we don't really want Google to index redirected links from our clients emails ?

    Read the article

  • Where to put the SPF TXT record?

    - by YellowSquirrel
    I've set up Google apps for my domain: I've registered the domain with Google by adding the CNAME Google asked and I've apparently succesfully setup the MX Google mail servers. So far I haven't yet a dedicated server: I'm just having a domain at a registrar. Now I want to activate SPF and I'm confused. In the following short webpage: http://www.google.com/support/a/bin/answer.py?answer=178723 it is written that I must add a TXT record containing: v=spf1 include:_spf.google.com ~all Where should I enter this? Should this go in the zone (?) file, like I did for the CNAME and the MX records? So far I have something like this: @ 10800 IN A 217.42.42.42 @ 10800 IN MX 5 ASPMX3.GOOGLEMAIL.COM. @ 10800 IN MX 5 ASPMX2.GOOGLEMAIL.COM. @ 10800 IN MX 3 ALT2.ASPMX.L.GOOGLE.COM. @ 10800 IN MX 3 ALT1.ASPMX.L.GOOGLE.COM. @ 10800 IN MX 1 ASPMX.L.GOOGLE.COM. google8a70835987f31e34 10800 IN CNAME google.com. Does adding the SPF TXT record mean I should literally have something like that: @ 10800 IN A 217.42.42.42 @ 10800 IN MX 5 ASPMX3.GOOGLEMAIL.COM. @ 10800 IN MX 5 ASPMX2.GOOGLEMAIL.COM. @ 3600 IN TXT "v=spf1 include:_spf.google.com ~all" @ 10800 IN MX 3 ALT2.ASPMX.L.GOOGLE.COM. @ 10800 IN MX 3 ALT1.ASPMX.L.GOOGLE.COM. @ 10800 IN MX 1 ASPMX.L.GOOGLE.COM. google8a70835987f31e34 10800 IN CNAME google.com. I made that one up and included right in the middle to show how confused I am. What I'd like to know is the exact syntax and where/how I should put this TXT record.

    Read the article

  • Loading levels from .txt or .XML for XNA

    - by Dave Voyles
    I'm attemptin to add multiple levels to my pong game. I'd like to simply exchange a few elements with each level, nothing crazy. Just the background texture, the color of the AI paddle (the one on the right side), and the music. It seems that the best way to go about this is by utilizing the StreamReader to read and write the files from XML. If there is a better, or more efficient alternative way then I'm all for it. In looking over the XNA Starter Platformer Kit provided by MS it seems that they've done it in this manner as well. I'm perplexed by a few things, however, namely parts within the Level class which aren't commented. /// <summary> /// Iterates over every tile in the structure file and loads its /// appearance and behavior. This method also validates that the /// file is well-formed with a player start point, exit, etc. /// </summary> /// <param name="fileStream"> /// A stream containing the tile data. /// </param> private void LoadTiles(Stream fileStream) { // Load the level and ensure all of the lines are the same length. int width; List<string> lines = new List<string>(); using (StreamReader reader = new StreamReader(fileStream)) { string line = reader.ReadLine(); width = line.Length; while (line != null) { lines.Add(line); if (line.Length != width) throw new Exception(String.Format("The length of line {0} is different from all preceeding lines.", lines.Count)); line = reader.ReadLine(); } } What does width = line.Length mean exactly? I mean I know how it reads the line, but what difference does it make if one line is longer than any of the others? Finally, their levels are simply text files that look like this: .................... .................... .................... .................... .................... .................... .................... .........GGG........ .........###........ .................... ....GGG.......GGG... ....###.......###... .................... .1................X. #################### It can't be that easy..... Can it?

    Read the article

  • mod evasive not working properly on ubuntu 10.04

    - by Joe Hopfgartner
    I have an ubuntu 10.04 server where I installed mod_evasive using apt-get install libapache2-mod-evasive I already tried several configurations, the result stays the same. The blocking does work, but randomly. I tried with low limis and long blocking periods as well as short limits. The behaviour I expect is that I can request websites until either page or site limit is reached per given interval. After that I expect to be blocked until I did not make another request for as long as the block period. However the behaviour is that I can request sites and after a while I get random 403 blocks, which increase and decrase in percentage, however they are very scattered. This is an output of siege, so you get an idea: HTTP/1.1 200 0.09 secs: 75 bytes ==> /robots.txt HTTP/1.1 403 0.08 secs: 242 bytes ==> /robots.txt HTTP/1.1 200 0.08 secs: 75 bytes ==> /robots.txt HTTP/1.1 403 0.08 secs: 242 bytes ==> /robots.txt HTTP/1.1 200 0.11 secs: 75 bytes ==> /robots.txt HTTP/1.1 403 0.08 secs: 242 bytes ==> /robots.txt HTTP/1.1 200 0.08 secs: 75 bytes ==> /robots.txt HTTP/1.1 403 0.09 secs: 242 bytes ==> /robots.txt HTTP/1.1 200 0.08 secs: 75 bytes ==> /robots.txt HTTP/1.1 200 0.09 secs: 75 bytes ==> /robots.txt HTTP/1.1 200 0.08 secs: 75 bytes ==> /robots.txt HTTP/1.1 200 0.09 secs: 75 bytes ==> /robots.txt HTTP/1.1 403 0.08 secs: 242 bytes ==> /robots.txt HTTP/1.1 200 0.08 secs: 75 bytes ==> /robots.txt HTTP/1.1 403 0.08 secs: 242 bytes ==> /robots.txt HTTP/1.1 200 0.10 secs: 75 bytes ==> /robots.txt HTTP/1.1 403 0.08 secs: 242 bytes ==> /robots.txt HTTP/1.1 200 0.08 secs: 75 bytes ==> /robots.txt HTTP/1.1 403 0.09 secs: 242 bytes ==> /robots.txt HTTP/1.1 200 0.10 secs: 75 bytes ==> /robots.txt HTTP/1.1 403 0.09 secs: 242 bytes ==> /robots.txt HTTP/1.1 200 0.09 secs: 75 bytes ==> /robots.txt HTTP/1.1 200 0.08 secs: 75 bytes ==> /robots.txt HTTP/1.1 200 0.09 secs: 75 bytes ==> /robots.txt HTTP/1.1 200 0.08 secs: 75 bytes ==> /robots.txt HTTP/1.1 200 0.10 secs: 75 bytes ==> /robots.txt HTTP/1.1 200 0.08 secs: 75 bytes ==> /robots.txt The exac limits in place during this test run were: DOSHashTableSize 3097 DOSPageCount 10 DOSSiteCount 100 DOSPageInterval 10 DOSSiteInterval 10 DOSBlockingPeriod 120 DOSLogDir /var/log/mod_evasive DOSEmailNotify ***@gmail.com DOSWhitelist 127.0.0.1 So I would expect to be blocked at least 120 seconds after being blocked once. Any ideas aobut this? I also tried adding my configuration at different places (vhost, server config, directory context) and with of without ifmodule directive... This doesnt change anything.

    Read the article

  • Multiple robots.txt for subdomains in rails

    - by Christopher
    I have a site with multiple subdomains and I want the named subdomains robots.txt to be different from the www one. I tried to use .htaccess, but the FastCGI doesn't look at it. So, I was trying to set up routes, but it doesn't seem that you can't do a direct rewrite since every routes needs a controller: map.connect '/robots.txt', :controller => ?, :path => '/robots.www.txt', :conditions => { :subdomain => 'www' } map.connect '/robots.txt', :controller => ?, :path => '/robots.club.txt' What would be the best way to approach this problem? (I am using the request_routing plugin for subdomains)

    Read the article

  • Write data into .txt file created by CFileDialog, in C++

    - by younevertell
    I wanna Write data into .txt file created by CFileDialog, in C++. The problem I am facing is that below codes doesn't work, although there is no build error. The .txt file created by CFileDialog can not be found for some reason. What's wrong the code? what's the efficient way to Write data into .txt file created by CFileDialog, in C++? Thanks CFileDialog dlg(FALSE, NULL, NULL, OFN_OVERWRITEPROMPT, _T("My Data File (*.txt)|*.txt||")); if(dlg.DoModal() != IDOK) return; CString filename = dlg.GetPathName(); ofstream outfile (filename); int mydata = 10; outfile << "my data:" << mydata << endl; outfile.close();

    Read the article

  • Convert file for XLS to TXT format

    - by meenakshi
    I renamed a .xls file to .txt file but the .txt file shows like this: öRn÷åËþEõZ;zûÕ÷IÉD;Üÿêá¾ú‡×_*4u\çBq&?!€¨(ˆô‰/t/ª§‹'Oý?WïçþJ®½ãÁÁ|DiæEÇ’IŠ›ðä/\$r'Ù¡îê?|ÕÝꮲ¸ó—QG¦çåΊ/–×ÒÏpQP~Q|å?‘ÈpÖ‘»ëËß62 Š/zaqçÎÖ©ú•ênÆ›WÚ·«àÙ}S«ˆº/~Ø×:U”¥‡7îTísvçùòl!ý0ýá»êéqx[T¯ß¿ø1 ¼ð…PïXáÃñ}ý«Å'gK)á*ÐÒm¦ŽØ™¯Püä©ïªÇË‹—‹U¡Jq5ÿ‚Šÿò6qúߟX½ûuqi&×6êÞç~°ðÄÜ‹Òý¥Ÿø…cŸEão(ýNÖáÛ«îâW·‰ôîÏn¥zÚ¨—ÞŸqwø6XuÁËóñ¤;¢òhz¾&®û\â:çýÎxÒL\ïùJn2ì4ÓÖ¾‚{9˜Œ›‰<Ÿ¸qgØ0ä†Ï'®Ûë5¹Ñó‰ë›ò0~FšÜäù How can I get the same data as in an .xls file?

    Read the article

< Previous Page | 2 3 4 5 6 7 8 9 10 11 12 13  | Next Page >