Search Results

Search found 4783 results on 192 pages for 'txt'.

Page 3/192 | < Previous Page | 1 2 3 4 5 6 7 8 9 10 11 12  | Next Page >

  • robots.txt file with more restrictive rules for certain user agents

    - by Carson63000
    Hi, I'm a bit vague on the precise syntax of robots.txt, but what I'm trying to achieve is: Tell all user agents not to crawl certain pages Tell certain user agents not to crawl anything (basically, some pages with enormous amounts of data should never be crawled; and some voracious but useless search engines, e.g. Cuil, should never crawl anything) If I do something like this: User-agent: * Disallow: /path/page1.aspx Disallow: /path/page2.aspx Disallow: /path/page3.aspx User-agent: twiceler Disallow: / ..will it flow through as expected, with all user agents matching the first rule and skipping page1, page2 and page3; and twiceler matching the second rule and skipping everything?

    Read the article

  • Is it safe to block redirected (but still linked) URLs with robots.txt?

    - by Edgar Quintero
    I have a website that has all URLs optimized and 301 redirected from nasty URLs to clean ones. However, everywhere throughout the site the unclean URLs are linked in menus, content, products, etc. Google currently has all clean URLs indexed, along with a few unclean URLs too. So the site still has linked everywhere the old URLs (ideally this wouldn't be the case but this is how it is ATM). I would like to block the unclean URLs with robots.txt. The question: if I block these unclean URLs with the robots.txt, when the entire website is linked with them (but they all redirect to the clean version), will this affect the indexing status at all?

    Read the article

  • Which token from a long User-Agent should I use in robots.txt?

    - by Gaia
    The definition of User-Agent states that several tokens can be included, as deemed necessary by the client. I want to block certain bots via robots.txt and I am confused as to which part of the User-Agent string to use, especially for more obscure bots. For example: Mozilla/5.0 (compatible; uMBot-LN/1.0; mailto: [email protected])" JS-Kit URL Resolver, http://js-kit.com/ Mozilla/5.0 (compatible; SEOkicks-Robot +http://www.seokicks.de/robot.html Do I use the second token? Can tokens contain spaces, or did the SEOkicks folks forget a semicolon after SEOkicks-Robot? I don't actually intend on making my question specific to a couple bots - I want to know the guideline: which part of UA do I place in robots.txt for these exotic bots with UA as long as a haiku? User-agent: uMBot-LN/1.0 Disallow: / PS: Thank you but I do not need to hear that undesirable bots are better blocked with mod_security. I already have commercial mod_sec rules in place.

    Read the article

  • Is it safe to Block These URLs with Robots.txt?

    - by Edgar Quintero
    I have a website that has all URLs optimized and 301 redirected from nasty URLs to clean ones. However, everywhere throughout the site the unclean URLs are linked in menus, content, products, etc. Google currently has all clean URLs indexed, along with a few unclean URLs too. So the site still has linked everywhere the old URLs (ideally this wouldn't be the case but this is how it is ATM). I would like to block the unclean URLs with robots.txt. The question: If I block these unclean URLs with the robots.txt, when the entire website is linked with them (but they all redirect to the clean version), will this affect the indexing status at all?

    Read the article

  • Google Sitemap and Robots.txt Issue

    - by Sarfaraz Soomro
    Hi, We have a sitemap at our site, http://www.gamezebo.com/sitemap.xml Some of the urls in the sitemap, are being reported in the webmaster central as being blocked by our robots.txt, see, gamezebo.com/robots.txt ! Although these urls are not Disallowed in Robots.txt. There are other such urls aswell, for example, gamezebo.com/gamelinks is present in our sitemap, but it's being reported as "URL restricted by robots.txt". Also I have this parse result in the Webmaster Central that says, "Line 21: Crawl-delay: 10 Rule ignored by Googlebot". What does it mean? I appreciate your help, Thanks.

    Read the article

  • read out a txt and send the last line to email adress - vbscript

    - by matthias
    Hallo, I'm back haha :-) so i have the next question and i hope someone can help me... I know i have a lot of questions but i will try to learn vbscript :-) Situation: I try to make a program that check all 5 min a txt and if there a new line in the txt, i'll try to send it to my eMail Address Option Explicit Dim fso, WshShell, Text, Last, objEmail Const folder = "C:\test.txt" Set fso=CreateObject("Scripting.FileSystemObject") Set WshShell = WScript.CreateObject("WScript.Shell") Do Text = Split(fso.OpenTextFile(Datei, 1).ReadAll, vbCrLF) Letzte = Text(UBound(Text)) Set objEmail = CreateObject("CDO.Message") objEmail.From = "[email protected]" objEmail.To = "[email protected]" objEmail.Subject = "Control" objEmail.Textbody = Last objEmail.Configuration.Fields.Item _ ("http://schemas.microsoft.com/cdo/configuration/sendusing") = 2 objEmail.Configuration.Fields.Item _ ("http://schemas.microsoft.com/cdo/configuration/smtpserver") = _ "smtpip" objEmail.Configuration.Fields.Item _ ("http://schemas.microsoft.com/cdo/configuration/smtpserverport") = 25 objEmail.Configuration.Fields.Update objEmail.Send WScript.Sleep 300000 Loop This program works, but this program send me all 5 mins a mail...but i will only have a new mail when there is a new line in the txt. Can someone help me?

    Read the article

  • Should I add a "nofollow" attribute to download links, or disallow the URLs in robots.txt?

    - by Laurent
    I have a download link very similar to Opera's one - it's just a script that sends the file. It doesn't have an extension and there's no obvious way to tell that it's actually a download link. So since I don't want robots to crawl this link, do I need to add it to robots.txt or maybe add a "nofollow" attribute to it? I see that on Opera's website they didn't do either of this, so perhaps it's not necessary?

    Read the article

  • Should I disallow(robots.txt) archive/author pages with links already available on the front page? [on hold]

    - by WPRookie82
    I am working on a simple Wordpress blog where when an article is published, it appears on ALL these pages: Homepage - Headline(clickable) + 3-line summary Parent category page - Headline(clickable) + 3-line summary Child category page - Headline(clickable) + 3-line summary Author page - Headline(clickable) sitemap.xml I've been told that I should add all author pages to my robots.txt, under disallow, so as search engine bots do not spider /author/* since all links on these pages are available elsewhere. Is this a good approach or maybe rel=nofollow is better, or maybe I shouldn't worry about this at all?

    Read the article

  • wget not respecting my robots.txt. Is there an interceptor?

    - by Jane Wilkie
    I have a website where I post csv files as a free service. Recently I have noticed that wget and libwww have been scraping pretty hard and I was wondering how to circumvent that even if only a little. I have implemented a robots.txt policy. I posted it below.. User-agent: wget Disallow: / User-agent: libwww Disallow: / User-agent: * Disallow: / Issuing a wget from my totally independent ubuntu box shows that wget against my server just doesn't seem to work like so.... http://myserver.com/file.csv Anyway I don't mind people just grabbing the info, I just want to implement some sort of flood control, like a wrapper or an interceptor. Does anyone have a thought about this or could point me in the direction of a resource. I realize that it might not even be possible. Just after some ideas. Janie

    Read the article

  • robots.txt, how effective is it and how long does it take?

    - by Stefan
    We recently updated the site to a single page site using jQuery to slide between "pages". So we now have only index.php. When you search the company on engines such as Google, you get the site and a listing of its sub pages which now lead to outdated pages. Our plan doesn't allow us to edit the .htaccess and the old pages are .html docs so I cannot use PHP redirects either. So if I put in place a robots.txt telling the engines to not crawl beyond index.php, how effective will this be in preventing/removing crawled sub pages. And rough guess, how long before the search engines would update?

    Read the article

  • Is there any advantage/disadvantage to using robots.txt to disallow access to legal pages such as terms, privacy policy, etc.?

    - by CaptainCodeman
    As I understand, having repetitive content is a detriment to search engine placement. Given that many websites that use similar or even identical "Terms and Conditions" and "Privacy Policy" pages due to similar legal wording or due to copy & pasting from the same source, would it be a good idea to disallow access to these pages via robots.txt, in order to avoid being penalized for "non-original content"? Or, on the contrary, could the search engines identify this as circumvention and penalize the site for trying to hide content? Or does it not matter?

    Read the article

  • TXT File or Database?

    - by Ruth Rettigo
    Hey folks! What should I use in this case (Apache + PHP)? Database or just a TXT file? My priority #1 is speed. Operations Adding new items Reading items Max. 1 000 records Thank you. Database (MySQL) +----------+-----+ | Name | Age | +----------+-----+ | Joshua | 32 | | Thomas | 21 | | James | 34 | | Daniel | 12 | +----------+-----+ TXT file Joshua 32 Thomas 21 James 34 Daniel 12

    Read the article

  • Robots.txt syntax

    - by Sinan
    I not expert on robots.txt and i have the following in one of my clients robots.txt User-agent: * Disallow: Disallow: /backup/ Disallow: /stylesheets/ Disallow: /admin/ I am not sure about the second line. Is this line disallows all spiders?

    Read the article

  • Multiple SiteMap: entries in robots.txt?

    - by user306942
    I have been searching around using Google but I can't find an answer to this question. A robots.txt file can contain the following line: Sitemap: http://www.mysite.com/sitemapindex.xml but is it possible to specify MULTIPLE sitemap index files in the robots.txt and have the search engines recognize that and crawl ALL of the sitemaps referenced in each sitemap index file? For example, will this work: Sitemap: http://www.mysite.com/sitemapindex1.xml Sitemap: http://www.mysite.com/sitemapindex2.xml Sitemap: http://www.mysite.com/sitemapindex3.xml

    Read the article

  • mean of robots.txt at yahoo.com

    - by hussain
    i want to know the mean of yahoo robots.txt that website( http://www.yahoo.com/robots.txt ) have the following lines User-agent: * Disallow: /p/ Disallow: /r/ Disallow: /*? i dont know the mean of last line(Disallow: /*?) please let me know... thanks and advance

    Read the article

  • my robots.txt file in web application

    - by Zerotoinfinite
    Hi, I am using asp.net with C#. To increase the searchibility of my site in google, I have searched & found out that I can do it by using my robots.txt , but I really don't have any idea how to create it and where can I place my tag like 'asp.net, C#' in my txt file. Also, the necessary steps to to include it in my application. Please help. Thanks in advance

    Read the article

  • Search for duplicate prefix of file names

    - by Mia
    I have a folder contains files name as: xxx.get.txt and xxx.resp.txt xxx.get.txt, xxx.resp.txt yyy.get.txt, yyy.resp.txt zzz.get.txt, zzz.resp.txt, etc each prefix xxx should have two corresponding files, .get.txt and .resp.txt However, now I calculate the number of .get.txt and the number of .resp.txt files, they are not equal, there're ten more .get.txt. I want to find out, which .get.txt files do not have ´.resp.txt´ file Is it possible? Many thanks!

    Read the article

  • Cross-domain jQuery using YQL gives robots.txt error

    - by Jens Roland
    On the page http://qxlapps.dk/test.htm I am trying to perform an Ajax load from another domain, qxlapp.dk. I am using James Padolsey's xdomainajax.js plugin from: http://james.padolsey.com/javascript/cross-domain-requests-with-jquery/ When I open my test page, I get no output, but FireBug shows the JSON result, including the error message: "forbidden":"robots.txt for the domain disallows crawling for url: http://qxlapp.dk/projects/dagens_kup/show.php". The robots.txt on the qxlapp.dk domain contains the following: User-agent: Yahoo Pipes 2.0 Allow: / User-agent: * Allow: / So I don't see what the problem is? Shouldn't it pull the page just fine with those settings?

    Read the article

  • SEO chaos from changing robots.txt file in Wordpress site

    - by Seedorf
    Hi there, I recently edited the robots.txt file in my site using a wordpress plugin. However, since i did this, google seems to have removed my site from their search page. I'd appreciate if I could get an expert opinion on why this is so, and a possible solution. I'd initially done it to increase my search ranking by limiting the pages being accessed by google. This is my robots.txt file in wordpress: User-agent: * Disallow: /cgi-bin Disallow: /wp-admin Disallow: /wp-includes Disallow: /wp-content/plugins Disallow: /wp-content/cache Disallow: /trackback Disallow: /feed Disallow: /comments Disallow: /category/*/* Disallow: */trackback Disallow: */feed Disallow: */comments Disallow: /*?* Disallow: /*? Allow: /wp-content/uploads Sitemap: http://www.instant-wine-cellar.co.uk/wp-content/themes/Wineconcepts/Sitemap.xml

    Read the article

  • we are getting .txt file but not getting proper alignment

    - by pmms
    we are getting the following texfile_screenshot1.JPG when we are exporting data to .txt file we need output which is shown in texfile_screenshot2.JPG following is the code $myFile = "user_password.txt"; $fh = fopen($myFile, 'a') or die("can't open file"); $newline ="\r\n"; fwrite ($fh,$newline); $stringData1 = $_POST['uname1']." "." "." " ; fwrite($fh, $stringData1); $stringData1 =$_POST['password1']." "." "." "; fwrite($fh,$stringData1); $stringData1 = $_POST['email1']." "." "." "; fwrite($fh, $stringData1); fclose($fh);

    Read the article

  • disallow certain url in robots.txt

    - by chrism
    We implemented a rating system on a site a while back that involves a link to a script. However, with the vast majority of ratings on the site at 3/5 and the ratings very even across 1-5 we're beginning to suspect that search engine crawlers etc. are getting through. The urls used look like this: http://www.thesite.com/path/to/the/page/rate?uid=abcdefghijk&value=3 When we started we add the following to our robots.txt: User-agent: * Disallow: /rate Is this incorrect or are googlebot and others simply ignoring our robots.txt?

    Read the article

  • Conecting bash script with a txt file

    - by cathy
    I have a txt file with 100+ lines called page1.txt; odd lines are urls and even lines are url names. I have a bash file already created that checks urls for completion. Except right now, the process is really manual because I have to modify the bash every time I need to check a url. So I need to connect the bash to the txt using the variable url. $url should get all the odd lines from page1.txt and check if the link is complete or not. Also, how would I write a variable called name that derives from the url the 7 digits? bash file manually: url=http://www.-------------/-/8200233/1/ name=8200233 lynx -dump $url > $name.txt I would prefer if the bash file could add "Complete/In-Complete " at the beginning of every even line in the page1.txt file but a new text file could be also created to keep track of the Completes/In-completes.

    Read the article

< Previous Page | 1 2 3 4 5 6 7 8 9 10 11 12  | Next Page >