Search Results

Search found 4783 results on 192 pages for 'txt'.


  • how to get this working if exist tasklist=="notepad %DATE%.txt" [closed]

    - by blade19899
      @echo off
      title Log file creator
      if exist "%CD%\%DATE%.txt" (msg * "the log of today(%DATE%.txt) Has already been made" goto :notepad) else (goto :start)
      goto :eof
      :start
      echo %date%>>"%date%".txt
      echo %username% >>"%date%".txt
      echo.>>"%date%".txt
      echo.>>"%date%".txt
      echo ----------------------------------------Reports---------------------------------------- >>"%date%".txt
      echo.>>"%date%".txt
      echo.>>"%date%".txt
      echo.>>"%date%".txt
      echo.>>"%date%".txt
      echo.>>"%date%".txt
      echo.>>"%date%".txt
      echo.>>"%date%".txt
      echo.>>"%date%".txt
      echo ----------------------------------------TO-DO---------------------- >>"%date%".txt
      echo.>>"%date%".txt
      echo.>>"%date%".txt
      :notepad
      if exist tasklist=="notepad %DATE%.txt" (msg * "the log of today(%DATE%.txt) Has already been made, and is opened") else (goto :start)
      goto :eof
      :start
      start notepad %DATE%.txt

    Right now this code pops up "the log of today(%DATE%.txt) Has already been made", and after clicking OK it doesn't do anything. It should show the message "the log of today(%DATE%.txt) Has already been made, and is opened", because I do have Notepad open: Process Explorer shows notepad with the date.txt file. My question is how to make it show the "the log of today(%DATE%.txt) Has already been made, and is opened" box... and perhaps bring Notepad to the foreground? PS: not sure if this question belongs here. Apologies if it doesn't!
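    For reference, one way to test whether Notepad already has the day's log open is to filter tasklist output instead of using if exist; this is only a sketch, and it assumes Notepad's window title contains the file name (which can vary between Windows versions), as well as keeping the original %DATE%-based file name.

      @echo off
      rem List verbose task info for notepad.exe and look for the log's name in the window title.
      rem Note: %DATE% is kept only because the original script uses it; locale formats
      rem containing slashes will not produce valid file names.
      tasklist /v /fi "imagename eq notepad.exe" | find /i "%DATE%.txt" >nul
      if not errorlevel 1 (
          msg * "the log of today (%DATE%.txt) is already open"
      ) else (
          start "" notepad "%DATE%.txt"
      )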

    Read the article

  • Robots.txt practices with .htaccess redirections (inherits)

    - by Jayhal
    I have a question regarding how to write robots.txt files for many domains and subdomains with redirects in place. We have a hosting account with primary and add-on domains. All of our domains and subdomains, including the primary domain, are redirected via .htaccess 301s to their own subdirectories in the primary domain's root directory. I'm confused about how I would write the robots.txt for certain directories. First, I wanted to confirm that I am right in understanding that for domains and subdomains, crawlers will look to the directory that acts as that URL's root directory for the crawling rules (robots.txt). Also, that a directory will not be affected by a robots.txt present in its parent directory if the directory has its own domain/subdomain and that URL is the one being accessed by crawlers. (I'm pretty sure, but I wanted to confirm I didn't have a fundamentally flawed understanding of robots.txt.) In the original root directory on the account (where the primary domain was directed before the .htaccess was put in place), what should the robots.txt contain? When crawlers look to crawl our primary domain, will they look to the original root directory for the robots.txt, or will they reference the file contained in the new subdirectory where all the primary domain's site files are located? If so, what should the root's robots.txt include, if anything at all? Would I be right to include a simple 'Disallow: /' for all agents, and then include more specific robots.txt files in each subdirectory with more specific instructions? Would that affect the crawling of the directory where the primary domain is now redirected? Any help is greatly appreciated, thanks!

    Read the article

  • Redirect Google crawler to different robots.txt via .htaccess

    - by user3474818
    I have googled for the answer all day and still couldn't find one. I have a virtual subdomain www.static.example.com which is a mirror site of www.example.com. That means I have just one root folder for the subdomain and the domain as well. I want to redirect crawlers to a different robots.txt file - robots_static.txt - when they see .static in the URL, in which I will forbid indexing via Disallow rules. I want to do this because I have duplicated content in Google search results: the subdomain is showing the exact same content as the main domain. Does anyone know how I could make crawlers see robots_static.txt instead of robots.txt? What I have managed to find so far is this:

      RewriteCond %{HTTP_HOST} ^www.static.*$ [NC]
      RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /.*robots\.txt.*\ HTTP/ [NC]
      RewriteRule ^robots\.txt /robots_static.txt [NC,L]

    but when I check in Webmaster Tools, it still sees robots.txt as my robots file instead of robots_static.txt, so it crawls and indexes everything twice. What did I do wrong? Thanks.

    EDIT: This is my .htaccess file:

      ##
      # @package Joomla
      # @copyright Copyright (C) 2005 - 2013 Open Source Matters. All rights reserved.
      # @license GNU General Public License version 2 or later; see LICENSE.txt
      ##

      ##
      # READ THIS COMPLETELY IF YOU CHOOSE TO USE THIS FILE!
      #
      # The line just below this section: 'Options +FollowSymLinks' may cause problems
      # with some server configurations. It is required for use of mod_rewrite, but may already
      # be set by your server administrator in a way that dissallows changing it in
      # your .htaccess file. If using it causes your server to error out, comment it out (add # to
      # beginning of line), reload your site in your browser and test your sef url's. If they work,
      # it has been set by your server administrator and you do not need it set here.
      ##

      ## Can be commented out if causes errors, see notes above.
      Options +FollowSymLinks

      ## Mod_rewrite in use.
      RewriteEngine On
      RewriteEngine On

      RewriteCond %{HTTP_HOST} !^www\.
      RewriteRule ^(.*)$ http://www.%{HTTP_HOST}/$1 [R=301,L]

      RewriteCond %{HTTP_HOST} ^www.static.*$ [NC]
      RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /.*robots\.txt.*\ HTTP/ [NC]
      RewriteRule ^robots\.txt /robots_static.txt [NC,L]

      ## Begin - Rewrite rules to block out some common exploits.
      # If you experience problems on your site block out the operations listed below
      # This attempts to block the most common type of exploit `attempts` to Joomla!
      #
      # Block out any script trying to base64_encode data within the URL.
      RewriteCond %{QUERY_STRING} base64_encode[^(]*\([^)]*\) [OR]
      # Block out any script that includes a <script> tag in URL.
      RewriteCond %{QUERY_STRING} (<|%3C)([^s]*s)+cript.*(>|%3E) [NC,OR]
      # Block out any script trying to set a PHP GLOBALS variable via URL.
      RewriteCond %{QUERY_STRING} GLOBALS(=|\[|\%[0-9A-Z]{0,2}) [OR]
      # Block out any script trying to modify a _REQUEST variable via URL.
      RewriteCond %{QUERY_STRING} _REQUEST(=|\[|\%[0-9A-Z]{0,2})
      # Return 403 Forbidden header and show the content of the root homepage
      RewriteRule .* index.php [F]
      #
      ## End - Rewrite rules to block out some common exploits.

      ## Begin - Custom redirects
      #
      # If you need to redirect some pages, or set a canonical non-www to
      # www redirect (or vice versa), place that code here. Ensure those
      # redirects use the correct RewriteRule syntax and the [R=301,L] flags.
      #
      ## End - Custom redirects

      ##
      # Uncomment following line if your webserver's URL
      # is not directly related to physical file paths.
      # Update Your Joomla! Directory (just / for root).
      ##
      # RewriteBase /

      RewriteCond %{THE_REQUEST} ^GET.*index\.php [NC]
      RewriteCond %{THE_REQUEST} !/system/.*
      RewriteRule (.*?)index\.php/*(.*) /$1$2 [R=301,L]
      RewriteCond %{THE_REQUEST} ^GET

      ## Begin - Joomla! core SEF Section.
      #
      RewriteRule .* - [E=HTTP_AUTHORIZATION:%{HTTP:Authorization}]
      #
      # If the requested path and file is not /index.php and the request
      # has not already been internally rewritten to the index.php script
      RewriteCond %{REQUEST_URI} !^/index\.php
      # and the request is for something within the component folder,
      # or for the site root, or for an extensionless URL, or the
      # requested URL ends with one of the listed extensions
      RewriteCond %{REQUEST_URI} /component/|(/[^.]*|\.(php|html?|feed|pdf|vcf|raw))$ [NC]
      # and the requested path and file doesn't directly match a physical file
      RewriteCond %{REQUEST_FILENAME} !-f
      # and the requested path and file doesn't directly match a physical folder
      RewriteCond %{REQUEST_FILENAME} !-d
      # internally rewrite the request to the index.php script
      RewriteRule .* index.php [L]
      #
      ## End - Joomla! core SEF Section.

      <FilesMatch "\.(ico|pdf|flv|jpg|ttf|jpg|jpeg|png|gif|js|css|swf)$">
      Header set Expires "Wed, 15 Apr 2020 20:00:00 GMT"
      Header set Cache-Control "public"
      </FilesMatch>

      <ifModule mod_headers.c>
      Header set Connection keep-alive
      </ifModule>

      ########## Begin - Remove Etags
      # FileETag none
      #
      ########## End - Remove Etags
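    For comparison, a more compact sketch of the rule being attempted: the THE_REQUEST condition is effectively redundant once the rule pattern itself only matches robots.txt, though whether this alone changes what Webmaster Tools reports is not guaranteed.

      RewriteEngine On
      # Serve the alternate robots file whenever the request arrives on the static host
      RewriteCond %{HTTP_HOST} ^www\.static\. [NC]
      RewriteRule ^robots\.txt$ /robots_static.txt [L]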

    Read the article

  • Robots.txt and "Bad" Robots

    - by Lynda
    I understand robots.txt and its purpose. I have read some people saying that using a robots.txt gives "bad" robots, or robots that do not obey a robots.txt, a way to find pages on your site that you do not want accessed. While I am not looking to get into a debate about that, I do have a question. If I have a structure like this (note: there are no pages within /Folder/, only other folders):

      /Folder/
          /Sub-Folder 1/
          /Sub-Folder 2/

    If I Disallow: /Folder/, it will prevent "good" robots from accessing the directory and any contents within the sub-folders. While we know that bad robots will see /Folder/, will they be able to see and access the sub-folders and the pages within the sub-folders if those are not listed in the robots.txt? (Note: I do not fully understand how robots, good or bad, crawl a site beyond using a robots.txt and links within the site.)

    Read the article

  • How come the ls command prints in multiple columns on tty but only one column everywhere else?

    - by David Lou
    Even after using Unix-like OSes for a couple of years, this behaviour still baffles me. When I use the ls command in a directory that has lots of files, the output is usually nicely formatted into multiple columns. Here's an example:

      $ ls
      a.txt  C.txt  f.txt  H.txt  k.txt  M.txt  p.txt  R.txt  u.txt  W.txt  z.txt
      A.txt  d.txt  F.txt  i.txt  K.txt  n.txt  P.txt  s.txt  U.txt  x.txt  Z.txt
      b.txt  D.txt  g.txt  I.txt  l.txt  N.txt  q.txt  S.txt  v.txt  X.txt
      B.txt  e.txt  G.txt  j.txt  L.txt  o.txt  Q.txt  t.txt  V.txt  y.txt
      c.txt  E.txt  h.txt  J.txt  m.txt  O.txt  r.txt  T.txt  w.txt  Y.txt

    However, if I try to redirect the output to a file, or pipe it to another command, only a single column appears in the output. Using the same example directory as above, here's what I get when I pipe ls to wc:

      $ ls | wc
           52      52     312

    In other words, wc thinks there are 52 lines, even though the output to the terminal has only 5. I haven't observed this behaviour in any other command. Could someone explain this to me?
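    A minimal sketch for seeing this behaviour yourself, assuming a GNU or BSD ls (both support -1 and -C): ls only uses the multi-column format when its standard output is a terminal, and falls back to one name per line otherwise.

      # columns are the default only when stdout is a terminal;
      # through a pipe, ls prints one name per line
      ls | cat

      # -C forces column output even through a pipe
      ls -C | cat

      # -1 forces one name per line even on a terminal
      ls -1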

    Read the article

  • Need assistance making a batch file for renaming files in separate folders

    - by Carnaxus
    Ok, here's one for you. I'm trying to use a batch file to rename a bunch of files, but none of them are in the same folder as the batch file itself. The command prompt keeps telling me that the directory can't be found. I suppose I could just rename all the files in all the folders that match the filename, but I don't want to do that either; I only want to change certain ones. My batch file as it stands is:

      @echo off
      ren "engine/info.txt" "disabled.txt"
      ren "gravplating/info.txt" "disabled.txt"
      ren "HAWX content/info.txt" "disabled.txt"
      ren "laserz/info.txt" "disabled.txt"
      ren "NeuroNaval/info.txt" "disabled.txt"
      ren "NeuroPlanes/info.txt" "disabled.txt"
      ren "NeuroTanks/info.txt" "disabled.txt"
      ren "NeuroWeapons/info.txt" "disabled.txt"
      ren "WAC Base/info.txt" "disabled.txt"
      ren "WAC DamageSystem/info.txt" "disabled.txt"
      ren "WAC GravityController/info.txt" "disabled.txt"
      ren "WAC Helicopters/info.txt" "disabled.txt"
      ren "WAC Sweps/info.txt" "disabled.txt"
      ren "weapons/info.txt" "disabled.txt"
      ren "AFF_ships/info.txt" "disabled.txt"
      ren "AntiTakeRifle/info.txt" "disabled.txt"
      ren "Catmull-Rom Cameras/info.txt" "disabled.txt"
      ren "Displacer Cannon/info.txt" "disabled.txt"
      ren "Drumdevil's Trains/info.txt" "disabled.txt"
      ren "EVEOnline/info.txt" "disabled.txt"
      ren "gm_botmap_v3/info.txt" "disabled.txt"
      ren "gm_construct_flatgrass_v5-2/info.txt" "disabled.txt"
      ren "gm_mobenix_v3_final/info.txt" "disabled.txt"
      ren "gm_mobenix_v3_highquality_Water/info.txt" "disabled.txt"
      ren "gm_snabbansairfield_b1/info.txt" "disabled.txt"
      ren "gm_XhS_construct/info.txt" "disabled.txt"
      ren "linedraw/info.txt" "disabled.txt"
      ren "ModelManipulator/info.txt" "disabled.txt"
      ren "NeuroCars/info.txt" "disabled.txt"
      ren "Propeller Engine/info.txt" "disabled.txt"
      ren "VanDookie and Predaaator's pack/info.txt" "disabled.txt"
      ren "WAC ECM/info.txt" "disabled.txt"
      ren "WAC Extra Helicopters/info.txt" "disabled.txt"
      echo Done!
      pause
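    If the "directory can't be found" errors come from the batch file being started in a different working directory than the one containing these addon folders, one possible sketch is to anchor the script to a known parent folder first; the "C:\addons" path and the shortened folder list below are placeholders, not the asker's real layout.

      @echo off
      rem "C:\addons" is a placeholder for wherever the addon folders actually live;
      rem pushd makes the relative renames below resolve against that folder.
      pushd "C:\addons" || exit /b 1
      for %%D in ("engine" "gravplating" "laserz" "weapons") do (
          if exist "%%~D\info.txt" ren "%%~D\info.txt" "disabled.txt"
      )
      popd
      echo Done!
      pause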

    Read the article

  • Different robots.txt for two different domains pointing to the same folder

    - by Ali
    Hi, I have the following two domains: domain.com and test.domain.com. Both point to the same folder, which is "public_html". What I want is a different robots.txt file for each domain, so that when someone browses domain.com/robots.txt one file is shown, and when someone goes to test.domain.com/robots.txt a different file is shown. How can I do this using URL rewriting in .htaccess? Thanks
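    A minimal sketch of the kind of rewrite being asked about, assuming Apache with mod_rewrite enabled and a second file (here called robots_test.txt, a placeholder name) kept next to the normal robots.txt in public_html:

      RewriteEngine On
      # Requests for robots.txt on the test subdomain get the alternate file instead
      RewriteCond %{HTTP_HOST} ^test\.domain\.com$ [NC]
      RewriteRule ^robots\.txt$ robots_test.txt [L]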

    Read the article

  • How to remove old robots.txt from Google as the old file blocks the whole site

    - by KnowledgeSeeker
    I have a website that still shows the old robots.txt in Google Webmaster Tools:

      User-agent: *
      Disallow: /

    which is blocking Googlebot. I removed the old file and uploaded a new robots.txt with almost full access yesterday, but it is still showing me the old version. The latest updated copy's contents are below:

      User-agent: *
      Disallow: /flipbook/
      Disallow: /SliderImage/
      Disallow: /UserControls/
      Disallow: /Scripts/
      Disallow: /PDF/
      Disallow: /dropdown/

    I submitted a request to remove this file using Google Webmaster Tools, but my request was denied. I would appreciate it if someone could tell me how I can clear it from the Google cache and make Google read the latest version of the robots.txt file.

    Read the article

  • robots.txt not updated

    - by Haridharan
    I have updated some URLs and files in my robots.txt file to block them from Google search results, but the files are still displayed in the search results. As per a suggestion from a site, I tried to refresh the robots.txt with these steps: in Google Webmaster Tools, go to Health - Fetch as Google, type the URL and click the Fetch button. But the files are still displayed in the search results. Note: in Google Webmaster Tools, under Health - Blocked URLs - robots.txt file, the downloaded date is a couple of days out of date.

    Read the article

  • Recovering from an incorrectly deployed robots.txt?

    - by Doug T.
    We accidentally deployed a robots.txt from our development site that disallowed all crawling. This has caused traffic to dip dramatically, and Google results to report: "A description for this result is not available because of this site's robots.txt - learn more." We corrected the robots.txt about 1.5 weeks ago, and you can see our robots.txt here. However, search results still report the same robots.txt message. The same appears to be true for Bing. We've taken the following action: submitted the site to be recrawled through Google Webmaster Tools, and submitted a sitemap to Google (basically doing everything possible to say "Hey, we're here! And we're crawlable!"). Indeed a lot of crawl activity seems to be happening lately, but still no description is crawled. I noticed this question where the problem was specific to a 303 redirect back to a disallowed path. We are 301 redirecting to /blog, but crawling is allowed there. This redirect is due to a site redesign: WordPress paths for posts such as /2012/02/12/yadda-yadda have been moved to /blog/2012/02/12. We 301 redirect to WordPress under /blog to keep our Google juice. However, the sitemap we submitted might have /blog URLs. I'm not sure how much this matters. We clearly want to preserve Google juice for URLs linked to us from before our redesign with the /2012/02/... URLs. So perhaps this has prevented some content from getting recrawled? How can we get all of our content, with links pointing to our site from both pre- and post-redesign URLs, to report descriptions? How can we resolve this problem and get our search traffic back to where it used to be?

    Read the article

  • URL blocked in robots.txt but still showing up on Google search [closed]

    - by Ahmad Alfy
    Possible Duplicate: Why do Google search results include pages disallowed in robots.txt? In my robots.txt I am disallowing a lot of URLs. Google Webmaster Tools says there are 750+ URLs blocked. The problem is that the URLs are still showing up in Google search. For example, I have the following rule:

      Disallow: /entity/child-health/

    But when I search for some-keyword + child health, the following URL shows up: http://www.sitename.com/entity/child-health/ Am I doing anything wrong? Is it possible for a URL to be blocked using robots.txt and still show up in search results?

    Read the article

  • Is my robots.txt working as it should?

    - by TigerBlood
    I want crawlers to have access to http://www.example.com but not http://www.example.com/ My robots.txt is as follows:

      User-agent: *
      Allow: /$
      Disallow: /

    My site is in Google search results, but I am not coming up in Bing, Yahoo, etc. I have had the same robots.txt since last year; I initially requested inclusion about a year ago and have resubmitted the URL to those search engines several times since. Is my robots.txt blocking those other crawlers? And if so, why not Google as well? Thanks in advance!

    Read the article

  • Restricting crawler activity to certain directories with robots.txt

    - by neimad
    I would like to use robots.txt to prevent indexing of some parts of my website. I want search engines to index only the / directory and not search inside my controllers. In my robots.txt I have this:

      User-Agent: *
      Disallow: /compagnies/
      Disallow: /floors/
      Disallow: /spaces/
      Disallow: /buildings/
      Disallow: /users/
      Disallow: /

    I put this file in /mysite/public. I tested the file with a robots.txt validator and got no errors. However, Google still returns results from my site. For testing, I added Disallow: /, but again, Google indexed all the pages. floors, spaces, buildings, etc. are not physical directories. Is this a bug? How can I work around it?

    Read the article

  • Robots.txt Disallow command [on hold]

    - by Saahil Sinha
    How do I disallow, through robots.txt, folders that are being crawled due to a wrong URL structure and therefore cause duplicate page errors? The URL being crawled incorrectly by Google, leading to the duplicate page error, is: www.abc.com/forum/index.php?option=com_forum However, the actual correct page is: www.abc.com/index.php?option=com_forum Is excluding the wrong URLs through robots.txt the correct way to fix this? To exclude www.abc.com/forum/index.php?option=com_forum, the command would be:

      Disallow: /forum/

    Will it not also block the site's legitimate 'forum' component folder?
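    For reference, a short sketch of what that rule matches: Disallow works on URL-path prefixes, so it only touches paths that start with /forum/ and leaves the root-level index.php URLs alone (whether a legitimate /forum/ path also exists on the site is the open question here).

      User-agent: *
      # Matches /forum/index.php?option=com_forum and anything else under /forum/,
      # but not /index.php?option=com_forum at the site root
      Disallow: /forum/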

    Read the article

  • Google indexing page with parameters but page is Disallowed in robots.txt

    - by Jakobud
    I have the following in robots.txt:

      User-agent: *
      Disallow: /refer.php

      User-agent: NinjaBot
      Allow: /

      Sitemap: http://www.mysite.com/sitemap.xml

    The refer.php file does various things depending on what GET parameters are passed to it. When I do a Google search, I see tons of results for pages like this:

      http://www.mysite.com/refer.php?o=23945
      http://www.mysite.com/refer.php?o=39858
      http://www.mysite.com/refer.php?o=9683
      http://www.mysite.com/refer.php?o=10569
      http://www.mysite.com/refer.php?o=58304
      http://www.mysite.com/refer.php?o=69604

    Is the reason that Google is indexing these because I don't have an asterisk * after refer.php in the robots.txt? Should changing it to Disallow: /refer.php* fix the problem?
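    As a point of reference, Google treats robots.txt rules as prefix matches, so a sketch like the one below already covers the parameterised URLs; for Googlebot, /refer.php and /refer.php* behave the same (note that blocking crawling does not by itself guarantee the URLs drop out of the index).

      User-agent: *
      # Prefix match: this also applies to /refer.php?o=23945 and similar URLs
      Disallow: /refer.php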

    Read the article

  • Robots.txt never downloaded but some blocked URLs in GWT

    - by Zistoloen
    There is something I don't understand in Google Webmaster Tools (GWT) for my WordPress site. In the "Blocked URLs" menu, it says that my robots.txt has never been downloaded, yet there are some blocked URLs. That seems weird and not logical. Am I missing something?

      User-agent : *
      Disallow: /*?
      Disallow: /wp-login.php
      Disallow: /wp-admin
      Disallow: /wp-includes
      Disallow: /wp-content
      Allow: /wp-content/uploads
      Disallow: */trackback
      Disallow: /*/feed
      Disallow: /*/comments
      Disallow: /cgi-bin
      Disallow: /*.php$
      Disallow: /*.inc$
      Disallow: /*.gz$
      Disallow: /*.cgi$
      Disallow: /author/*

    I'm afraid my robots.txt doesn't block several URLs I want to block.

    Read the article

  • HTTP 303 redirection and robots.txt

    - by Ian Dickinson
    On a site I'm working on, we're using the HTTP 303 redirect pattern (see this article for background) to distinguish between information and non-information resources. So: some URLs under /id get redirected to dynamically created pages under /doc. These dynamic pages are built from a database and contain links to other /doc/ resources, so in general we don't want them to be crawled. Our robots.txt contains:

      Disallow: /doc

    However, we do want the non-redirected pages under /id to get indexed by Google et al.:

      Allow: /id

    So the question I have, which I can't find an answer to so far, is: if an allowed /id page 303-redirects to a /doc page, will it still be blocked by robots.txt? If yes, we're OK; but otherwise I'm going to disallow all /id resources in the robots file, as having the crawler hammer the database would be worse than losing search indexing for the /id pages.

    Read the article

  • How to secure robots.txt file?

    - by CompilingCyborg
    I would like user agents to index only my relative pages, without accessing any directory on my server. As an initial thought, I had this version in mind:

      User-agent: *
      Disallow: */*

      Sitemap: http://www.mydomain.com/sitemap.xml

    My questions: Is it correct to block all directories like that, with Disallow: */*? Would search engines still be able to see and index my sitemap if I disallowed all directories? What are the best practices for securing the robots.txt file? For reference, here is a good tutorial for robots.txt:

      #Add this if you want to stop Alexa from indexing your site.
      User-agent: ia_archiver
      Disallow: /

      #Add this to stop duggmirror
      User-agent: duggmirror
      Disallow: /

      #Add this to allow specific agents
      User-agent: Googlebot
      Disallow:

      #Add this to allow all agents while blocking specific directories
      User-agent: *
      Disallow: /cgi-bin/
      Disallow: /*?*

    Read the article

  • What is a good robots.txt for WP?

    - by Steven
    What is the "best" setup for robots.txt? I'm using the following permalink structure in Wordpress: /%category%/%postname%/. My robots.txt currently looks like this (copied from somewhere a long time ago): User-agent: * Disallow: /cgi-bin Disallow: /wp-admin Disallow: /wp-includes Disallow: /wp-content/plugins Disallow: /wp-content/cache Disallow: /wp-content/themes Disallow: /trackback Disallow: /comments Disallow: /category/*/* Disallow: */trackback Disallow: */comments I want my comments to be indext. So I can remove this, right? Do I want to disallow indexing categories because of my permalinkstructure? An article can have several tags and be in multiple categories. This may cause duplicates in google. How should I work around this? Would you change anything else here?

    Read the article

  • My First robots.txt

    - by Whitechapel
    I'm creating my first robots.txt and wanted to get a second opinion on it. Basically I have an FTP setup on my board for some special users to transfer files between each other, and I do NOT want that included in what the bots search. I also want to point to my sitemap, which gets auto-generated by a PHP page. So here is what I have; what else should I include, and do I need to fix anything? It links to xmlsitemap.php because that generates the sitemap when called. My goal is to allow any search bot to crawl the forums to grab metadata.

      User-agent: *
      Disallow: /admin/
      Disallow: /ali/
      Disallow: /benny/
      Disallow: /cgi-bin/
      Disallow: /ders/
      Disallow: /empire/
      Disallow: /komodo_117/
      Disallow: /xanxan/
      Disallow: /zeroordie/
      Disallow: /tmp/
      Sitemap: http://www.vivalanation.com/forums/xmlsitemap.php

    Edit: I'm not sure how to handle all the users' folders under /public_html/, since the robots.txt will be going in /public_html.

    Read the article

  • Quick question: robots.txt Disallow: /*/ does what exactly?

    - by Exit
    An SEO firm suggested changing the robots.txt to:

      User-agent: *
      Disallow: /*/
      Allow: /ims/

    I'm not sure what that would do, but my guess is that it would tell all robots to index nothing but the ims folder. I understand the wildcard, but I'm confused by the slashes and don't know how they would play out in conjunction with the wildcard.

    Update: I didn't mention that there is a sitemap listed in the robots.txt file, but according to one tech blogger, sitemaps trump robots exclusions. So, even though Google Webmaster Tools says that everything with a trailing slash will not be indexed, the sitemap contains the important links. I did notice that the link count on Google went from 360 to 336, and the sitemap links under the URL scaled back to 3 from 6. I'm not sure of the cause or which links were removed, though. Perhaps it cleaned out garbage. I'm still clueless why they would add in 'Allow: /ims/'; that seems pointless. And a quick list of what would be indexed according to the robots rules above (without the sitemap), using /*/:

      domain.com                     Indexed
      domain.com/page.html           Indexed
      domain.com/folder/             Not Indexed
      domain.com/folder/page.html    Not Indexed

    Read the article

  • Valid robots.txt? [closed]

    - by psot
    I am waiting for Google to crawl my site and display the results in search. Is my robots.txt all right, and will it let Google, Bing, etc. crawl my site? Thanks!

      User-agent: *
      Disallow: /cgi-bin/
      Disallow: /wp-admin/
      Disallow: /wp-includes/
      Disallow: /wp-content/
      Disallow: /build/
      Disallow: /css/
      Disallow: /trackback/
      Disallow: /comments
      Disallow: /assets/graphics/
      Disallow: /assets/visual/
      Disallow: /category/*/*
      Disallow: */trackback
      Disallow: */feed
      Disallow: */comments
      Disallow: /*?*
      Disallow: /*?

      User-agent: Slurp
      Disallow: /

      User-agent: Baiduspider
      Disallow: /

      User-agent: ia_archiver
      Disallow: /

      User-agent: duggmirror
      Disallow: /

      User-agent: Yandex
      Disallow: /

      Sitemap: http://example.com/sitemap.xml.gz

    Read the article

  • Googlebot fetches my pages very frequently: rel-nofollow, meta-noindex or robots.txt-disallow?

    - by trante
    Googlebot fetches pages on my site very frequently, and this slows my website down. I don't want Googlebot to crawl too frequently. I have already decreased the crawl rate in Google Webmaster Tools. But I'm considering these three tools:

    1. Adding rel="nofollow" to my inner pages, so Googlebot won't crawl and index them.
    2. Adding a "noindex" meta tag, so Google will remove the page from the index and won't fetch it again.
    3. Adding Disallow: /mySomeFolder/ to robots.txt, so Googlebot won't crawl those pages.

    I'm planning to use these methods for my 56,000 pages, except for the most important 6-7 pages. Which method would you prefer, and what would be the disadvantages or advantages? Or won't it change my website speed, etc.?
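    For reference, a sketch of what options 2 and 3 from the list above look like in practice; note that a noindex meta tag can only take effect if the page stays crawlable, since Googlebot has to fetch the page to see the tag, so combining it with a robots.txt Disallow for the same pages works against itself.

      <!-- option 2: placed in the <head> of each page to be dropped from the index -->
      <meta name="robots" content="noindex">

      # option 3: robots.txt entry (stops crawling, but does not remove already indexed URLs)
      User-agent: *
      Disallow: /mySomeFolder/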

    Read the article

  • Grapeshot crawler ignoring robots.txt

    - by QF_Developer
    Has anyone come across a crawler called Grapeshot? They are hammering the same page repeatedly on our website. I believe they are looking for ad-related keywords, based on previous content ad campaigns. The odd thing is we never ran any such campaigns on the page they are so interested in. We do have a few pages running AdSense; is this what has attracted Grapeshot? I've added the following declaration to my robots.txt, but they don't seem to be honouring it:

      User-agent: grapeshot
      Disallow: /

    Any ideas on how to block this nuisance crawler? I'm starting to think the best way is by setting up IP rules in IIS.

    Read the article

  • robots.txt: disallow URLs containing a string with a '/' at the end

    - by thanili
    I have a website with thousands of dynamic pages. I want to use the robots.txt file to disallow certain URL patterns corresponding to pages with duplicate content. For example, I have a page for article itemA belonging to category catA/subcatA, with URL /catA/subcatA/itemA; this is the URL that I want indexed by Google. The article is also visible via tagging in various other places on the website. The URLs produced via tagging look like /tagA1/itemA; these URLs I do NOT want indexed by Google. However, I do want all tag listings, such as /tagA1, to be indexed. So how can I achieve this? By disallowing URLs that include a specific string with a '/' at the end?

      /tagA1/itemA - disallow
      /tagA1       - allow

    Read the article
