Search Results

Search found 4984 results on 200 pages for 'robots txt'.

Page 7/200 | < Previous Page | 3 4 5 6 7 8 9 10 11 12 13 14 | Next Page >

ADF - Now with Robots!

- by Duncan Mills

I mentioned this briefly in a tweet the other day, just before the full rush of OOW really kicked off, so I though it was worth re-visiting. Check out this video, and then read on: So why so interesting? Well - you probably guessed from the title, ADF is involved. Indeed this is as about as far from the traditional ADF data entry application as you can get. Instead of a database at the back-end there's basically a robot. That's right, this remarkable tape drive is controlled through an ADF using all your usual friends of ADF Faces, Controller and Binding (but no ADFBC for obvious reasons). ADF is used both on the touch screen you see on the front of the device in the video, and also for the remote management console which provides a visual representation of the slots and drives. The latter uses ADF's Active Data Framework to provide a real-time view of what's going on the rack. . What's even more interesting (for the techno-geeks) is the fact that all of this is running out of flash storage on a ridiculously small form factor with tiny processor - I probably shouldn't reveal the actual specs but take my word for it, don't complain about the capabilities of your laptop ever again! This is a project that I've been personally involved in and I'm pumped to see such a good result and, I have to say, those hardware guys are great to work with (and have way better toys on their desks than we do). More info in the SL150 (should you feel the urge to own one) is here.

Read the article
What's the Use of Robots Text File?

Fundamentally, we like the content of our websites to be indexed immediately so that traffic will be driven and search engine ranking will be improved. But in some situations, a file or online tool is used to hide the pages and personal files we have in our website.

Read the article
What's the Use of Robots Text File?

Fundamentally, we like the content of our websites to be indexed immediately so that traffic will be driven and search engine ranking will be improved. But in some situations, a file or online tool is used to hide the pages and personal files we have in our website.

Read the article
Optimizing Robots Text File

We can block spiders to crawl restricted parts of our website. Restricted parts of our website means those links of our website which we don't want to be indexed in search engines and getting some unwanted visitors. For example:

Read the article
In Windows, a batch file with a recursive for loop and a file name including blanks

- by uvts_cvs

Hello, I have a folder tree, like this (it's only an example, it will be deeper in my real case): C:\test | +---folder1 | foo bar.txt | foobar.txt | +---folder2 | foo bar.txt | foobar.txt | \---folder3 foo bar.txt foobar.txt My files have one or more spaces in the name and I need to perform a command on them, so I am interested in foo bar.txt but not in foobar.txt. I tried (inside a batch file): for /r test %%f in (foo bar.txt) do if exist %%f echo %%f where the command is the simple echo. It does not work because the space is skipped and I get no output. This works but it is not what I need: for /r test %%f in (foobar.txt) do if exist %%f echo %%f It prints: C:\test\folder1\foobar.txt C:\test\folder2\foobar.txt C:\test\folder3\foobar.txt I tried using the quotation mark (") but it does not work: for /r test %%f in ("foo bar.txt") do if exist %%f echo %%f It does not work because the quotation mark is still included in the output: C:\test\folder1\"foo bar.txt" C:\test\folder2\"foo bar.txt" C:\test\folder3\"foo bar.txt"

Read the article
juju spends bootstrap-timeout with a final message it cannot find /var/lib/juju/nonce.txt

- by user285199

I build two VMware's machines. First one with MAAS, second one with a fresh installation from MAAS. Region controller was installed with Ubuntu 12.04 distribution, and upgraded (. Node computing was installed from MAAS with Quantal 12.10. Juju was installed and upgraded to 1.18 (from ppa:juju/stable repository). MAAS was upgraded from cloud-archive:tools repository. In debug mode, I got how Juju connects to node. Then I run the same instruction: ssh -o "StrictHostKeyChecking no" -o "PasswordAuthentication no" -i /home/lliurex/.juju/ssh/juju_id_rsa -i /home/lliurex/.ssh/id_rsa [email protected] /bin/bash It worked (with and without /bin/bash). When Juju spends all bootstrap-timeout tells it has not found /var/lib/juju/nonce.txt file. It's true, it doesn't exist. It doesn't mind if you put a timeout of 1800, 3600 or 72000, it always finishes the same.

Read the article
what is the file: C:\nppdf32Log\debuglog.txt

- by Jesse

Hello Everyone After I updated to 12.04, a file named " C:\nppdf32Log\debuglog.txt" occurs in my home directory, the content of the file is as the follow: NPP_Initialize : called NPP_GetValue is called NPP_SetWindow : called for instance 920c0e28 Window from browser - 77594625 NPP_SetWindow : called for instance 920c0e28 Window from browser - 77594625 NPP_SetWindow : called for instance 920c0e28 Window from browser - 77594625 NPP_NewStream : called for instance 920c0e28, stream 913403b0, URL http://www.xxxxxx.com/attachments/soft/CDGM%20Optical%20Glass%20Catalog.pdf, stream size 36177984, seekable 1 NPP_Write : called for instance 920c0e28, stream 913403b0, offset = 0, length = 16384, streamlength = 36177984 Trying for window attributes Trying for query tree NPP_Write : called for instance 920c0e28, stream 913403b0, offset = 16384, length = 16384, streamlength = 36177984 Trying for window attributes Trying for query tree ...... It seems this file is related to FireFox,what's exactly the problem? many thanks for your help!

Read the article
How can I block abusive bots from accessing my Heroku app?

- by aem

My Heroku (Bamboo) app has been getting a bunch of hits from a scraper identifying itself as GSLFBot. Googling for that name produces various results of people who've concluded that it doesn't respect robots.txt (eg, http://www.0sw.com/archives/96). I'm considering updating my app to have a list of banned user-agents, and serving all requests from those user-agents a 400 or similar and adding GSLFBot to that list. Is that an effective technique, and if not what should I do instead? (As a side note, it seems weird to have an abusive scraper with a distinctive user-agent.)

Read the article
600 visitors per day, 20 backlinks but still not referenced by Google

- by Tristan

Hello, i've launch a website on wednesday (9 August) There's already in 4 days 12,000 viewed pages 1,400 visits 20 to 25 backlincks a sitemap.xml (130 URL) english language / french language - url like that "/en/" "/fr/" BUT, i'm still not referrenced by google In the google webmaster i have 0 backlinks 130 in sitemap, 0 URL referrenced on google For a smaller website, i remember that i took me less to appear on google with less visits. My url is: http://www.seek-team.com/en/ for english and juste replace /en/ by /fr/ to access it in french. What's causing this ? Is there an explanation ? Thanks for your help (ps, i've already checked robots.txt)

Read the article
Hiding php includes from search spiders?

- by 21stcn

Quick and simple question. I have 80+ html files which I want to be crawled. They are individual product pages. Each of these pages calls its content using php includes. These php include files are in a separate folder on the server and contain the core content for the individual product pages. I just wanted to ask, if I use robots.txt or .htaccess to prevent crawling of the directory that holds the php content files, will there be no issue crawling the html pages which include these files? What I want to achieve is have the html files indexed with the php content included in them, but I don't want visitors landing on the php content pages, nor have these php files indexed as duplicate content. Just clarification needed as to whether it is safe to block spiders from accessing the php folder, without this affecting the html files being indexed with the included content. Is this the best way to do things? Or should I just leave the content php files to be crawled?

Read the article
How to block FeedReader from fetching my content to their site?

- by Wei Kai

As you all can see from the picture below, my site's content is duplicated by FeedReader and indexed at Google. When I clicked at the FeedReader link, it uses some sort of iFrame to draw content from my site live. This forms some sort of content duplication to me, and I believe it does harm to my site. (Stackoverflow doesn't allow me to post image due to new account, pls click at the link above to see the picture, million thanks to you.) What can I do to prevent Feedreader to fetch my content to their site? I know robots.txt can perform such function, but I don't know how to do it. Any help would be much appreciated. I have also highlighted this issue to FeedReader 2 days ago, but yet to get any reply from them.

Read the article
Bad Bot blocking Revisited

- by Tom

I've read a lot about bad bot blocking, php scripts, .htaccess techniques, etc... Is this a valid method? Since .htacces can rewrite and send a bad bot a 403 deny or forward to something like spam poison, is it possible to Disallow a folder, then through .htaccess in that specific folder redirect to spampoison? Since Apache reads each .htaccess independently and follows specific instructions, then a bad bot not following robots.txt would just be redirected. Or anyone trying to access, /badbot/ or whatever I choose to call my trap folder. Thanks Tom

Read the article
Clicks counting and crawler bots

- by Dennis

I am currently running a small affiliate-program for Facebook users. We use an auto-poster to publish links to fan pages. Every hit is stored in our database and we have included a 24 hour reload block for the IP-addresses. My problem right now is that the PHP script also stores every hit from all the bots that crawls my website. Now I was thinking to block those bots with the robots.txt of my website but I am afraid that this will have a negative effect on my AdSense ads. Does anybody have an idea for me how to work this out?

Read the article
How to disallow indexing but allow crawling?

- by John Doe

In the front page of my website, I have some previews to articles (with a small introduction to them) that link to the full articles. I want to disallow the front page to prevent duplicate content. But if I do this (in robots.txt), would it still be crawled? I mean, the full articles would be still reached by the crawler even though I disallowed the only page that links to them? I don't want the webcrawler not to access the page and enter the links in them, but I just don't want it to save the information (that will be repeated in the full articles).

Read the article
How do I deal with content scrapers? [closed]

- by aem

Possible Duplicate: How to protect SHTML pages from crawlers/spiders/scrapers? My Heroku (Bamboo) app has been getting a bunch of hits from a scraper identifying itself as GSLFBot. Googling for that name produces various results of people who've concluded that it doesn't respect robots.txt (eg, http://www.0sw.com/archives/96). I'm considering updating my app to have a list of banned user-agents, and serving all requests from those user-agents a 400 or similar and adding GSLFBot to that list. Is that an effective technique, and if not what should I do instead? (As a side note, it seems weird to have an abusive scraper with a distinctive user-agent.)

Read the article
SEO: disallowing Google from indexing forms in iframes or not?

- by Marco Demaio

I usually place forms in iframes (i.e. order form, request assistance form, contact forms, ect.). Just the forms, I never place other contents or pages in iframes. From a SEO point of view, would you exclude forms from being indexed/crawled by Google or not? I mean my forms hardly ever contains keyword/keyphrases, moreover I obviously place empty title/meta description tags in pages shown in iframe to display forms, cause those titles are never displaied in browser title bar. So I'm wondering what's the point of letting Google index them? Moreover I think these form pages might suck out PR from all other pages that are more valuable for SEO. If your answer is "yes I would exclude them form indexing" would you simply use robots.txt to exclude them? Thanks!

Read the article
CDN virtual subdomain causes duplicated content

- by user3474818

I have created a subdomain and a CNAME record which points to the domain root. The subdomain www.static.example.com is actually a copy of the entire website www.example.com and it is supposed to act as an CDN and serve static content in order to improve speed. However, all of my content can be accessed via subdomain aswell, so Google has indexed it all and now I am dealing with duplicated content. How could I deny access to crawlers for the subdomain baring in mind that I do not have different subfolder for subdomain, so I can't create a separate robots.txt file?

Read the article
Googlebot can't access my site when crawling from rootdomain

- by PéCé

I can't explain why I get this message for my rootdomain result in Google : trocmalin.com/ A description for this result is not available because of this site's robots.txt – learn more. Here is my site specifics : vide-greniers.trocmalin.com is the site address www.trocmalin.com redirects (301) to vide-greniers.trocmalin.com trocmalin.com redirects (301) to vide-greniers.trocmalin.com too... User-agent: * Disallow: /orga/ User-agent: * Disallow: /sitemap-update Google results for vide-greniers.trocmalin.com are well rendered, as well as sub pages allowed for bots. But the result for my rootdomain (trocmalin.com) gives this message... Can you help me ?

Read the article
Will duplicate international (i18n) content hinder SEO rankings?

- by Rhys

Google clearly states that duplicate content within a single, or multiple, domains is not advised. This is understood, but I am not sure of any exceptions for sites with region-specific content that is often replicated across locales. For example, a site's /en-us/about page could be identical to /en-uk/about, whereas most likely /en-ja/about is unique. Are GYM smart enough to understand that the initial URL depth is a locale specifier? Is there any robots.txt or header, etc, trickery that I should include to outline the site's international structure?

Read the article
Search engine bots accessing strange URLs

- by casasoft

We have ELMAH enabled on our site and get errors whenever a Page Not Found error is triggered on the website. We have recently redesigned a new website and so we understand that search engine robots might have previously indexed pages which they try to access and result in a Page Not Found errors. For this reason, we have set up permanent redirects for such previously indexed pages to the respective new pages. The website in mention is www.chambercollege.com and for example, a previously indexed URL was www.chambercollege.com/special-offers.aspx. This page is no longer accessible so we have created the necessary permanent redirect to redirect to the respective page on www.chambercollege.com/en/content/special-offers-161/. Now we are starting to receive Page Not Found errors of search engine bots (e.g. MSN bot) trying to access the URL www.chambercollege.com/special-offers.aspx/images/shadow_right.jpg/. Any idea how could a search engine make up that strange URL and whether you have any suggestions of what to do best?

Read the article
how to restrict access to all .txt file in apache except robots.txt?

- by user3162764

I am configuring apache2 on debian and would like to allow only robots.txt to be accessed for searching engines, while other .txt files are restricted, I tried to add the followings to .htaccess but no luck: <Files robots.txt> Order Allow,Deny Allow from All </Files> <Files *.txt> Order Deny,Allow Deny from All </Files> Can anyone help or give me some hints? I am new comer to apache, thanks a lot.

Read the article
Htaccess/robots.txt to allow search bots to explore main domain but not directory on other domain

- by gX

Ok, I understand the Title didn't make any sense so here I've tried to explain it in detail. I'm using a hosting that gives me space for my domain and lets me "add on" other domains on it. So lets say I have a domain A, and I add on a domain B. Basically my hosting gives me a public_html where I can put stuff that shows when someone visits website A. But, when I add the domain B, it lets me put the content of B, INSIDE of that public_html so that website B.com can also be visited by going to A.com/siteB... Thats all good, except that Google has started indexing B.com as well as A.com/siteB, I'm ok with it indexing B.com, but I somehow want to prevent it from indexing A.com/siteB so that when people search for B, it doesn't end up showing A.com/siteB. Any ideas? Let me know if the question is still unclear.

Read the article
Handling SEO for Infinite pages that cause external slow API calls

- by Noam

I have an 'infinite' amount of pages in my site which rely on an external API. Generating each page takes time (1 minute). Links in the site point to such pages, and when a users clicks them they are generated and he waits. Considering I cannot pre-create them all, I am trying to figure out the best SEO approach to handle these pages. Options: Create really simple pages for the web spiders and only real users will fetch the data and generate the page. A little bit 'afraid' google will see this as low quality content, which might also feel duplicated. Put them under a directory in my site (e.g. /non-generated/) and put a disallow in robots.txt. Problem here is I don't want users to have to deal with a different URL when wanting to share this page or make sense of it. Thought about maybe redirecting real users from this URL back to the regular hierarchy and that way 'fooling' google not to get to them. Again not sure he will like me for that. Letting him crawl these pages. Main problem is I can't control to rate of the API calls and also my site seems slower than it should from a spider's perspective (if he only crawled the generated pages, he'd think it's much faster). Which approach would you suggest?

Read the article
mod_rewrite and SEO friendliness

- by John Doe

My website has an atypical structure and I'm not sure if this could create problems in the long run, specially for SEO positioning purposes. I have a unique, large PHP script, and I use the Apache module mod_rewrite in the .htaccess file to create friendly URLs, for example: RewriteRule ^$ /index.php?section=Main RewriteRule ^createArticle$ /index.php?section=Main&view=CreateArticle RewriteRule ^configuration$ /index.php?section=Configuration RewriteRule ^article/([0-9]{1,10})$ /index.php?section=Article&view=Default&id=$1 RewriteRule ^deleteArticle/([0-9]{1,10})$ /index.php?section=Article&view=Delete&id=$1 RewriteRule ^reportArticle/([0-9]{1,10})$ /index.php?section=Article&view=Report&id=$1 RewriteRule ^logIn$ /index.php?section=Authentication ... So, www.example.com/index.php?section=Article&view=Default&id=105 would become www.example.com/article/105. The only real physical file is index.php, in which the parameters of the URL queried is processed and the corresponding result is outputted. My question is, do the crawling robots (e.g. Googlebot) recognize these links? Do they index the resulting HTML outputted by index.php with the specified parameters as if it was a actual HTML file? Also, would this become a problem when creating a Sitemap?

Read the article
What kind of redirect (301 or 302) for an email links tracker?

- by MaxiWheat

We are developing an email sending application ("à la" Mailchimp). Hyperlinks inserted by our users, in the emails they want to send, are replaced by a tracking URL on our application (https://ourdomain.com/trackingurl?blablabla) which then redirects the email reader to the original URL our users included in their emails. This allows us to record statistics about link clicks. Until now, we used 301 for those redirections, but we noticed that Google began indexing pages on our application which are in fact redirects to other domains. (The title and snippet in Google results are from the other domain, but the link in green is from our application). We took action by adding those urls to our robots.txt, but Google seems to take forever (months!) before removing them for its index and removing them by hand in Webmaster Tools would take a lot of time since there are lot. I would like to know which kind of HTTP redirect (301 or 302) is best suited for this kind of opreation ? Do you think switching to 302 redirects could improve this situation since we don't really want Google to index redirected links from our clients emails ?

Read the article

< Previous Page | 3 4 5 6 7 8 9 10 11 12 13 14 | Next Page >