Search Results

Search found 4783 results on 192 pages for 'a txt'.


Sony Ericsson txt: workhorse feature phone with Java ME tech

- by hinkmond

Just like your basic Quarter Horse, the new Sony Ericsson txt feature phone might not be as fancy as a "thoroughbred" smartphone, but it can sure get the job done with Java ME technology. See: Sony Ericsson txt w/Java ME. Here's a quote: "...comes with the usual features such as a web browser, email client and music player and FM radio, plus support for social networking applications and a YouTube client. You can download and install additional Java applications..." Sometimes the simple workhorse feature phone (with Java ME) is much better to go with than the idiosyncratic thoroughbred smartphone. Hinkmond

How to test robots.txt in googlebot to find out what is being indexed

- by Amar Jarubula

This question is a continuation of this answer: How to check if googlebot will index a given url? As advised, I went to Webmaster Tools and tested the contents of my robots.txt file. However, that only tells me whether the content is valid. For my scenario I need to test whether URLs matching a disallowed pattern are being indexed or not. For example, I have something like this in my robots.txt:

    disallow:/pattern*

My understanding is that URLs with the word pattern should not be crawled, but how do I test that this pattern is enforced while the website is being indexed?
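The pattern-matching part of this question can be checked offline. Below is a minimal Python sketch, assuming the wildcard semantics that Google documents for robots.txt paths: `*` matches any run of characters, a trailing `$` anchors the end of the URL, and a rule otherwise matches as a prefix. The function name `rule_matches` and the sample paths are hypothetical. The sketch only reports whether a path falls under a Disallow pattern; it says nothing about whether a URL is already in the index, because robots.txt governs crawling rather than indexing.

```python
import re


def rule_matches(pattern: str, path: str) -> bool:
    """Return True if a robots.txt path pattern covers the given URL path.

    Assumed semantics: '*' matches any run of characters, a trailing '$'
    anchors the end of the path, and everything else is a literal prefix.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape the literal pieces and join them with '.*' for each wildcard.
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    if anchored:
        regex += "$"
    # re.match anchors at the start, which gives the prefix behaviour.
    return re.match(regex, path) is not None


if __name__ == "__main__":
    for candidate in ("/pattern-page.html", "/other/pattern", "/patterns/a"):
        print(candidate, rule_matches("/pattern*", candidate))
```

Under these assumed semantics `/pattern*` behaves the same as `/pattern`: it covers paths that start with `/pattern`, not every URL that contains the word somewhere.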

TCL: How to read, extract and count the occurrences in .txt files (current directory)

- by Passion

Hi folks, I am a beginner at scripting and am learning TCL for embedded systems development. I have to search for files with the .txt extension in the current directory, count the occurrences of each different "Interface # nnnn" string in those files (where nnnn is a hexadecimal number of four digits, or at most 32 digits), and output a table of interface number against occurrence count.

I am facing implementation issues while writing the script: I am unable to implement data structures such as a linked list or a two-dimensional array. I am rewriting the script using a multi-dimensional array (passing values into and out of a procedure) to scan through every .txt file and search for the string/regular expression ‘Interface # ’, then count and display the number of occurrences. If someone could help me complete this part, it would be much appreciated.

Search for only .txt extension files and obtain the size of each file. Here is my piece of code for searching for .txt files in the present directory:

    # -nocomplain makes glob return an empty list instead of raising an
    # error when no file matches, so the else branch can be reached.
    set files [glob -nocomplain *.txt]
    if { [llength $files] > 0 } {
        puts "Files:"
        foreach f [lsort $files] {
            puts " [file size $f] - $f"
        }
    } else {
        puts "(no files)"
    }

I reckon these are the logical steps needed to complete it:

i) Once the .txt files are found, open them all in read-only mode.
ii) Create an array or list using a procedure (proc), with the interface number initialised to NULL and the interface count to zero.
iii) Scan through each .txt file and search for the string or regular expression "interface #".
iv) When a match is found, check the interface number and increment the count for the corresponding entry; otherwise add a new element to the interface number list.
v) If there are no files, return to the first directory.

My output should look as follows:

    Interface  Frequency
    123f       3
    1232       4
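The counting steps listed in this question can be sketched with a single map from interface number to count, which avoids the linked list and two-dimensional array altogether. The sketch below is in Python rather than Tcl and is illustrative only. It assumes the marker looks like "Interface # " followed by 4 to 32 hexadecimal digits and that upper and lower case digits denote the same interface; the names `count_interfaces`, `count_in_directory` and `format_table` are hypothetical.

```python
import re
from collections import Counter
from pathlib import Path

# Assumed format: "Interface # " followed by 4 to 32 hexadecimal digits.
INTERFACE_RE = re.compile(r"Interface\s*#\s*([0-9A-Fa-f]{4,32})\b")


def count_interfaces(texts):
    """Count each interface number found in an iterable of text strings."""
    counts = Counter()
    for text in texts:
        for number in INTERFACE_RE.findall(text):
            # Normalise case so "123F" and "123f" share one entry.
            counts[number.lower()] += 1
    return counts


def count_in_directory(directory="."):
    """Apply count_interfaces to every regular *.txt file in a directory."""
    files = sorted(p for p in Path(directory).glob("*.txt") if p.is_file())
    return count_interfaces(p.read_text(errors="replace") for p in files)


def format_table(counts):
    """Render the counts as a two-column text table."""
    lines = ["Interface  Frequency"]
    for number, freq in sorted(counts.items()):
        lines.append(f"{number:<10} {freq}")
    return "\n".join(lines)
```

Calling `print(format_table(count_in_directory(".")))` would print the table for the current directory. In Tcl, an array or dict keyed by the interface number would play the role that `Counter` plays here.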

How do I remove a LOT of indexed pages from Google?

- by Thierry

A few weeks ago we figured out that Google had indexed some information we would rather keep confidential, in the form of individual PDF files. Our assumption was that this was caused by a problem with our robots.txt that we had overlooked. Even though we are not sure whether this is the case, we are certain that the robots.txt file is in a valid format and is, according to Google's webmaster tools, blocking the files. However, even though this adjustment was made weeks ago, Google still has the PDF files indexed, while telling us that further information cannot be provided because of the robots.txt file. As you can hopefully understand, this is unwanted behaviour given the nature of the documents. I am aware that Google provides a removal request page for this purpose, but there are a lot of files. Is there an easier way to get Google to remove all of the files from its search engine? If not, is there anything else you could advise us to do besides manually requesting Google to remove every single page? Thanks in advance.

What does a robots.txt file do in a PHP project?

- by OM The Eternity

What does a robots.txt file do in a PHP project?

Weird entry for robots.txt on a Naked Domain in Google Webmaster Tools

- by Metalshark

We own a .co.uk address and use an Internet hosting company that has made mistakes around DNS in the past. Our main site is hosted on www. and their reluctance to allow editing of AAAA records on-line means our naked domain does not resolve. Currently when we attempt to reach the naked version there is no entry for the browser to go to and it displays an unreachable page (nslookup just says Name: name of domain with no further entries such as an IP or Canonical Name). We recently added the relevant TXT records to verify us to view both the www. version and the naked version of the domain in Google Webmaster Tools (in anticipation of the requests to our Internet host coming to fruition). Imagine our shock when double checking Site configuration > Crawler access and finding an (admittedly failing) robots.txt with a dynamically generated HTML page (full of crude pop-up JavaScript) with references to 3 of our most prominent competitors. What could cause this to happen? As we are in the UK I am assuming some DNS server is serving Google bad information. We are going to contact the Internet hosting company to fix our A and AAAA records once and for all, then check that they work in the US (using something like OpenDNS). Should we be doing more though, for instance informing Google (through Webmaster Tools) that we are now aware there is something currently wrong with our naked domain?

UPDATE: We have fixed our A records (not AAAA) and that has resolved the issue. But if there are further actions we should take for effectively having a parking page hosted on our active visitor-heavy, SEO-rich domain that advertised our competitors to US visitors, what would they be?

Ranking drop after using reverse proxy for blog subdirectory and robots.txt for old blog subdomain

- by user40387

We have a 3Dcart store and a WordPress blog hosted on a separate server. Originally, we had a CNAME set up to point the blog to http://blog.example.com/. However, in our attempt to boost link-based and traffic-based authority on the main site, we've opted to do a reverse proxy to http://www.example.com/blog/. It’s been about two months since we finished the reverse proxy migration. It appears that everything is technically working as intended, including some robots and sitemap changes; the new URLs are even generating some traffic, as indicated on Google Analytics. While Google has been indexing the new URL locations, they’re ranking very poorly, even for non-competitive, long-tail keywords. Meanwhile, the old subdomain URLs are still ranking mostly as well as they used to (even though they aren’t showing meta titles and descriptions due to being blocked by robots.txt). Our working theory is that Google has an old index of the subdomain URLs, and is considering the new URLs to be duplicate content, since it’s being told not to crawl the subdomain and therefore can’t see the rel canonicals we have in place. To resolve this, we’ve updated the subdomain’s robots.txt to no longer block crawling and indexing. Theoretically, seeing the canonical tag on the subdomain pages will resolve any perceived duplicate content issues. In the meantime, we were wondering if anyone would have any other ideas. We are very concerned that we’ll be losing valuable traffic, as we’re entering our on season at the moment.

Rewrite for robots.txt and favicon.ico [closed]

- by BHare

I have set up some rules so that subdomains (my users) default to the location where I keep robots.txt, favicon.ico, and crossdomain.xml. So if a user creates a site, say testing.mywebsite.com, and doesn't make their own favicon.ico at testing.mywebsite.com/favicon.ico, it will use the favicon.ico I have in /misc/favicon.ico. This works perfectly, but it doesn't work for the main website. If you attempt to go to mywebsite.com/favicon.ico it will check whether "/" exists, which it does, and then never redirects to /misc/favicon.ico. How can I get both cases to redirect to /misc/favicon.ico?

    # Set all crossdomain (openpalace file) favorite icons and robots.txt doesnt exist on their
    # side, then redirect to site's just to have something to go on.
    RewriteCond %{REQUEST_URI} crossdomain.xml$
    RewriteCond ^(.+)crossdomain.xml !-f
    RewriteRule ^(.*)$ /misc/crossdomain.xml [L]
    RewriteCond %{REQUEST_URI} favicon.ico$
    RewriteCond ^(.+)favicon.ico !-f
    RewriteRule ^(.*)$ /misc/favicon.ico [L]
    RewriteCond %{REQUEST_URI} robots.txt$
    RewriteCond ^(.+)robots.txt !-f
    RewriteRule ^(.*)$ /misc/robots.txt [L]

Should I prevent search engines indexing tag/category pages?

- by Macha

On my site, I currently have no special rules for search engines. It is a blog, statically generated using a Python program. When I search for some of my articles on Google, there is usually a tag or category page included in the results. Sometimes it even ranks ahead of the article itself. Obviously, as these links aren't always going to have the article on them, these aren't the results I want people to click on. So, I'm thinking of setting noindex on these pages. Is there any possible downside to doing so? Is this possible to do via robots.txt, or do I have to add it to all the relevant templates? All I can find for robots.txt are ways to stop the search engine crawling those pages, which isn't what I want: while I don't want them indexed, crawling them is still the only surefire way to find all my blog posts.

Why isn't Google updating my site title in search results? [closed]

- by SharkTheDark

Possible Duplicate: Google doesn't seem to update the description or title of my homepage

I had my domain for a few days before I uploaded the site to it, and it had one title. When I uploaded content it should have got a new title, but due to my misunderstanding of WordPress, the site was blocked via robots.txt and marked with noindex and nofollow. I removed that about 7 days ago, and I see in reports that Googlebot is crawling my site, but my site title isn't updating; it still has the old domain title from when the site wasn't there... My robots.txt now has:

    User-agent: *
    Allow: /

I have a clear title tag on every page. How long does it take to update? Do I need to check something else?

After Caldera.com's Robots.txt is Removed, Some Evidence Surfaces

Groklaw: "Now that SCO has sold off the caldera.com domain name, their previous robots.txt file no longer blocks access to the legacy Caldera web pages on Internet Archive. And what has popped up?"

[PHP/PDF] Is it possible to read a pdf file as a txt?

- by Kel

I need to find a certain key in a PDF file. As far as I know, the only way to do that is to interpret the PDF as a txt file. I want to do this in PHP without installing an addon/framework/etc. Thanks

In Windows, a batch file with a recursive for loop and a file name including blanks

- by uvts_cvs

Hello, I have a folder tree like this (it's only an example; it will be deeper in my real case):

    C:\test
    |
    +---folder1
    |       foo bar.txt
    |       foobar.txt
    |
    +---folder2
    |       foo bar.txt
    |       foobar.txt
    |
    \---folder3
            foo bar.txt
            foobar.txt

My files have one or more spaces in the name and I need to perform a command on them, so I am interested in foo bar.txt but not in foobar.txt. I tried (inside a batch file):

    for /r test %%f in (foo bar.txt) do if exist %%f echo %%f

where the command is a simple echo. It does not work because the space is skipped and I get no output. This works, but it is not what I need:

    for /r test %%f in (foobar.txt) do if exist %%f echo %%f

It prints:

    C:\test\folder1\foobar.txt
    C:\test\folder2\foobar.txt
    C:\test\folder3\foobar.txt

I tried using quotation marks ("), but that does not work either:

    for /r test %%f in ("foo bar.txt") do if exist %%f echo %%f

because the quotation marks are still included in the output:

    C:\test\folder1\"foo bar.txt"
    C:\test\folder2\"foo bar.txt"
    C:\test\folder3\"foo bar.txt"
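For comparison, and not as a fix for the batch loop itself, the same recursive search can be sketched in Python, where the file name is an ordinary string and a space needs no quoting. The function name `find_named_files` is hypothetical, and the sketch assumes the name contains no glob metacharacters such as `*`, `?` or `[`.

```python
from pathlib import Path


def find_named_files(root, name):
    """Return every regular file called `name` anywhere below `root`, sorted."""
    # rglob walks the whole tree; spaces in `name` are matched literally.
    return sorted(p for p in Path(root).rglob(name) if p.is_file())


if __name__ == "__main__":
    for path in find_named_files("test", "foo bar.txt"):
        print(path)
```

Each result is a `Path` object, so it can be handed to other calls without re-quoting.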

juju spends bootstrap-timeout with a final message it cannot find /var/lib/juju/nonce.txt

- by user285199

I built two VMware machines: the first with MAAS, the second a fresh installation from MAAS. The region controller was installed with the Ubuntu 12.04 distribution and upgraded (. The compute node was installed from MAAS with Quantal 12.10. Juju was installed and upgraded to 1.18 (from the ppa:juju/stable repository). MAAS was upgraded from the cloud-archive:tools repository. In debug mode, I saw how Juju connects to the node. Then I ran the same command:

    ssh -o "StrictHostKeyChecking no" -o "PasswordAuthentication no" -i /home/lliurex/.juju/ssh/juju_id_rsa -i /home/lliurex/.ssh/id_rsa [email protected] /bin/bash

It worked (with and without /bin/bash). When Juju has used up the whole bootstrap-timeout, it reports that it has not found the /var/lib/juju/nonce.txt file. That is true: the file doesn't exist. It doesn't matter whether you set a timeout of 1800, 3600 or 72000; it always ends the same way.

Can robots.txt fix all soft 404s?

- by olo

I got many soft 404s in Google Webmaster Tools, and those web pages no longer exist, so I am unable to insert <meta name="robots" content="noindex, nofollow"> into them. I've been searching for a while but haven't found any useful clues. About 100 URLs are soft 404s, and redirecting them one by one seems silly, as it would take too much of my time. If I just add those links to robots.txt like below:

    User-agent: *
    Disallow: /mysite.asp
    Disallow: /mysite-more.html

will this fix all the soft 404s reliably? Or is there a way to change all soft 404s to hard 404s? Please give me some suggestions. Many thanks

Do or can robots cause considerable performance issues?

- by Anicho

So the question in the title is exactly what I am trying to find out. My case: at work we are in a discussion with team members who seem to think bots will cause us performance problems when running on our services website. Our setup: let's say I have the site www.mysite.co.uk; this is a shop window for our online services, which sit on www.mysiteonline.co.uk. When people search Google for mysite, they see mysiteonline.co.uk as well as mysite.co.uk.

Cases against stopping bots crawling:

- We don't store GBs of data publicly available on the web
- Most friendly bots, if they were to cause issues, would have done so already
- In our instance the bots can't crawl the site because it requires a username & password
- Stopping bots with robots.txt causes an issue with SEO (ref. 1)
- If it was a malicious bot, it would ignore robots.txt or meta tags anyway

Ref 1. If we were to block robots from crawling mysiteonline.co.uk, this would affect SEO rankings and make it inconvenient for users who actively search for mysite to find mysiteonline, which we can prove is the case for a good portion of our users.

How to recover a website's lost robots.txt?

- by Jessica

I found my website in the Wayback Machine a few months ago, but today I've tried again and now it tells me it can't find robots.txt. My old webhost stopped paying for their servers back in August without any notice. I was going to do a backup the day it happened. Is there a way just to find the text? I have the old IP, images, but nothing else. None of the big search engines have caches anymore, and I already looked in the cache of three of my Macs with nothing to be found.

What is the file C:\nppdf32Log\debuglog.txt?

- by Jesse

Hello everyone. After I updated to 12.04, a file named "C:\nppdf32Log\debuglog.txt" appeared in my home directory. The content of the file is as follows:

    NPP_Initialize : called
    NPP_GetValue is called
    NPP_SetWindow : called for instance 920c0e28 Window from browser - 77594625
    NPP_SetWindow : called for instance 920c0e28 Window from browser - 77594625
    NPP_SetWindow : called for instance 920c0e28 Window from browser - 77594625
    NPP_NewStream : called for instance 920c0e28, stream 913403b0, URL http://www.xxxxxx.com/attachments/soft/CDGM%20Optical%20Glass%20Catalog.pdf, stream size 36177984, seekable 1
    NPP_Write : called for instance 920c0e28, stream 913403b0, offset = 0, length = 16384, streamlength = 36177984
    Trying for window attributes
    Trying for query tree
    NPP_Write : called for instance 920c0e28, stream 913403b0, offset = 16384, length = 16384, streamlength = 36177984
    Trying for window attributes
    Trying for query tree
    ......

It seems this file is related to Firefox. What exactly is the problem? Many thanks for your help!

Google-Bot fell in love with my 404-page

- by 32bitfloat

Every day my access log looks something like this:

    66.249.78.140 - - [21/Oct/2013:14:37:00 +0200] "GET /robots.txt HTTP/1.1" 200 112 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
    66.249.78.140 - - [21/Oct/2013:14:37:01 +0200] "GET /robots.txt HTTP/1.1" 200 112 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
    66.249.78.140 - - [21/Oct/2013:14:37:01 +0200] "GET /vuqffxiyupdh.html HTTP/1.1" 404 1189 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

or this:

    66.249.78.140 - - [20/Oct/2013:09:25:29 +0200] "GET /robots.txt HTTP/1.1" 200 112 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
    66.249.75.62 - - [20/Oct/2013:09:25:30 +0200] "GET /robots.txt HTTP/1.1" 200 112 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
    66.249.78.140 - - [20/Oct/2013:09:25:30 +0200] "GET /zjtrtxnsh.html HTTP/1.1" 404 1186 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

The bot requests robots.txt twice and after that tries to access a file (zjtrtxnsh.html, vuqffxiyupdh.html, ...) which cannot exist and must return a 404 error. The same procedure repeats every day; only the nonexistent HTML filename changes. The content of my robots.txt:

    User-agent: *
    Disallow: /backend
    Sitemap: http://mysitesname.de/sitemap.xml

The sitemap.xml is readable and valid, so there seems to be no reason why the bot should want to force a 404 error. How should I interpret this behaviour? Does it point to a mistake I've made, or should I ignore it?

To disallow indexing the category and tag listings in a blog

- by Mert Nuhoglu

Mark Wilson says that category and tag listings in a blog should be disallowed in order to prevent duplicate content. I understand this. However, I want to put internal links on keywords in the blog posts to the tag and category pages in order for the readers to find more relevant content. I wonder whether putting those internal links to the category/tag pages which are disallowed in robots.txt is counted as useful from the perspective of SEO internal linking?

Hiding a particular page from search engines so it is not indexed

- by user702325

I have a page which I don't want search engines to index or crawl. I am not sure what I should put in my robots.txt file to tell search engines not to crawl/index that page. The page itself is generated dynamically and does not have a predefined template; all I know is its URL, which is predefined and will remain unchanged. Say the page is at www.mysite.com/my-nonindexable-page/. Please suggest what I should do to achieve this. I am using WordPress for my website.

Rendering plain text through PHP

- by JP19

Hi, for some reason I want to serve my robots.txt via a PHP script. I have set up Apache so that the robots.txt request (in fact, all file requests) comes to a single PHP script. The code I am using to render robots.txt is:

    echo "User-agent: wget\n";
    echo "Disallow: /\n";

However, it is not processing the newlines. How do I serve robots.txt correctly, so that search engines (or any client) see it properly? Do I have to send some special headers for txt files?

EDIT: Now I have the following code:

    header("Content-Type: text/plain");
    echo "User-agent: wget\n";
    echo "Disallow: /\n";

which still does not display newlines (see http://sarcastic-quotes.com/robots.txt ).

EDIT 2: Some people mentioned that it's just fine and simply not displayed in the browser. I was just curious how this one displays correctly: http://en.wikipedia.org/robots.txt Thanks, JP
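The two ingredients this question turns on, real newline bytes in the body and a text/plain Content-Type header, can be shown in isolation. The following is a minimal sketch written as a Python WSGI app rather than PHP, purely for illustration; `ROBOTS_BODY` and `robots_app` are hypothetical names. A client that honours text/plain shows the line breaks, whereas the same bytes rendered as HTML would appear on one line because HTML collapses whitespace.

```python
# The rules from the question, with explicit newline characters.
ROBOTS_BODY = "User-agent: wget\nDisallow: /\n"


def robots_app(environ, start_response):
    """Minimal WSGI app that serves ROBOTS_BODY as plain text."""
    body = ROBOTS_BODY.encode("utf-8")
    headers = [
        # text/plain tells the client not to treat the body as HTML.
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ]
    start_response("200 OK", headers)
    return [body]
```

To try it locally, the standard library's `wsgiref.simple_server.make_server("", 8000, robots_app).serve_forever()` would serve it on port 8000.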

How to restrict access to all .txt files in Apache except robots.txt?

- by user3162764

I am configuring apache2 on Debian and would like to allow only robots.txt to be accessed by search engines, while other .txt files are restricted. I tried adding the following to .htaccess, but with no luck:

    <Files robots.txt>
        Order Allow,Deny
        Allow from All
    </Files>
    <Files *.txt>
        Order Deny,Allow
        Deny from All
    </Files>

Can anyone help or give me some hints? I am a newcomer to Apache. Thanks a lot.

How to submit sitemap when your website has partial https? - Error: "Not in Domain"

- by Ralph N

My website is an ecommerce site that is set up to use http for the item-browsing portion, but https for things like the shopping cart, contact us, etc. (anything that has forms on it). I submitted my website a long time ago to Google Webmaster Tools as http://www.mywebsite.com. I also submitted a sitemap with about 40 links, 8 of which are https. I've noticed that for the longest time, Google Webmaster Tools has been reporting that 32 of the 40 links have been crawled. I tested all the links against my robots.txt and realized that my robots.txt was blocking the https links. Google says those links are "Not In Domain". Is there a way to get around this so that I can have a hybrid-SSL site? I understand the concept that one site is mywebsite.com:80 and the other is mywebsite.com:443, but I'd like to avoid submitting and maintaining 2 separate websites in Google Webmaster Tools.

How can I block abusive bots from accessing my Heroku app?

- by aem

My Heroku (Bamboo) app has been getting a bunch of hits from a scraper identifying itself as GSLFBot. Googling for that name produces various results of people who've concluded that it doesn't respect robots.txt (eg, http://www.0sw.com/archives/96). I'm considering updating my app to have a list of banned user-agents, and serving all requests from those user-agents a 400 or similar and adding GSLFBot to that list. Is that an effective technique, and if not what should I do instead? (As a side note, it seems weird to have an abusive scraper with a distinctive user-agent.)
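The banned user-agent list described in this question could be sketched as a thin layer in front of the app. The example below uses Python WSGI middleware purely for illustration (the app in the question may well run on a different stack, where the equivalent would be that stack's own middleware). The names `BANNED_AGENT_SUBSTRINGS` and `block_banned_agents` are hypothetical, and the sketch answers with 403 Forbidden as an assumption rather than the 400 mentioned in the question. It only stops clients that keep sending the same user-agent string.

```python
# Substrings of User-Agent headers to refuse; GSLFBot is the one from the question.
BANNED_AGENT_SUBSTRINGS = ("GSLFBot",)


def block_banned_agents(app, banned=BANNED_AGENT_SUBSTRINGS):
    """Wrap a WSGI app so requests from banned user agents get a 403."""

    def middleware(environ, start_response):
        agent = environ.get("HTTP_USER_AGENT", "").lower()
        if any(name.lower() in agent for name in banned):
            body = b"Forbidden\n"
            start_response(
                "403 Forbidden",
                [("Content-Type", "text/plain"),
                 ("Content-Length", str(len(body)))],
            )
            return [body]
        # Everything else goes through to the wrapped application.
        return app(environ, start_response)

    return middleware
```

Wrapping an existing app is then a single call, for example `application = block_banned_agents(application)`. Because the check is a case-insensitive substring match on a header the client controls, a scraper that changes its user-agent would pass straight through.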

