Search Results

Search found 4984 results on 200 pages for 'robots txt'.

Page 68/200

  • Removing existing filtered pages from Google's index: noindex / 301 / canonical to non-filtered page?

    - by Noam
    I've decided to remove some of my site's pages from the Google index so that the indexed pages are focused on higher-quality content. The pages I'm going to remove are already in the index. They are filtered pages that will continue to exist; I just don't want them in the Google index because they add little value over the same page without any filter selected. In Webmaster Tools I've specified "Narrows" for the parameters that apply these filters, but that doesn't seem to change how Google handles these pages. So I'm considering three options: (1) adding <meta name="robots" content="noindex" /> to the HTML head of these filtered pages; (2) a 301 to the non-filtered page that contains the most similar information and will remain in the index; (3) a canonical tag, which I'm not sure is exactly the mainstream use case, as these aren't really the same pages. Which should I use?
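
    A minimal sketch of options (1) and (3), assuming a filtered URL such as /shoes?color=red whose unfiltered equivalent is /shoes (both URLs are hypothetical placeholders):

        <!-- head of the filtered page, e.g. /shoes?color=red -->
        <meta name="robots" content="noindex" />

        <!-- or, option (3), if the unfiltered page is close enough to stand in for it -->
        <link rel="canonical" href="https://www.example.com/shoes">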

    Read the article

  • Does a "nofollow" attribute on a link prevent URL discovery by search engines?

    - by Stephen Ostermiller
    I know that nofollow prevents link juice from being passed across a link. But if search engine robots discover a link with a nofollow on it, will they add that link to their crawl queue? In other words, if I create a link to a brand new page and put a rel=nofollow attribute on that link, will it prevent search engine bots (particularly Googlebot) from crawling the page? (Assuming that this link remains the only link into that page.) I've read conflicting reports about this over the years and I'm looking for authoritative references about the current state of affairs. Official statements from Google or published results of independent testing would be ideal.
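
    For reference, the markup in question, plus the robots.txt rule that is the dependable way to keep a page out of the crawl queue (example.com and the path are placeholders); nofollow is treated as a hint about the link, whereas Disallow actually forbids fetching the page:

        <!-- the link as described -->
        <a href="https://example.com/brand-new-page" rel="nofollow">brand new page</a>

        # robots.txt on example.com - this does block crawling of the page
        User-agent: *
        Disallow: /brand-new-page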

    Read the article

  • RewriteRule not working at server level?

    - by Alexis Wilke
    I wanted to forbid some robots from doing certain things on my websites and decided to add a RewriteRule for that purpose. The rule works when put in one of my <VirtualHost *:80> blocks and looks like this:

        RewriteEngine On
        RewriteCond %{HTTP_USER_AGENT} libwww-perl
        RewriteCond %{REQUEST_METHOD} POST
        RewriteRule . - [F,L]

    However, I wanted to apply it to all my websites instead of just one of them. So, with the newest Apache2 configuration layout, I put that code in the security.conf file, which lives under /etc/apache2/conf-available/... (and yes, I have a symlink from the /etc/apache2/conf-enabled/... directory). But if the rules are only in conf-available/security.conf, they somehow get ignored. The documentation says that these Rewrite* directives all work at server level! Any idea what I'm missing?
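
    One likely explanation (a sketch, assuming Apache 2.4 and one <VirtualHost> per site): rewrite rules defined in the main server context are not inherited by virtual hosts unless inheritance is requested, so the rules in conf-enabled/security.conf are skipped for any request handled by a vhost. Keeping the rules in security.conf and adding two lines to each vhost would pull them in:

        # inside each <VirtualHost *:80> block
        RewriteEngine On
        RewriteOptions Inherit

    On Apache 2.4.8+ the same effect can be had from the server side with RewriteOptions InheritDown in the main configuration.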

    Read the article

  • Maker Faire 2012 Attendees build with Java Technology

    - by hinkmond
    Looks like Daniel Green, systems engineer from Oracle, and the panel of Java experts had a successful Java Technology booth at this year's Maker Faire 2012. See: Maker Faire 2012 adds Java Here's a quote: "We made a huge impact for Java and Oracle, creating positive perception, building brand awareness, and introducing fun and engaging ways for future technologists to learn Java programming," says Michelle Kovac, Oracle director, Java Marketing and Operations. Good stuff, considering all the future developers of exploding robots and fire-breathing dragon metal sculptures attend the Maker Faire. They can blow up stuff with Java technology just as effectively as other programming languages. Hinkmond

    Read the article

  • webmaster tools - Network Unreachable

    - by Jayapal Chandran
    Webmaster Tools for my site reports that robots.txt is unreachable, and for every link in the sitemap it says "network unreachable"; sitemap.xml is unreachable as well. These errors appear on the Crawl Stats page. I discussed it with my hosting company's support team and they said:

        "I have verified the Apache logs and I cannot see any issues on your website/webserver. Possible issue: there may be a routing problem from Google's servers to our server. When the rate of Googlebot hits goes high, the IP is automatically blacklisted by our firewall to avoid server load and downtime. As we do not have access to Google's services, we cannot give details of their logs."

    The sitemaps link shows an exclamation mark, which means the file was not reachable. What could be the problem and how can I solve it?

    Read the article

  • Why is Google Webmaster Tools crawling invalid URLs and showing 500 errors?

    - by Amos Kane
    Google Webmaster Tools is reporting 12k+ 500 errors. Eeek! None of the URLs are valid; they all contain www.youtube.com. First, why is Google crawling these URLs if they don't exist? I supplied a sitemap, and they are of course not in the sitemap. I don't have a robots.txt blocking anything. I've checked for invalid redirects (none) and for unclosed tags or anything else that would throw www.youtube.com into a URL by accident (none). In every 'linked from', the referring URL is also a bad URL with www.youtube.com in it. Webmaster Tools reports no malware, and I can't check the server logs because the host won't give me access. Really stuck!! Any ideas appreciated!
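
    A stopgap sketch, assuming the site runs on Apache and that every junk URL contains www.youtube.com somewhere in its path: answering those requests with 410 Gone instead of letting the application fall over with a 500 tells Google the URLs are intentionally dead, so it drops them faster.

        # .htaccess - return 410 Gone for the junk URLs instead of a 500
        RewriteEngine On
        RewriteCond %{REQUEST_URI} www\.youtube\.com [NC]
        RewriteRule .* - [G]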

    Read the article

  • How did craigspro license Craigslist content? [closed]

    - by Joshua Frank
    There's an app called craigspro that provides a much better interface to Craigslist on mobile devices. They claim that the app is Officially Licensed by Craigslist, but I thought Craigslist never licensed their content, and the only thing I can find on the subject in the terms of use is this: "Any copying, aggregation, display, distribution, performance or derivative use of craigslist or any content posted on craigslist whether done directly or through intermediaries (including but not limited to by means of spiders, robots, crawlers, scrapers, framing, iframes or RSS feeds) is prohibited. As a limited exception, general purpose Internet search engines and noncommercial public archives will be entitled to access craigslist without individual written agreements executed with CL that specifically authorize an exception to this prohibition if ..." Does anyone know how to get a "written agreement" with Craigslist, and roughly what their terms would be? Do they charge a fee, or just check that you're not evil? I'll try Craigslist directly next, but I'd like to get a sense of the landscape before stumbling in.

    Read the article

  • Duplicate content in Top Level Domain and country specific website

    - by Ando
    I have myproduct.com, which is my master product page. For the UK I also own myproduct.co.uk, which is a copy of myproduct.com with some localized content: landing page, promotions, prices, and specific tags. But there is also duplicate content: myproduct.com/FAQs/ is the same as myproduct.co.uk/FAQs/. I don't want to redirect from myproduct.co.uk/FAQs/ to myproduct.com/FAQs/, as I don't want people to leave the localized website. myproduct.com/FAQs/ is my "go-to" FAQ page and the most likely to be up to date, so I want that page to be indexed by search engines, whereas I don't care about myproduct.co.uk/FAQs/ being indexed (unless indexing this page would increase my page rank :) ). What should I do now to be SEO-friendly and SEO-optimal? Stop indexing of myproduct.co.uk/FAQs/ via robots.txt? Do some rel="alternate" hreflang="x" configuration on both /FAQs/ pages? Something else?
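
    A sketch of the hreflang option, assuming the two FAQ pages are language/region variants of the same content (domains are the poster's placeholders); the same pair of tags goes in the head of both pages:

        <link rel="alternate" hreflang="en" href="https://myproduct.com/FAQs/">
        <link rel="alternate" hreflang="en-GB" href="https://myproduct.co.uk/FAQs/">

    Note that hreflang only works if both pages stay crawlable, so blocking myproduct.co.uk/FAQs/ in robots.txt would undercut it.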

    Read the article

  • Inspirational software for end-users written in Haskell?

    - by Lenny222
    I think great technology is invisible. Besides the usual suspects (GHC, Xmonad, proprietary trading software), what great examples are there of end-user software written in Haskell? I think good examples are FreeArc, Hledger and "Nikki And The Robots". Do you have more examples (full-blown GUI apps, small CLI tools, etc.)? Edit: For example, I am fascinated by Wings3D because, while written in Erlang, users cannot tell that. It just works. Among Haskell's weak spots are cross-platform GUIs. There are not many GUI apps written in Haskell in general, and most of them are not easy to use, install or even compile. What are good examples to learn from of how to make hard things look easy?

    Read the article

  • Why is Google not indexing my forum?

    - by jsoldi
    I have this web site that, as you can see on the top right, has a "Forums" button that links to my forums. My problem is that if, for instance, I do a Google search for "site:heliumscraper.com answers", I don't get any results, even though the word "answers" is in the forum's board index and has been there for a few months already. I don't have a robots file. I also uploaded a sitemap to Google that contained the forum. I'm thinking the problem might be the fact that the only link from my main page to the forum opens in a new window. Could this be the problem?

    Read the article

  • What makes for the ideal project? [closed]

    - by Hans Westerbeek
    I try to be careful when accepting assignments, to avoid mutual disappointment. So I started to come up with a list of things that I consider ingredients for The Ideal Project (in no particular order). What did I miss? What did I get wrong?

        - Team size < 6 persons, to avoid having too many meetings
        - Team members must be dedicated to the project
        - Gut-feeling estimate (made by developers) of the running period does not exceed 4 months; projects longer than that tend to become open-ended, and are therefore not projects
        - Has a Product Owner who has a mandate, is well-respected at their own company, and has a real interest in the long-term success of the project
        - Has no technical involvement from people who are not on the team (yes, that's you, Mr Architect That Doesn't Code)
        - All the usual about quiet working conditions
        - Exciting subject matter; content management is just not as cool as controlling robots :)

    Read the article

  • Adaptive Characters: AI Solution Needs a Problem

    - by Roger F. Gay
    Have sophisticated adaptive programming, will travel, so to speak. I'm part of a group that developed sophisticated learning / adaptive software for robotics. The system "thinks" via its simulator, building and adapting code on its own, and then carries out the best solution. The software can also adapt to new situations, etc. http://mensnewsdaily.com/2007/05/16/robobusiness-robots-with-imagination/ It's easy to imagine using it with automated game characters that adapt to the player's moves and style; the easiest example would be fighting. The more the simulated fighter fights with the human player, the more it learns to counter that player's fighting skills. But there should be more. Anyone have any ideas as to how adaptive characters might be interesting in games?

    Read the article

  • How can I stop a bot attack on my site?

    - by tnorthcutt
    I have a site (built with WordPress) that is currently under a bot attack (as best I can tell). A file is being requested over and over, and the referrer is (almost every time) turkyoutube.org/player/player.swf. The file being requested is deep within my theme files, and is always followed by "?v=" and a long string (i.e. r.php?v=Wby02FlVyms&title=izlesen.tk_Wby02FlVyms&toke). I've tried setting an .htaccess rule for that referrer, which seems to work, except that now my 404 page is being loaded over and over, which still uses lots of bandwidth. Is there a way to create an .htaccess rule that requires no bandwidth usage on my part? I also tried creating a robots.txt file, but the attack seems to be ignoring that. This is the relevant part of the .htaccess file:

        RewriteCond %{HTTP_REFERER} turkyoutube\.org [NC]
        RewriteRule .* - [F]
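
    One low-bandwidth option (a sketch, assuming Apache and that the rule above is the one that fires): keep the [F] block but add an inline ErrorDocument, so the refused requests are answered with a few bytes of static text instead of the full WordPress error page.

        # .htaccess - refuse the requests and answer with a tiny static body
        RewriteEngine On
        RewriteCond %{HTTP_REFERER} turkyoutube\.org [NC]
        RewriteRule .* - [F]
        ErrorDocument 403 "Forbidden"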

    Read the article

  • How to use rel=canonical with Sitecore aliases?

    - by Mike G
    I have inherited a Sitecore architecture that is a mess from an SEO duplicate-content point of view. There are multiple aliases that have been created (and indexed by the search engines) for many of the second-tier pages of the site. Due to server issues, I am not able to 301-redirect these duped pages, so I would like to use the rel=canonical tag in an attempt to get Google/Bing to recognize the correct pages I would like to appear in the index. I have blocked the most extraneous duped pages with a robots.txt file; however, since Google/Bing have already spidered many of the duped pages, I need to keep them accessible to the spiders, BUT removed from the index. The catch is that, since the duped pages are aliases (and don't really physically exist in Sitecore that I can find), I am not sure how to go about using rel=canonical, or if I even can in this situation.
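
    A sketch of what the rendered output would need, assuming each alias serves the same layout as the real item (the URL below is a placeholder); because the alias and the real page share that layout, emitting the item's primary URL from it covers every alias automatically:

        <!-- in the layout/template shared by the real page and its aliases -->
        <link rel="canonical" href="https://www.example.com/section/real-page">

    One caveat: a crawler can only read this tag on pages it is allowed to fetch, so aliases blocked in robots.txt would need to be unblocked for the canonical to take effect.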

    Read the article

  • Is this Anti-Scraping technique viable with Crawl-Delay?

    - by skibulk
    I want to prevent web scrapers from abusing the roughly 1,000,000 pages on my website. I'd like to do this by returning a "503 Service Unavailable" error code to users that access an abnormal number of pages per minute. I don't want search engine spiders to ever receive the error. My inclination is to set a robots.txt crawl-delay that will keep spiders' requests per minute under my 503 threshold. Is this an appropriate solution? Do all major search engines support the directive? Could it negatively affect SEO? Are there any other solutions or recommendations?
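
    The directive itself is a single robots.txt line (the 10-second value is an arbitrary example), but support is uneven: Bing and Yandex honour Crawl-delay, while Googlebot ignores it, so Google's crawl rate would still have to be managed through Webmaster Tools rather than relying on this line to stay under the 503 threshold.

        User-agent: *
        Crawl-delay: 10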

    Read the article

  • Google Webmaster Tools Index dropped to Zero [closed]

    - by Brian Anderson
    Earlier this year I rebuilt my website using ZenCart. Immediately I saw a drop in index status from 59 to 0. I then signed up for Google Webmaster Tools and noticed the index status took a dramatic drop and has never recovered. I have worked to add content, and I know I am not done, but I have not seen any recovery of this index since. What confuses me is that when I look at the sitemap status under Optimization, it shows 1239 pages submitted and 1127 pages indexed. Most of my pages have fallen off page one for relevant search terms, and some are as far back as page 7 or 8 where they used to be on the first page. I have made some changes in the past week to robots.txt and sitemap.xml, but have not seen any improvements. Can anyone tell me what might be going on here? My website is andersonpens.net. Thanks! Brian

    Read the article

  • Google indexed site's address by accident. What do I do now?

    - by AndrejaKo
    I was making a site for a friend of mine and he wanted to be able to see my progress as I worked on the site, so I decided to put the site on a server on my computer and enable access by a domain name registered to me. It turns out that I forgot to set up a robots.txt file for the site and somehow Google indexed the site. My question is: What do I do now? As I understand it, Google doesn't like duplicate content and my friend could have problems when I upload the new site to his server. Right now his current site, which only has a work in progress page, is first on Google when searching for relevant keywords and I really really don't want to damage that. Is there anything else I need to be concerned about?
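
    One way to clean this up (a sketch, assuming the work-in-progress copy is served by Apache from my own machine): send a noindex header from that server so Google drops the accidental copy, and keep it in place until the real site goes live on the friend's domain. Unlike a robots.txt Disallow, this still lets Googlebot fetch the pages, which is exactly what lets it see the noindex and remove them.

        # in the dev copy's vhost or .htaccess (requires mod_headers)
        Header set X-Robots-Tag "noindex, nofollow"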

    Read the article

  • Keeping files private on the internet (.htaccess password or software/php/wordpress password)

    - by jiewmeng
    I was asked a while ago to set up a server such that only authenticated users could access files. It was like a test server for clients to view WIP sites. More recently, I want to do something similar for some of my own files. Though they are not very confidential, I'd like to be the only one viewing them. I thought of doing the same: create a robots.txt with

        User-agent: *
        Disallow: /

    and set up some password protection. .htpasswd seems like a very ugly way to do it; it will prompt me even when I log into FTP. I wonder whether a software method like password-protected posts in WordPress would do the trick of locking out the public and hiding the content from search engines? Or would some self-made PHP script do the trick?
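
    For reference, the .htaccess route is only a few lines (the paths and realm name below are placeholders); crawlers get a 401 like everyone else, so nothing behind it ends up in search engines, with or without the robots.txt.

        # .htaccess in the directory to protect
        AuthType Basic
        AuthName "Private files"
        AuthUserFile /home/user/.htpasswd
        Require valid-user

    The password file itself is created once with the htpasswd utility, e.g. htpasswd -c /home/user/.htpasswd username.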

    Read the article

  • Google I/O 2010 - Waving across the web

    Wave 101 - Dhanji Prasanna, Douwe Osinga. This talk focuses on using the Google Wave APIs outside of the Google Wave product. We'll talk about how to take advantage of embedded waves to allow for commenting and discussions on your website, how to integrate your website with WaveThis using gadgets and robots for continued interactivity, and how to use the wave data APIs to get access to wave content from your website. For all I/O 2010 sessions, please go to code.google.com. From: GoogleDevelopers | Views: 5 | 0 ratings | Time: 01:00:24 | More in Science & Technology

    Read the article

  • I've changed my URL schema. How do I tell Google to index the new schema and forget the old one?

    - by growse
    I had a site where the URLs were constructed like this:

        /index.php/Topic
        /index.php/AnotherTopic

    These were indexed in Google, and search results were returned that pointed to them. However, I've recently replatformed that site and reconfigured it so the above URLs become:

        /index.php?title=Topic
        /index.php?title=AnotherTopic

    The original URLs are returning 404s. The site links to the correct URL schema internally, but Google is retaining the original schema in its search results. I've updated and resubmitted the sitemap, which only contains the new schema. Also, Google Webmaster Tools is going slightly bananas at the fact that there's now a spike in 404 errors in its crawl results. What would be the best approach to get Google to 'forget' about the old schema and instead index the new one? Should I try blocking /index.php/ in robots.txt? Should I be returning 301 codes instead of 404 for the original URLs?
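
    A sketch of the 301 approach, assuming Apache and that every old path maps to the same title in the new query-string form (blocking /index.php/ in robots.txt would actually prevent Google from ever seeing either the redirect or the 404, so it would forget the old URLs more slowly):

        # .htaccess - permanently redirect /index.php/Topic to /index.php?title=Topic
        RewriteEngine On
        RewriteCond %{REQUEST_URI} ^/index\.php/(.+)$
        RewriteRule . /index.php?title=%1 [R=301,L]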

    Read the article

  • Skynet Big Data Demo Using Hexbug Spider Robot, Raspberry Pi, and Java SE Embedded (Part 4)

    - by hinkmond
    Here's the first sign of life of a Hexbug Spider Robot converted to become a Skynet Big Data model T-1. Yes, this is T-1 the precursor to the Cyberdyne Systems T-101 (and you know where that will lead to...) It is demonstrating a heartbeat using a simple Java SE Embedded program to drive it. See: Skynet Model T-1 Heartbeat It's alive!!! Well, almost alive. At least there's a pulse. We'll program more to its actions next, and then finally connect it to Skynet Big Data to do more advanced stuff, like hunt for Sara Connor. Java SE Embedded programming makes it simple to create the first model in the long line of T-XXX robots to take on the world. Raspberry Pi makes connecting it all together on one simple device, easy. Next post, I'll show how the wires are connected to drive the T-1 robot. Hinkmond

    Read the article

  • What's the best way to add some particle or laser effects to an already animated character?

    - by Scott
    I just purchased some rigged and animated robot characters from 3drt for a game I'm making in Unity. I would like to be able to add some weapon effects to the characters. For example, I would like the robots to be able to shoot lasers out of their hands at enemies. I have no idea where to even start with this task, as I'm more of a programmer than a graphics guy. Can some experienced developers / designers please point me in a good direction? Thanks. Note: as of right now I have Maya and Blender installed on my computer.

    Read the article

  • Duplicate pages indexed in Google

    - by Mert
    I did a small coding mistake and Google indexed my site incorrectly. This is the correct form:

        https://www.foo.com/urunler/171/TENGA-CUP-DOUBLE-HOLE

    But Google indexed my site like this:

        https://www.foo.com/urunler/171/cart.aspx

    First I fixed the problem and made a site map with only the correct link in it. Now I checked Webmaster Tools and I see this:

        Total indexed: 513
        Not selected: 544
        Blocked by robots: 0

    So I think this is caused by duplicate indexing, and it looks like the "not selected" pages are keeping the correct pages out of the index. I want to know how to fix the "https://www.foo.com/urunler/171/cart.aspx" links. Should I fix it in code, or should I ask Google to re-index my site? If I should redirect the wrong/duplicate links to the correct ones, how should that be done?
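
    Since the correct product slug cannot be recovered from the cart.aspx-style URL alone, a generic redirect rule is hard to write; a sketch of the alternative is to have the product template emit its own canonical URL, so however Google reaches the page it attributes it to the right address (the URL below is the poster's own example):

        <!-- rendered in the <head> of the product page template -->
        <link rel="canonical" href="https://www.foo.com/urunler/171/TENGA-CUP-DOUBLE-HOLE" />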

    Read the article

  • Moved sitemaps to a different subdomain and losing search referrals around the same time. Red herring or correlation?

    - by er1234
    We started to lose search referral traffic around the same time that I moved some of our sitemaps to a subdomain. Could this have hurt us? I followed Google's steps for creating a sitemap under a different subdomain. The new sitemaps.foo.com subdomain is being crawled and indexed well. Both www.foo.com and sitemaps.foo.com have been verified in Google Webmaster Tools. They appear as distinct sites. Is this correct? I can't find a way in Webmaster Tools to say "Hey, sitemaps.foo.com is really owned by www.foo.com, so show them together and make sure to attribute sitemaps.foo URLs to www.foo". Our www.foo.com/robots.txt contains:

        Sitemap: http://www.foo.com/sitemap.xml
        Sitemap: http://sitemaps.foo.com/subdir/sitemap.xml.gz

    Read the article
