crawlers - Page 5 - Developer IT

301 Redirects for regional variants of a homepage

- by Adam Jenkin

I am planning on implementing a website which has regional homepage variants, for example:

mycompany.com/europe
mycompany.com/us

The rest of the site is region-agnostic, and content will continue at URLs such as mycompany.com/news, mycompany.com/about-us, etc.

For homepage (.com) requests, I plan on redirecting users to the correct homepage variant (via 301). If I cannot determine the correct one, I will fall back to redirecting them to the US homepage (/us).

From an SEO point of view, firstly, is this OK? Or should I be doing anything in addition to this to make search engines aware of the regional differences? As crawlers are region-agnostic, I plan on directing them to the US page with a 301; or should I have something on the .com page which they use? Given that the regional homepages will likely be the most visited pages, they should show up in result sitelinks when searching for mycompany (which I think is a good thing).

Apologies for the slightly open question. I know anything SEO-related is more opinion/best practice than fact, but I am purely looking for advice.
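A minimal sketch of the redirect decision described above, written in Python for illustration. `REGION_HOMEPAGES`, `DEFAULT_HOMEPAGE` and `homepage_redirect` are hypothetical names, and how the region is detected (GeoIP, `Accept-Language`, a cookie) is an assumption left to the caller. The status stays at 301 as the question proposes; whether a permanent redirect suits a per-visitor decision is part of what is being asked.

```python
# Hypothetical table mapping a detected region code to its homepage path.
REGION_HOMEPAGES = {
    "eu": "/europe",
    "us": "/us",
}
DEFAULT_HOMEPAGE = "/us"  # fallback when the region cannot be determined


def homepage_redirect(region):
    """Return (status, location) for a request to the bare .com homepage.

    `region` is whatever the (assumed) detection step produced, or None.
    """
    key = (region or "").strip().lower()
    return 301, REGION_HOMEPAGES.get(key, DEFAULT_HOMEPAGE)


print(homepage_redirect("EU"))  # (301, '/europe')
print(homepage_redirect(None))  # (301, '/us')
```

Crawlers, which carry no usable region, end up on the fallback path, matching the plan in the question.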

How do I deal with content scrapers? [closed]

- by aem

Possible Duplicate: How to protect SHTML pages from crawlers/spiders/scrapers?

My Heroku (Bamboo) app has been getting a bunch of hits from a scraper identifying itself as GSLFBot. Googling for that name produces various results from people who've concluded that it doesn't respect robots.txt (e.g., http://www.0sw.com/archives/96). I'm considering updating my app to keep a list of banned user-agents, serving all requests from those user-agents a 400 or similar, and adding GSLFBot to that list. Is that an effective technique, and if not, what should I do instead? (As a side note, it seems weird for an abusive scraper to have a distinctive user-agent.)
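The ban-list idea can be sketched as WSGI middleware. Python is used here for illustration; on Heroku Bamboo, which ran Ruby apps, the equivalent would be a Rack middleware. `BlockUserAgents` and `BANNED_AGENTS` are hypothetical names. The sketch answers 403 Forbidden, though a 400 would work the same way. Keep in mind that the User-Agent header is supplied by the client, so this only stops scrapers that keep identifying themselves.

```python
# Hypothetical ban list; matching is a case-insensitive substring test.
BANNED_AGENTS = ("GSLFBot",)


class BlockUserAgents:
    """WSGI middleware that refuses requests from banned user-agents."""

    def __init__(self, app, banned=BANNED_AGENTS):
        self.app = app
        self.banned = tuple(b.lower() for b in banned)

    def __call__(self, environ, start_response):
        agent = environ.get("HTTP_USER_AGENT", "").lower()
        if any(b in agent for b in self.banned):
            start_response("403 Forbidden", [("Content-Type", "text/plain")])
            return [b"Forbidden\n"]
        # Everyone else reaches the wrapped application unchanged.
        return self.app(environ, start_response)


def hello_app(environ, start_response):
    """Stand-in for the real application."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"Hello\n"]


app = BlockUserAgents(hello_app)
```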

CDN virtual subdomain causes duplicated content

- by user3474818

I have created a subdomain and a CNAME record which points to the domain root. The subdomain www.static.example.com is effectively a copy of the entire website www.example.com; it is supposed to act as a CDN and serve static content in order to improve speed. However, all of my content can be accessed via the subdomain as well, so Google has indexed it all and now I am dealing with duplicate content. How can I deny crawlers access to the subdomain, bearing in mind that I do not have a separate subfolder for the subdomain, so I can't create a separate robots.txt file?
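One way around the single-document-root constraint is to generate robots.txt dynamically and choose its content from the Host header. The Python sketch below assumes requests for /robots.txt can be routed to application code; `STATIC_HOSTS` and `robots_txt_for` are hypothetical names. On Apache, the same routing is commonly done with a mod_rewrite rule conditioned on `%{HTTP_HOST}`. Note that robots.txt controls crawling rather than indexing, so pages already indexed may take a while to drop out, and a blanket `Disallow: /` also stops crawlers from fetching the static assets on that host.

```python
# Hypothetical setting: host names that should not be crawled.
STATIC_HOSTS = {"www.static.example.com", "static.example.com"}

ALLOW_ALL = "User-agent: *\nDisallow:\n"
DENY_ALL = "User-agent: *\nDisallow: /\n"


def robots_txt_for(host_header):
    """Return the robots.txt body to serve for the given Host header."""
    # Drop an optional ":port" suffix and normalize case before comparing.
    hostname = host_header.split(":", 1)[0].strip().lower()
    return DENY_ALL if hostname in STATIC_HOSTS else ALLOW_ALL
```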

Crawling for geotagged data

- by abe3

I have no experience with web crawlers, but I know that Apache maintains an open source web crawler, Nutch, which builds on the Lucene search library. How would I go about writing such a crawler to search the web for geotagged data close to a particular location? What would a general road map look like? How do I pick which slice of the web to crawl? Do I use regular expressions to find things that look like latitudes and longitudes? What does a general sketch of that solution look like?
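To make the regular-expression question concrete, here is a minimal Python sketch that pulls decimal "lat, lon" pairs out of text and keeps those within a radius of a point, using the haversine great-circle formula. The pattern is an assumption covering only the simplest encoding; real pages also carry coordinates in microformats, EXIF, KML and degrees-minutes-seconds notation, and none of this addresses which slice of the web to crawl.

```python
import math
import re

# Decimal-degree "lat, lon" pairs such as "48.8584, 2.2945".
# The lookbehind stops matches from starting in the middle of a longer number.
COORD_RE = re.compile(r"(?<![\d.])(-?\d{1,2}\.\d+)\s*,\s*(-?\d{1,3}\.\d+)(?!\d)")

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometres."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def coords_near(text, center, radius_km):
    """Yield (lat, lon) pairs found in text within radius_km of center."""
    for m in COORD_RE.finditer(text):
        lat, lon = float(m.group(1)), float(m.group(2))
        # Skip number pairs that cannot be valid coordinates.
        if not (-90 <= lat <= 90 and -180 <= lon <= 180):
            continue
        if haversine_km(center[0], center[1], lat, lon) <= radius_km:
            yield lat, lon
```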

Adding a list of "recent articles" affects SEO

- by Groo

We have a site with a sidebar containing sections (or "widgets", if you like) showing things like "Recent Articles", "Other Articles by this User", "Similar Articles", etc. The issue is that Google seems to take these links very seriously. In fact, if I have only a single article which is closely related to a certain phrase (and several other pages link to it in their sidebars), when I do a Google search it lists all those other pages, highlighting the one link to the page that should actually be the most relevant result. And these pages don't even mention the phrase anywhere else. Is there a common approach to adding these sidebar links? For example, I might add them through Ajax after the page is loaded, but then won't crawlers have a harder time finding them?

If my URLs are static, but then parsed by JavaScript, does that make them crawlable?

- by Talasan Nicholson

Let's say I have a link:

    <a href="/about/">About Us</a>

But JavaScript (or jQuery) catches the click and then adds a hash based on the href attribute:

    $('a').click(function (e) {
        e.preventDefault();
        // Extremely oversimplified.
        window.location.hash = $(this).attr('href');
    });

And then we use a hashchange event to do the general 'magic' of Ajax requests. This allows the actual href to be seen by crawlers, but gives client-side users with JS enabled an Ajax-based website. Does this 'help' with the general SEO issues that come along with hash (#) URLs? I know hashbangs are 'ok', but AFAIK they aren't reliable?

How do I optimize SEO in a multiblog WordPress install?

- by user35585

We are about to launch two product pages plus a corporate website. The goal is to keep a blog on each of the sites, and here comes the question of how to do it so that everything is unified without causing problems for Google's web crawlers. We considered the following options:

1) A single blog from which we retrieve two categories with custom CSS, so that one blog splits into two category-dependent blogs; this way we can get the feeds and point to them.
2) Two product blogs whose posts we pull into a bigger, corporate blog.
3) Three independent blogs.

Although I favored the first option, since then we only have to link to our content from the product pages, I would sincerely like to hear your opinion. We are afraid that duplicate content or strange link games may make us lose PageRank. How would you do it?

Getting a lot of '/_' errors from Webmaster Tools

- by Vermino

I'm running a WordPress site and I thought I had worked all the kinks out of it. For some reason, Webmaster Tools is crawling my website and showing a lot of 404 errors for /_-like URLs, additional pages that I've never created. I just can't figure out what is generating these URLs for Google's crawlers, which then get a 404. My robots.txt is here. My sitemap (created by the Yoast plugin) is here. I have the Yoast and Jetpack plugins installed. What could be causing these links to appear?

Spaces in an img's "alt" attribute: good or bad for search engines?

- by Camran

I am trying to make it easier for search engines to crawl my website, as it is almost 100% dynamic. I have a couple of transparent images which are actually links to sections of my page. I wonder: if I add an "alt" attribute containing spaces to describe the target, will this improve search engine rankings, etc.? For example:

    <img src="blabla.png" alt="post new classified">

Or will this just result in errors? And what should I put in the alt attribute if I can't use spaces? PS: Another, short question: will JavaScript-rich content make a page less important to crawlers? Thanks

How to make a jar file run on startup and keep running when you log out?

- by RanZilber

I have no idea where to start looking. I've been reading about daemons and didn't understand the concept. More details: I've written a crawler which never stops and crawls RSS feeds on the internet. The crawler is written in Java, so right now it's a jar. I'm an administrator on a machine running Ubuntu 11.04. There is some chance the machine will crash, so I'd like the crawler to start every time the machine boots. Furthermore, I'd like it to keep running even when I'm logged out. I'm not sure this is possible, but most of the time I'm logged out, and I still want it to crawl. Any ideas? Can someone point me in the right direction? I'm just looking for the simplest solution.

SEO Tips - Updating Your Content and Avoiding Duplicate Filters

If you are just getting into search engine optimization, it's important to be aware that content is the most important part of your website when it comes to ranking well. If you decide to hire an SEO consultant, make sure that you explain to them exactly what you want your website to look like and the content that you would like it to feature. Having fresh and relevant content on your website is key, as that will bring web crawlers back frequently. There are different ways of achieving this while avoiding having your website removed from a search engine's index due to duplicate content.

How can I allow search engines to index my invite-only website in Ruby on Rails?

- by tstyle

I have a Ruby on Rails website that will be in invite-only mode for the next couple of months. Currently I have it set up so that a visit to any page performs an authentication check:

    before_filter :authenticate, :except => [:beta]
    # authenticate checks for a logged-in user

But the site has a lot of content that I would like to see indexed by search engines, and I was wondering if there's an easy way to allow crawlers to do their work. I am not very knowledgeable about SEO-related stuff at all, so sorry if this is a suboptimal way to phrase the question.
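A sketch of one common way to let a specific crawler past a login wall: verify it by reverse DNS and then forward-confirm the hostname, which is the check Google documents for Googlebot (hostnames under googlebot.com or google.com). It is written in Python for illustration; in the Rails app the same logic would sit inside the `authenticate` filter. `is_verified_googlebot` and `may_view` are hypothetical names, and the resolver parameters exist only so the logic can be tested without network access. Be aware that showing crawlers content that visitors cannot see may be treated as cloaking by search engines.

```python
import socket

# Reverse-DNS suffixes Google documents for Googlebot hosts.
GOOGLEBOT_SUFFIXES = (".googlebot.com", ".google.com")


def is_verified_googlebot(ip, reverse=socket.gethostbyaddr,
                          forward=socket.gethostbyname_ex):
    """Reverse-resolve ip, check the domain, then forward-confirm it."""
    try:
        hostname = reverse(ip)[0]
    except OSError:  # socket.herror and socket.gaierror subclass OSError
        return False
    if not hostname.lower().endswith(GOOGLEBOT_SUFFIXES):
        return False
    try:
        addresses = forward(hostname)[2]
    except OSError:
        return False
    # The forward lookup must lead back to the same IP; otherwise the PTR is untrusted.
    return ip in addresses


def may_view(logged_in, ip, **resolvers):
    """Hypothetical gate: members always pass; others only if verified."""
    return logged_in or is_verified_googlebot(ip, **resolvers)
```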

How to improve a single-page site's search results [closed]

- by Trisism

Possible Duplicate: How to SEO a Single-Page website

I created an online CV a couple of weeks ago and it has had quite a few visits. Now I want to improve the chance that it will appear in Google search results; however, my web CV is a single-page site and it contains only internal links (those with a hash #), so I can't really submit a sitemap. I could change the internal links to normal links processed on the server side, but there's no point in doing so. I'm very new to SEO, so I would really appreciate it if somebody could show me what I should do to get a single-page site with internal links indexed effectively by crawlers.

How Does WordPress Block Search Engines?

- by Sarfraz

Hello, if you go to WordPress admin > Settings > Privacy, there are two options asking whether you want to allow search engines to search your blog, one of which is: "I would like to block search engines, but allow normal visitors". How does WordPress actually block search bots/crawlers from searching through the site when it is live?

RDFa / Microformat - Recipe markup standards

- by hfidgen

I wonder if anyone can help? After Google announced that it will take note of RDFa / Microformats for online recipes, I've been looking into this for a couple of recipe based sites I run. However we simply don't have all the required data to fulfill any of the standards. Does this matter? Will search engine crawlers still make the most of what they do find, or by missing a few elements (like a review or recipe rating) will I be wasting my time implementing this? Cheers, H

Malicious crawler blocker for ASP.NET

- by Marek

I have just stumbled upon Bad Behavior - a plugin for PHP that promises to detect spam and malicious crawlers by preventing them from accessing the site at all. Does something similar exist for ASP.NET/ASP.NET MVC? I am interested in blocking access to the site altogether, not in detecting spam after it was posted.

How Does WordPress Block Search Engines?

- by Sarfraz

Hello, if you go to the WordPress admin and then Settings > Privacy, there are two options asking whether you want to allow search engines to search your blog, one of which is: "I would like to block search engines, but allow normal visitors". How does WordPress actually block search bots/crawlers from searching through the site when it is live?
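A site can only ask crawlers to stay away through signals they check, mainly a robots meta tag in each page and rules in robots.txt. WordPress's privacy setting is generally understood to use both (a `noindex` robots meta tag plus a `Disallow: /` rule in its generated robots.txt), but treat that as an assumption to verify against your version. The Python sketch below shows how a well-behaved crawler might evaluate each signal; `page_allows_indexing` and `robots_txt_allows` are hypothetical helpers built on the standard library's `html.parser` and `urllib.robotparser`. Neither signal stops a crawler that chooses to ignore it.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser


class RobotsMetaParser(HTMLParser):
    """Collect directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attrs = dict(attrs)  # attribute names arrive lower-cased
        if (attrs.get("name") or "").lower() == "robots":
            for directive in (attrs.get("content") or "").split(","):
                if directive.strip():
                    self.directives.add(directive.strip().lower())


def page_allows_indexing(html):
    """False if the page carries a robots meta tag containing noindex."""
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    return "noindex" not in parser.directives


def robots_txt_allows(robots_lines, url, agent="*"):
    """Evaluate robots.txt rules (given as a list of lines) for one URL."""
    rules = RobotFileParser()
    rules.parse(robots_lines)
    return rules.can_fetch(agent, url)
```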

PHP create image to display email address?

- by Teddkl

Using PHP: how can I create an image that displays an email address (to help reduce spam)? Since crawlers can easily find an email address written as text on my web page, I want to display it as an image instead. How do I do that? I want the font to be:

    color: #20c
    font: 11px normal tahoma

How should I interpret site analytics with 11 pageviews in a 3-second visit?

- by Juank

I'm using Google Analytics and recently I've noticed some weird trends. I have a lot of visits that last mere seconds but record several pageviews, more than a normal human could view in that time. A specific case: the only visitor from Ireland I've had so far recorded 11 pageviews in a 3-second visit. Are these crawlers? Shouldn't Google Analytics filter those out?
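A rough way to separate such sessions from human ones is to look at pageviews per second. The Python sketch below assumes session data exported from Analytics or parsed from server logs; the `Session` shape, `looks_automated` and the one-view-per-second threshold are assumptions to tune, not Google Analytics fields.

```python
from dataclasses import dataclass

MAX_VIEWS_PER_SECOND = 1.0  # assumed threshold for a human reader


@dataclass
class Session:
    pageviews: int
    duration_seconds: float


def looks_automated(session, max_rate=MAX_VIEWS_PER_SECOND):
    """Flag sessions whose pageview rate is implausible for a person."""
    if session.pageviews <= 1:
        return False  # a single hit carries no rate information
    if session.duration_seconds <= 0:
        return True  # several pages in zero time
    return session.pageviews / session.duration_seconds > max_rate


print(looks_automated(Session(11, 3)))   # True (about 3.7 views per second)
print(looks_automated(Session(5, 240)))  # False
```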

Can I tell site crawlers to visit a certain page?

- by Ace

Hi there! I have a Drupal website that revolves around a document database. By design, you can only find these documents by searching the site. But I want all the results to be indexed by Googlebot and other crawlers, so I was thinking: what if I make a page that lists all the documents, and then tell the robots to visit that page to index all my documents? Is this possible, or is there a better way to do it?
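A "page that lists all the documents" is close to what an XML sitemap does: a file in the sitemaps.org format that lists URLs for crawlers and can be submitted through Google Webmaster Tools. Below is a minimal Python sketch that builds one, assuming the list of document URLs comes from the Drupal database; `build_sitemap` is a hypothetical helper. The protocol caps a single sitemap file at 50,000 URLs, so a larger collection needs several files plus a sitemap index.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Return sitemap.xml text listing each URL in a <loc> element."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        entry = ET.SubElement(urlset, "url")
        # ElementTree escapes characters such as "&", as the protocol requires.
        ET.SubElement(entry, "loc").text = url
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


# Hypothetical document URLs; in practice these would come from the database.
print(build_sitemap(["https://example.com/documents/1",
                     "https://example.com/documents/2"]))
```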

How to efficiently and permanently redirect 150,000 images?

- by Fabio Spampinato

For SEO purposes I need to rename around 150,000 images, and then I'd like to permanently redirect requests for the previous URLs to the new locations. The current URL for every image is something like:

    website.com/something/unique_id/filename.jpg

And I want to redirect them to:

    website.com/something/unique_id/new_filename.jpg

I can only think of two options:

1) Create an enormous list of redirects to include in my nginx conf file.
2) Redirect those requests to something like "website.com/new_location/unique_id", which will redirect the request again to the new path.

Are there other, better options? Should I avoid multiple 301 redirects? Will crawlers downgrade my rankings because of multiple redirects?
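Option 1 need not be slow if the list is expressed as an nginx `map`, which nginx looks up in a hash table, so every old URL is answered with a single 301 hop rather than a chain. The Python sketch below generates such a block from (old URI, new URI) pairs; `nginx_redirect_map` and the `$image_redirect` variable name are hypothetical, and the pairs are assumed to contain no whitespace, quotes or semicolons (those would need quoting).

```python
def nginx_redirect_map(renames, variable="$image_redirect"):
    """Build an nginx `map` block from (old_uri, new_uri) pairs."""
    lines = [f"map $uri {variable} {{", '    default "";']
    for old_uri, new_uri in renames:
        lines.append(f"    {old_uri} {new_uri};")
    lines.append("}")
    return "\n".join(lines) + "\n"


# Hypothetical rename list; in practice it would be generated from the database.
renames = [
    ("/something/123/filename.jpg", "/something/123/new_filename.jpg"),
]
print(nginx_redirect_map(renames), end="")
```

The generated block belongs in the `http` context, paired with something like `if ($image_redirect) { return 301 $image_redirect; }` in the relevant `server` block. For a map this large, nginx may ask for larger `map_hash_max_size` or `map_hash_bucket_size` values.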

Should I Use PHP as FastCGI?

- by Synetech inc.

Hi, I am running an Apache web server on my Windows machine. It is not generally a public server (most of the little traffic it gets comes from the machine itself, and most of the public traffic comes from crawlers). Basically, it is mostly just a test-bed development system. I have read that running PHP as FastCGI is better (i.e., faster and more stable) than running it as an Apache module. However, I really don’t like the idea of multiple PHP.exe processes (I don’t like that Apache has two processes, and I’m not even too thrilled with Chromium’s multi-process model). So I’m wondering whether it would be worthwhile to switch PHP to FastCGI in this scenario. If it is, how would I configure it? Pretty much all of the information I have seen has been either for non-Windows systems or for IIS. As I said, I’m running Windows + Apache. Thanks a lot.

Apache VERY high page load time

- by Aaron Waller

My Drupal 6 site has been running smoothly for years, but recently it has experienced intermittent periods of extreme slowness (10-60 second page loads): several hours of slowness followed by hours of normal (4-6 second) page loads. The page always loads with no error; it just sometimes takes forever.

My setup:

- Windows Server 2003
- Apache/2.2.15 (Win32) JRun/4.0
- PHP 5
- MySQL 5.1
- Drupal 6
- ColdFusion 9
- VMware virtual environment
- DMZ behind a corporate firewall
- Traffic: 1-3 hits/sec average

Troubleshooting so far:

- No applicable errors in the Apache error log
- No errors in the Drupal event log
- The Drupal devel module shows 242 queries in 366.23 milliseconds and a page execution time of 2069.62 ms (so it looks like queries and PHP scripts are not the problem)
- No unusually high CPU, memory, or disk I/O
- ColdFusion apps and other static pages outside of Drupal also load slowly
- A webpagetest.org test shows a very high time to first byte

The problem seems to be with Apache responding to requests, but previously I've only seen this behavior under 100% CPU load. Judging solely by resource monitoring, it looks as though very little is going on.

Here is the kicker: roughly half of the site's traffic comes from our LAN, but if I disable the firewall rule and block access from outside our network, internal (LAN) access (1000+ devices) is speedy. As soon as outside access is restored, the site is crippled.

Apache config? Crawlers/bots? Attackers? I'm at the end of my rope. Where should I be looking to determine where the problem lies?
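One way to test the crawler/bot theory is to count requests per client in the Apache access log and see whether a few outside addresses dominate during the slow periods. The Python sketch below assumes the log uses Apache's standard "combined" format; `top_clients` is a hypothetical helper, and lines in any other format (or with escaped quotes inside a quoted field) are likely to be skipped.

```python
import re
from collections import Counter

# Apache "combined" format:
# host ident user [time] "request" status bytes "referer" "user-agent"
COMBINED_RE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[[^\]]+\] "[^"]*" \d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def top_clients(lines, n=10):
    """Return the n most frequent (client IP, user-agent) pairs with counts."""
    counts = Counter()
    for line in lines:
        match = COMBINED_RE.match(line)
        if match:  # lines in other formats are ignored
            counts[(match.group("host"), match.group("agent"))] += 1
    return counts.most_common(n)
```

Passing it an open log file, for example `top_clients(open("access.log"))` with a hypothetical path, lists the busiest clients; comparing that output for a slow window against a normal one is the interesting part.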

What steps should I follow to start developing website applications?

- by Oscar Mederos

Hello, I've been developing desktop applications for about 4 years, using .NET, C++, C, and a little Python. I've covered lots of topics while developing my applications, including web technologies (cookies, GET/POST methods, when programming some scrapers/crawlers). I've always been waiting to start developing websites, preferably using PHP + MySQL, although other advice is welcome, to make this question more useful and generic for others. I know I could use a CMS instead of starting from scratch, but sometimes I don't need an entire CMS to do minor things... What steps should I follow to create a website? Let's suppose I have a web designer. Does the designer design the entire website (CSS, etc.) first, and then I do the programming, like loading things dynamically from databases, doing some client-side work with JavaScript, etc.? Or what is the best way to do it?

Edit: I'm not looking for tool/framework/language suggestions. What I want to know is how a team (or a developer with a designer) starts creating a website: the steps they take, what tasks they do first, how they integrate the work, etc. An example of an answer could be:

1) Design the entire website with good CSS practices, using containers instead of tables in some cases, etc.
2) Use that design and develop the logic or the functionality of the website.

Of course, that's just an example. I'm looking for a good way to approach it, because I've been wanting to start on this but don't really know exactly how to organize the job :/

Why is <my site url> not indexed by search engines? [closed]

- by Henrik Erlandsson

It was indexed fine until about a year ago. The only thing I can think of is that search engines throw up at using h5 before h4, or that some person (fantasizing now) has reported my site as unsafe to every search engine. However, I'm not here to speculate. The site validates, and has an RSS feed on the front page, for Pat Morita's sake! To me, it looks like the kind of site search engines would feast on. It's got more than a dozen blogs on it, if nothing else. Hah. :)

I was thinking you could identify what has changed in search engines (currently Google, Yahoo, and Bing, which used to work fine) over the last year to make them not find news and blog articles on this site. The site was submitted to Google, oh, way back in 2006. With online crawler tests I get mixed results: some crawlers index fine, some go blank. I don't really know which ones are reliable and am looking to you guys for advice on that.

Yes, I am prepared to verify my site with Google again and upload a sitemap, but that's not the topic here. I really would first like to know what change to the site last year could make search engines not index it. (Yees, the robots.txt is fine. There should be nothing there to discourage bots.) It's a very intriguing problem, one whose cause I have yet to find but would like to know. Any and all input appreciated, but I would most enjoy pertinent advice. ;)

Edit: Some Google searches that don't show up include - aca630 All of which are posted in the news and blogs on the front page there. Now, these search terms are extremely specific, as the term in is almost unique on the web and ACA630 is also a very qualified search term that can't be confused with mainstream search terms.

Search Results

Search found 149 results on 6 pages for 'crawlers'.
