Search Results

Search found 261 results on 11 pages for 'crawler'.

Page 7/11

How can I index content within a Content Editor web part?

- by Hirvox

I'm using MOSS 2007 v12.0.0.6529, and the Shared Services crawler is ignoring content inside Content Editor Web Parts. The page itself is a Publishing page, and content within the Page Content field is indexed properly and shows up in search results. How can I ensure that content within Content Editor Web Parts is also indexed? Or do I have to…

Why is my site not on Google? [closed]

- by RD

I wanted to post a link here, but some people might see that as advertising. So instead I'm going to phrase my question like this: What can I do to make sure my site appears on Google? I have already done the following: submitted my sitemap; added my site at www.google.com/addurl; added Analytics to my site; checked in the webmaster tools if…

What can I use as a 3D tile map editor?

- by alfa64

I need to make grid-based levels with 3D models for a dungeon crawler (a recent example is Legend of Grimrock), but I need to have several layers and place entities with properties such as position, angle, etc. I was considering Tiled, using layers as height for each level, but it's very hard to work with and visualize. What can I use for this…

Understanding Ajax crawling of search site

- by vacuum

I have a couple of questions about Ajax crawling of a site that is itself a kind of search engine. The base article explains the mechanism of making an AJAX application crawlable. All this stuff with HTML snapshots is clear and easy to implement, but I can't understand where Google bot will get "the crawler finds a pretty AJAX URL" (ie…

sqlite3 timestamp (current_timestamp) one hour off

- by Eiriks

I run a small crawler on a virtual Ubuntu server, initiated hourly by crontab. The datetime is inserted by defaulting the date field to TIMESTAMP DEFAULT CURRENT_TIMESTAMP. Table creation looks like this: CREATE TABLE links (page TEXT, link TEXT, date TIMESTAMP DEFAULT CURRENT_TIMESTAMP, PRIMARY KEY(page,link)); The datetime gets…
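
One likely cause of a fixed offset like this is that SQLite records CURRENT_TIMESTAMP in UTC rather than in the server's local time. The sketch below is only an illustration under that assumption: it recreates the links table from the question in an in-memory database (the inserted row values are made up) and reads the stored value back both as written and converted with the 'localtime' modifier.

```python
import sqlite3
from datetime import datetime, timezone

# In-memory database so the sketch has no side effects.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE links (page TEXT, link TEXT, "
    "date TIMESTAMP DEFAULT CURRENT_TIMESTAMP, PRIMARY KEY(page, link))"
)
# Hypothetical row; the date column falls back to its default.
conn.execute(
    "INSERT INTO links (page, link) VALUES (?, ?)",
    ("http://example.com/", "http://example.com/about"),
)

# CURRENT_TIMESTAMP is stored as UTC text; the 'localtime' modifier
# converts it to the machine's local zone when reading.
utc_value, local_value = conn.execute(
    "SELECT date, datetime(date, 'localtime') FROM links"
).fetchone()

# Parse the stored text and mark it explicitly as UTC.
stored_utc = datetime.strptime(utc_value, "%Y-%m-%d %H:%M:%S").replace(
    tzinfo=timezone.utc
)
seconds_from_now = abs((datetime.now(timezone.utc) - stored_utc).total_seconds())

print("stored (UTC):", utc_value)
print("local time:  ", local_value)
```

If the two printed values differ by the server's UTC offset, the stored data is correct and only the display needs converting.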

Is content in option tags indexed?

- by Silfverstrom

Is data inside an <option> tag indexed? For example, would the following option tag allow "Volvo", "Saab", "Opel" and "Audi" to be indexed by a crawler? <select> <option value="volvo">Volvo</option> <option value="saab">Saab</option> <option value="opel">Opel</option> …

Do search engines directly penalize bad grammar?

- by Nicolas Raoul

Let's say I have a web page with user-contributed content, which is good content but with bad grammar, slang terms, and an inappropriate tone. I know that bad grammar is also a problem because it drives away visitors and scares people from linking to it, but let's put that aside. Let's also put aside the fact that incorrectly…

HTTP 303 redirection and robots.txt

- by Ian Dickinson

On a site I'm working on, we're using the HTTP 303 redirect pattern (see this article for background) to distinguish between information and non-information resources. So some URLs under /id get redirected to dynamically created pages under /doc. These dynamic pages are built from a database, and contain links to other…

Webmaster Tools word count

- by Henrik Erlandsson

Is there a way to verify that Googlebot finds the headings and the content, for example by word count? I'm asking because I tried a program called Screaming Frog, which fails to even fetch the first h1 on a validated page for about 1/3 of all the pages(!), and that made me unsure. Even though the site looks…

Foolproof way to ensure Google News pulls the correct image for its thumbnails?

- by Anthony

Google News results have an accompanying thumbnail next to articles that show up in the results. If Google's crawler can't find a thumbnail to pull from our site, it uses its next best guess from another site, so the image links to another site while the result still uses our headline. Example: Headline from Reuters,…

How to disallow indexing but allow crawling?

- by John Doe

On the front page of my website, I have some previews of articles (with a small introduction to them) that link to the full articles. I want to disallow the front page to prevent duplicate content. But if I do this (in robots.txt), would it still be crawled? I mean, the full articles would still be reached by the…
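
The distinction behind this question is that a robots.txt Disallow rule stops compliant crawlers from fetching a URL at all, whereas a robots meta tag with "noindex, follow" asks them to fetch the page and follow its links but keep it out of the index. As a rough sketch only, the Python below uses the standard library's robots.txt parser to show the first half; the example.com URLs and the /index.html path are assumptions made for illustration, not taken from the question.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that disallows only the front page.
robots_lines = [
    "User-agent: *",
    "Disallow: /index.html",
]

rules = RobotFileParser()
rules.parse(robots_lines)

# A disallowed page is not fetched, so links on it are never seen there.
front_page_allowed = rules.can_fetch("*", "https://example.com/index.html")
# Other pages remain crawlable if they are linked from somewhere else.
article_allowed = rules.can_fetch("*", "https://example.com/articles/1")

# The usual alternative: leave the page crawlable and mark it noindex.
NOINDEX_FOLLOW_META = '<meta name="robots" content="noindex, follow">'

print("front page crawlable:", front_page_allowed)
print("article crawlable:   ", article_allowed)
print("tag for the front page:", NOINDEX_FOLLOW_META)
```

Note that a crawler can only see the meta tag if robots.txt lets it fetch the page, so the two mechanisms should not be combined on the same URL.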

How to fix Google 404 not found Crawl Errors?

- by Freeme

I was checking Google Webmaster Tools for my blog site to see if there's any indication of why my blog traffic dropped by half in one day, and I saw 43 Not Found crawl errors and 5 Not Found errors in the Sitemap. The 5 Not Found errors in the Sitemap were the links to categories. I guess I renamed categories that's why…

How to allow Google Images search to bypass hotlink protection?

- by Marco Demaio

I saw that Google Images seems to index my images only if hotlink protection is off. * I use hotlink protection anyway because I don't like the idea of people sucking my bandwidth. I simply use this code to protect my sites from being hotlinked: RewriteEngine on RewriteCond %{HTTP_REFERER} !^$ RewriteCond %{HTTP_REFERER}…

Directing crawlers to content in language per language sub-domain

- by Noam

I have a multilingual website with many pages (40M). The site has UGC, and each translation is actually for the titles. Each sub-domain points to the same content with different titles per language. As far as I understand, each sub-domain should be indexed by search engines, meaning they will actually…

What does Enable/Disable mean in Bing's URL Normalization feature?

- by DisgruntledGoat

I'm in Bing Webmaster Tools, under Index URL Normalization. Many parameters are listed in the table with 3 other columns: Status, Source, Date. The "Source" column says "Webmaster" where I have added parameters, and "Bing" where I assume the parameter has been auto-detected. "Date" is probably the last date it…

Google Analytics - Traffic Source - Search engine - (Not Provided)

- by Dharmavir

I am using Google Analytics. When I go to "Traffic Source Overview", it shows the keyword as "(Not provided)" for almost 40% of my traffic sources. More than 90% of my search engine traffic is from Google, and yet more than 40% of those keywords are "(Not provided)". Can anyone…

Disqus thread migration. Gotchas?

- by sramsay

I've been migrating a site to a new domain. The site itself is pretty straightforward (it uses Jekyll), and everything has gone fine -- except migration of Disqus threads. I've had partial success -- some of the threads have migrated successfully, but not all. I've tried the domain migration wizard (which…

How to get rid of crawling errors due to the URL Encoded Slashes (%2F) problem in Apache

- by user14198

The Google web crawler has indexed a whole set of URLs with encoded slashes (%2F) for our site. I assume it has picked up the pages from our XML sitemap file. The problem is that the live pages will actually result in a failure because of the Url Encoded Slashes Problem in Apache. Some solutions are…

How to optimize my PageRank calculation?

- by asmaier

In the book Programming Collective Intelligence I found the following function to compute the PageRank: def calculatepagerank(self,iterations=20): # clear out the current PageRank tables self.con.execute("drop table if exists pagerank") self.con.execute("create table pagerank(urlid…
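
The function in the question is cut off, so the following is not a reconstruction of it. It is only a minimal sketch of one common way to speed up this kind of calculation: load the link graph into memory once and iterate over dictionaries instead of issuing a database query per URL. The update rule, the 0.85 damping factor, and the `links` input format are assumptions chosen for illustration.

```python
def pagerank(links, iterations=20, damping=0.85):
    """Iteratively score pages.

    `links` is a hypothetical input format: a dict mapping each page
    to the list of pages it links to.
    """
    # Every page that appears as a source or as a target.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)

    # Build the inbound-link index and outbound counts once, up front,
    # so the loop below does no repeated lookups against storage.
    inbound = {page: [] for page in pages}
    out_count = {page: 0 for page in pages}
    for source, targets in links.items():
        unique_targets = set(targets)
        out_count[source] = len(unique_targets)
        for target in unique_targets:
            inbound[target].append(source)

    ranks = {page: 1.0 for page in pages}
    for _ in range(iterations):
        # Each new rank depends only on the previous iteration's ranks.
        ranks = {
            page: (1 - damping)
            + damping * sum(ranks[src] / out_count[src] for src in inbound[page])
            for page in pages
        }
    return ranks


# Small made-up graph: "a" and "b" both link to "c", and "c" links back to "a".
example_ranks = pagerank({"a": ["c"], "b": ["c"], "c": ["a"]})
two_page_ranks = pagerank({"a": ["b"], "b": ["a"]})
print(example_ranks)
```

For graphs too large for memory, the same idea can be approximated by batching the reads and writes rather than touching the database once per page.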

Equivalent of libwww-perl in .NET or Java

- by voidvector

I wrote a crawler in Perl a while back, and it was super simple given the high-level capability of libwww-perl. It is so straightforward, in fact, that it can take the raw HTML response of one request and create the next HTTP request for you from the FORMs on that page (as in it will parse…

Unhandled Exception in C#

- by nightcoder1

Hello, I am currently trying to run a web crawler through the terminal. It compiles fine and the debugger does not find any errors; however, I get the following error, which I do not understand. Any ideas on how to get rid of this error would be much appreciated. Unhandled Exception:…

SEO & Ajax

- by cloudhead

I'm experimenting with building sites dynamically on the client side, through JavaScript plus a JSON content server: the JS retrieves the content and builds the page client-side. Now, the content won't be indexed by Google this way. Is there a workaround for this? Like having a…

Oracle Secure Enterprise Search (SES) intranet crawling problem

- by vipin k.

I am using Oracle Secure Enterprise Search (SES) and its crawler to crawl the intranet site, but I am getting the error: EQG-30008: http://site-name/: Not found. I have added the logon user name and password and also added the proxy settings. Anybody who worked…

file_get_contents VS CURL, what has better performance?

- by ahmed

I am using PHP to build a web crawler to crawl millions of URLs. Which is better for me in terms of performance: file_get_contents or cURL? Thanks

Valid content-type for XML, HTML and XHTML documents

- by astropanic

What are the correct content types for these documents? I need to write a simple crawler that only fetches these kinds of files. Nowadays http://somedomain.com/index.html can serve, for example, a JPEG file due to mod_rewrite, so I need to check the content-type from the…
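
As a hedged sketch of the check described here, the Python below tests a Content-Type header value against the media types commonly registered for these formats: text/html, application/xhtml+xml, application/xml, and text/xml. The function name and the accepted set are illustrative assumptions; a real crawler might also want to accept other "+xml" subtypes.

```python
# Media types commonly used for HTML, XHTML and XML documents.
ACCEPTED_MEDIA_TYPES = {
    "text/html",
    "application/xhtml+xml",
    "application/xml",
    "text/xml",
}


def is_markup_response(content_type_header):
    """Return True if a Content-Type header value names HTML, XHTML or XML.

    Parameters such as "; charset=utf-8" are ignored, and the comparison
    is case-insensitive because media type names are case-insensitive.
    """
    if not content_type_header:
        return False
    media_type = content_type_header.split(";", 1)[0].strip().lower()
    return media_type in ACCEPTED_MEDIA_TYPES


print(is_markup_response("text/html; charset=utf-8"))
print(is_markup_response("image/jpeg"))
```

Because the URL's extension is not reliable, the header value would typically come from the response itself, for example from a HEAD request issued before the full download.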
