Search Results

Search found 30 results on 2 pages for 'nutch'.

Page 2 of 2

.NET search engine architecture and technology choice

- by shrivb

I am in the process of designing a search engine for an ASP.NET site. The site currently uses Microsoft Indexing Server (MIS) to index and search content, which ranges from simple text files to MS documents to PDFs. MIS is also used to crawl file servers, and, in tandem with Index Server Companion, it crawls content from external sites. I intend to replace…

What is a good Java crawler library?

- by DrDee

Hi, I am about to develop a crawler in Java but don't feel like reinventing the wheel. A quick Google search turns up a whole bunch of Java libraries for building a web crawler. Besides those, Nutch is of course a very robust package, but it seems a bit too advanced for my needs. I only need to crawl a handful of websites a week, containing a couple of thousand pages…

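
For a job this small (a handful of sites, a few thousand pages), a plain breadth-first crawler built only on the JDK may be enough. The sketch below assumes Java 11+ (for java.net.http). The class name TinyCrawler, the regex-based href extraction, and the pluggable fetcher function are illustrative assumptions, not taken from the question or from any crawler library. It deliberately omits robots.txt handling and request throttling, which a real crawler should add before hitting live sites.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TinyCrawler {
    // Naive href extraction (capture stops at '#', so fragments are dropped);
    // a real HTML parser would be more robust against unusual markup.
    private static final Pattern HREF =
            Pattern.compile("href\\s*=\\s*[\"']([^\"'#]+)", Pattern.CASE_INSENSITIVE);

    /**
     * Breadth-first crawl that stays on the seed's host and keeps at most maxPages
     * successfully fetched pages. fetch returns the page body, or null on failure.
     */
    public static List<String> crawl(String seed, int maxPages, Function<String, String> fetch) {
        String host = URI.create(seed).getHost();
        Set<String> seen = new HashSet<>();
        Deque<String> queue = new ArrayDeque<>();
        List<String> visited = new ArrayList<>();
        seen.add(seed);
        queue.add(seed);
        while (!queue.isEmpty() && visited.size() < maxPages) {
            String url = queue.poll();
            String html = fetch.apply(url);
            if (html == null) {
                continue; // fetch failed; skip this page
            }
            visited.add(url);
            URI base = URI.create(url);
            if (base.getRawPath() == null || base.getRawPath().isEmpty()) {
                base = base.resolve("/"); // resolving "a.html" against an empty path would drop the '/'
            }
            Matcher m = HREF.matcher(html);
            while (m.find()) {
                URI link;
                try {
                    link = base.resolve(m.group(1).trim());
                } catch (IllegalArgumentException e) {
                    continue; // malformed href
                }
                if (link.getHost() == null || !link.getHost().equalsIgnoreCase(host)) {
                    continue; // off-site, or no host at all (mailto:, javascript:)
                }
                if (seen.add(link.toString())) {
                    queue.add(link.toString());
                }
            }
        }
        return visited;
    }

    /** Fetcher backed by java.net.http; returns null on errors or non-200 responses. */
    public static Function<String, String> httpFetcher() {
        HttpClient client = HttpClient.newBuilder()
                .followRedirects(HttpClient.Redirect.NORMAL)
                .connectTimeout(Duration.ofSeconds(10))
                .build();
        return url -> {
            try {
                HttpRequest req = HttpRequest.newBuilder(URI.create(url))
                        .timeout(Duration.ofSeconds(20)).GET().build();
                HttpResponse<String> resp = client.send(req, HttpResponse.BodyHandlers.ofString());
                return resp.statusCode() == 200 ? resp.body() : null;
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt for the caller
                return null;
            } catch (IOException | IllegalArgumentException e) {
                return null;
            }
        };
    }
}
```

A live run might look like `TinyCrawler.crawl("http://example.com/", 2000, TinyCrawler.httpFetcher())` (example.com is a placeholder). Passing the fetcher in as a `Function` keeps the traversal logic testable against an in-memory map of pages, without any network access.
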
How to normalize a URL in Java?

- by dfrankow

URL normalization (or URL canonicalization) is the process by which URLs are modified and standardized in a consistent manner. The goal of the normalization process is to transform a URL into a normalized or canonical URL so it is possible to determine if two syntactically different URLs are equivalent. Strategies include lowercasing, adding…

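
A minimal sketch of a few semantics-preserving normalizations, using only `java.net.URI`: lowercasing the scheme and host, dropping the default port, resolving `.` and `..` path segments, treating an empty path as `/`, uppercasing the hex digits in percent-escapes, and removing the fragment. The class and method names are hypothetical, and the sketch assumes absolute http(s)-style URLs. More aggressive steps such as sorting query parameters or stripping `www.` are left out because they can change which resource a URL refers to.

```java
import java.net.URI;
import java.util.Locale;
import java.util.regex.Pattern;

public class UrlNormalizer {
    private static final Pattern PERCENT_ESCAPE = Pattern.compile("%[0-9a-fA-F]{2}");

    /** Throws IllegalArgumentException unless url is absolute with a server-based host. */
    public static String normalize(String url) {
        URI uri = URI.create(url.trim()).normalize(); // resolves "." and ".." path segments
        if (!uri.isAbsolute() || uri.getHost() == null) {
            throw new IllegalArgumentException("Not an absolute URL with a host: " + url);
        }
        // Scheme and host are case-insensitive, so lowercase them.
        String scheme = uri.getScheme().toLowerCase(Locale.ROOT);
        String host = uri.getHost().toLowerCase(Locale.ROOT);

        // Drop the port when it is the default for the scheme.
        int port = uri.getPort();
        if ((scheme.equals("http") && port == 80) || (scheme.equals("https") && port == 443)) {
            port = -1;
        }

        // An empty path is equivalent to "/".
        String path = uri.getRawPath();
        if (path == null || path.isEmpty()) {
            path = "/";
        }

        // Rebuild from the raw components so existing encoding is kept; the fragment
        // is dropped because it is never sent to the server.
        StringBuilder sb = new StringBuilder(scheme).append("://");
        if (uri.getRawUserInfo() != null) {
            sb.append(uri.getRawUserInfo()).append('@');
        }
        sb.append(host);
        if (port != -1) {
            sb.append(':').append(port);
        }
        sb.append(upperCaseEscapes(path));
        if (uri.getRawQuery() != null) {
            sb.append('?').append(upperCaseEscapes(uri.getRawQuery()));
        }
        return sb.toString();
    }

    // "%7e" and "%7E" encode the same octet; uppercase the hex digits (Java 9+ replaceAll).
    private static String upperCaseEscapes(String s) {
        return PERCENT_ESCAPE.matcher(s).replaceAll(m -> m.group().toUpperCase(Locale.ROOT));
    }
}
```

Two syntactically different URLs can then be treated as equivalent when their normalized forms are equal: for example, `HTTP://Example.COM:80/a/./b/../c#frag` and `http://example.com/a/c` both normalize to `http://example.com/a/c`.
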
Can a raw Lucene index be loaded by Solr?

- by wynz

Some colleagues of mine have a large Java web app that uses a search system built with Lucene Java. What I'd like to do is have a nice HTTP-based API to access those existing search indexes. I've used Nutch before and really liked how simple the OpenSearch implementation made it to grab results as RSS. I've tried setting Solr's dataDir in…
