What is a good Java web crawler library?
- by DrDee
Hi,
I am about to develop a crawler in Java but don't feel like reinventing the wheel. A quick Google search turns up a whole bunch of Java libraries for building a web crawler. Nutch is of course a very robust package, but it seems a bit too advanced for my needs: I only need to crawl a handful of websites a week, each containing a couple of thousand pages.
Which open source Java library would you recommend, considering:
speed
multithreading (or even distributed)
ease of extending it with new functionality
active maintenance
and documentation?