What is a good Java web crawler library?
Posted by DrDee on Stack Overflow
Published on 2010-03-22T19:58:49Z
Indexed on 2010/05/16 17:50 UTC
Hi,
I am about to develop a crawler in Java but don't feel like reinventing the wheel. A quick Google search turns up a whole bunch of Java libraries for building a web crawler. Nutch is of course a very robust package, but it seems a bit too advanced for my needs: I only need to crawl a handful of websites a week, each containing a couple of thousand pages.
Which open-source Java library would you recommend, considering:
- speed
- multithreading (or even distributed)
- extending it with new functionality
- active maintenance
- and documentation?
© Stack Overflow or respective owner