Nutch - how to crawl in small batches?
Posted by Yurish on Stack Overflow
Published on 2010-03-29T12:40:01Z
Hi everyone!
I am stuck! I can't get Nutch to crawl in small batches. I start it with the bin/nutch crawl command with the parameters -depth 7 and -topN 10000, and it never finishes. It stops only when my HDD runs out of space. What I need to do:
- Start crawling from my seeds, with the possibility of following outlinks further.
- Crawl 20000 pages, then index them.
- Crawl another 20000 pages, index them and merge the result with the first index.
- Repeat step 3 n times.
I also tried the scripts I found in the wiki, but none of them go any further. If I run them again, they do everything from the beginning, and at the end of the script I have the same index I had when I started crawling. But I need to continue my crawl.
Some help would be very useful!
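
For illustration, the steps listed above could be sketched as a small shell script built from the individual Nutch commands instead of the all-in-one crawl command. This is only a rough sketch, not a tested solution: the directory names (crawl, urls), the function names and the NUTCH dry-run switch are made up for the example, and the sub-commands and argument order are written as I understand the Nutch 1.x command-line tools, so check them against the usage output of bin/nutch for your version. It also assumes the fetcher parses pages while fetching, so there is no separate parse step.

```shell
#!/bin/sh
# Sketch only: directory names and the batch size are assumptions.
NUTCH="${NUTCH:-bin/nutch}"        # set NUTCH=echo to print the commands instead of running them
CRAWL_DIR="${CRAWL_DIR:-crawl}"    # hypothetical working directory
SEED_DIR="${SEED_DIR:-urls}"       # hypothetical directory holding the seed list
BATCH_SIZE="${BATCH_SIZE:-20000}"  # pages per batch, as in the steps above

# Run once, before the first batch. Keeping the crawldb between runs
# is what lets later batches continue instead of starting over.
inject_seeds() {
    $NUTCH inject "$CRAWL_DIR/crawldb" "$SEED_DIR"
}

# One batch: generate a fetch list, fetch it, fold the results back into
# the crawldb, then index the new segment into its own directory.
crawl_batch() {
    $NUTCH generate "$CRAWL_DIR/crawldb" "$CRAWL_DIR/segments" -topN "$BATCH_SIZE" || return 1
    # Segment directories are named by timestamp, so the last one sorted is
    # the newest. This assumes generate actually created a new segment.
    segment=$(ls -d "$CRAWL_DIR"/segments/* | tail -n 1)
    $NUTCH fetch "$segment" || return 1
    $NUTCH updatedb "$CRAWL_DIR/crawldb" "$segment" || return 1
    $NUTCH invertlinks "$CRAWL_DIR/linkdb" "$segment" || return 1
    $NUTCH index "$CRAWL_DIR/indexes/$(basename "$segment")" \
        "$CRAWL_DIR/crawldb" "$CRAWL_DIR/linkdb" "$segment"
}

# Rebuild the merged index from all per-batch indexes.
merge_indexes() {
    rm -rf "$CRAWL_DIR/index"
    $NUTCH merge "$CRAWL_DIR/index" "$CRAWL_DIR"/indexes/*
}

# Example use (n = 3 batches):
#   inject_seeds
#   i=0; while [ "$i" -lt 3 ]; do crawl_batch && merge_indexes; i=$((i + 1)); done
```

The point of the sketch is that the seeds are injected only once and the crawldb is never deleted, so each generate call should pick up URLs discovered by earlier batches rather than repeating the first one.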