Nutch - how to crawl in small batches?
- by Yurish
Hi everyone!
I am stuck! I can't get Nutch to crawl in small batches. I start it with the bin/nutch crawl command with the parameters -depth 7 and -topN 10000, and it never finishes. It stops only when my hard disk is full. What I need to do:
1. Start crawling my seeds, with the possibility of going further along outlinks.
2. Crawl 20000 pages, then index them.
3. Crawl another 20000 pages, index them, and merge the result with the first index.
4. Repeat step 3 n times.
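To make the steps above concrete, here is roughly the loop I have in mind, as a minimal sketch built from the individual Nutch commands (inject, generate, fetch, updatedb, invertlinks, index, merge) instead of the one-shot crawl command. The directory names, the function names crawl_batch and crawl_all, and the variables are my own assumptions. It also assumes a Nutch 1.x release that still ships the Lucene-based index and merge commands, and a local filesystem rather than HDFS.

```shell
#!/bin/sh
# Sketch only: batch-wise crawl with the step-by-step Nutch commands.
# All names below are hypothetical defaults, not taken from any Nutch script.
NUTCH="${NUTCH:-bin/nutch}"   # path to the nutch launcher
CRAWL="${CRAWL:-crawl}"       # working directory for crawldb, segments, indexes
SEEDS="${SEEDS:-urls}"        # directory holding the seed URL list
TOPN="${TOPN:-20000}"         # maximum pages per batch

# Fetch one batch of at most TOPN pages and build a separate index for it.
crawl_batch() {
    before=$(ls -d "$CRAWL"/segments/* 2>/dev/null | tail -1)
    # pick the top-scoring URLs that are due for fetching into a new segment
    $NUTCH generate "$CRAWL/crawldb" "$CRAWL/segments" -topN "$TOPN" || return 1
    # segment names are timestamps, so the newest one sorts last
    segment=$(ls -d "$CRAWL"/segments/* 2>/dev/null | tail -1)
    # stop if generate produced no new segment (nothing left to fetch)
    [ -n "$segment" ] && [ "$segment" != "$before" ] || return 1
    $NUTCH fetch "$segment" || return 1
    # if fetcher.parse is false in your configuration, run this here:
    #   $NUTCH parse "$segment"
    # store fetch status and newly found outlinks, so the next batch goes further
    $NUTCH updatedb "$CRAWL/crawldb" "$segment" || return 1
    $NUTCH invertlinks "$CRAWL/linkdb" "$segment" || return 1
    # one index directory per batch, named after its segment
    $NUTCH index "$CRAWL/indexes/$(basename "$segment")" \
        "$CRAWL/crawldb" "$CRAWL/linkdb" "$segment"
}

# Run up to $1 batches and rebuild the merged index after each one.
crawl_all() {
    batches="$1"
    # inject seeds only on the first run; later runs continue from the crawldb
    [ -d "$CRAWL/crawldb" ] || $NUTCH inject "$CRAWL/crawldb" "$SEEDS" || return 1
    i=0
    while [ "$i" -lt "$batches" ]; do
        crawl_batch || break
        # assumption: merge wants a fresh output directory (local filesystem)
        rm -rf "$CRAWL/index"
        $NUTCH merge "$CRAWL/index" "$CRAWL"/indexes/* || return 1
        i=$((i + 1))
    done
}
```

Calling crawl_all 5 would then run five batches, and calling it again later should pick up from the existing crawldb rather than from the seeds, which is the behaviour I am missing.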
I also tried the scripts I found on the wiki, but none of them go any further. If I run them again, they do everything from the beginning, and at the end of the script I have the same index I had when I started the crawl. But I need to continue my crawl.
Some help would be very useful!