How can I avoid filtering outlinks in Nutch?
- by sigpwned
I'm currently trying to perform a deep crawl within a small list of sites. To accomplish this, I updated conf/domain-urlfilter.txt with the domains of the sites I wish to crawl, which worked nicely. However, I found that the filter was applied not only to the URLs selected for crawling at each step, but also to the outlinks captured from each crawled page.
Is there a way to avoid filtering captured outlinks while still filtering crawled URLs?
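
To make the distinction concrete, here is a rough Python sketch of the behavior I'm after. It is not Nutch code: the function names, the allow-list, and the sample URLs are all made up for illustration, and the allow-list only stands in for the entries in conf/domain-urlfilter.txt.

```python
from urllib.parse import urlparse

# Hypothetical allow-list, standing in for the entries in conf/domain-urlfilter.txt.
ALLOWED_DOMAINS = {"example.com", "example.org"}


def in_allowed_domain(url, allowed=ALLOWED_DOMAINS):
    """Return True if the URL's host is an allowed domain or a subdomain of one."""
    host = (urlparse(url).hostname or "").lower()
    return any(host == d or host.endswith("." + d) for d in allowed)


def process_page(outlinks, allowed=ALLOWED_DOMAINS):
    """Split a page's outlinks into (recorded, to_crawl).

    recorded: every outlink, unfiltered (what I want stored for the page).
    to_crawl: only the outlinks inside the allowed domains (what is fetched next).
    """
    recorded = list(outlinks)
    to_crawl = [u for u in outlinks if in_allowed_domain(u, allowed)]
    return recorded, to_crawl


if __name__ == "__main__":
    links = [
        "http://example.com/a",
        "http://blog.example.org/b",
        "http://other.net/c",
    ]
    recorded, to_crawl = process_page(links)
    print(len(recorded), "outlinks recorded;", len(to_crawl), "queued for crawling")
```

In other words, the domain filter should decide only what gets fetched next, while the stored outlinks for each page stay complete, including links that point outside the listed domains.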