What techniques can be used to detect so-called "black holes" (spider traps) when creating a web crawler?
- by Tom
When creating a web crawler, you have to design some kind of system that gathers links and adds them to a queue. Some, if not most, of these links will be dynamic: they appear to be distinct pages but add no value, as they are specifically generated to fool crawlers.
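To make the problem concrete, here is a minimal sketch of such a link-gathering loop with two crude safeguards against infinite traps: a visited set for deduplication and a depth cap. The `get_links` callback and the limits are hypothetical placeholders, not a real crawler implementation.

```python
from collections import deque

def crawl(seed, get_links, max_depth=5, max_pages=1000):
    """Breadth-first crawl with two crude trap guards:
    a visited set (dedup) and a depth cap on link chains."""
    visited = {seed}
    queue = deque([(seed, 0)])
    order = []
    while queue and len(order) < max_pages:
        url, depth = queue.popleft()
        order.append(url)
        if depth >= max_depth:
            continue  # stop descending; an endless chain is likely a trap
        for link in get_links(url):
            if link not in visited:
                visited.add(link)
                queue.append((link, depth + 1))
    return order
```

Even against a page that generates fresh links forever, this loop terminates, but the depth and page caps are blunt instruments; the question is what smarter detection techniques exist.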
An example:
We tell our crawler to crawl the domain evil.com by entering an…