search engine crawling frequency
Posted by Aditya Pratap Singh on Stack Overflow
Published on 2012-05-31T22:24:39Z
Indexed on 2012/05/31 22:40 UTC
web-development | search-engine
I want to design a search engine for news websites, i.e. download article pages from these websites, index the pages, and answer search queries against the index.
I want short pseudocode that finds an appropriate crawling frequency. I do not want to crawl too often, because the website may not have changed, and I do not want to crawl too infrequently, because the index would then be out of date. Assume that the crawling code looks as follows:
while (1) {
    sleep(sleep_interval); // sleep for sleep_interval
    previously_crawled_website = currently_crawled_website; // keep the last crawl for comparison
    currently_crawled_website = crawl(website); // crawls the entire website
    // returns a % value of difference between the latest and previous crawls of the website
    diff_percentage = diff(currently_crawled_website, previously_crawled_website);
    sleep_interval = infer_sleep_interval(diff_percentage, sleep_interval);
}
I am looking for pseudocode for the infer_sleep_interval method:
long infer_sleep_interval(int diff_percentage, long previous_sleep_interval)
{
...
...
...
}
I want to design a method that adaptively alters the sleep interval based on the update frequency of the website.
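One possible starting point, offered only as a sketch and not as a known-good answer, is a multiplicative back-off: double the interval when the site barely changed, halve it when it changed a lot, and leave it alone in between. The thresholds (LOW_DIFF_PERCENTAGE, HIGH_DIFF_PERCENTAGE) and the bounds (MIN_SLEEP_INTERVAL, MAX_SLEEP_INTERVAL, assumed to be in seconds) are hypothetical values chosen for illustration; they are not part of the question and would need tuning per website.

```c
/* Hypothetical tuning constants; none of these come from the question. */
#define MIN_SLEEP_INTERVAL   60L     /* assumed floor: 1 minute, in seconds */
#define MAX_SLEEP_INTERVAL   86400L  /* assumed ceiling: 1 day, in seconds  */
#define LOW_DIFF_PERCENTAGE  5       /* below this, the site "barely changed" */
#define HIGH_DIFF_PERCENTAGE 20      /* above this, the site "changed a lot"  */

/* Keep a value inside [lo, hi]. */
static long clamp_interval(long value, long lo, long hi)
{
    if (value < lo) return lo;
    if (value > hi) return hi;
    return value;
}

/*
 * Sketch of an adaptive interval: multiplicative increase when little
 * changed, multiplicative decrease when much changed, unchanged otherwise.
 */
long infer_sleep_interval(int diff_percentage, long previous_sleep_interval)
{
    /* Clamp the input first so that doubling below cannot overflow. */
    long interval = clamp_interval(previous_sleep_interval,
                                   MIN_SLEEP_INTERVAL, MAX_SLEEP_INTERVAL);

    if (diff_percentage < LOW_DIFF_PERCENTAGE) {
        interval *= 2;   /* crawled too early: back off */
    } else if (diff_percentage > HIGH_DIFF_PERCENTAGE) {
        interval /= 2;   /* missed a lot of changes: crawl sooner */
    }

    /* Clamp again so the result never leaves the allowed range. */
    return clamp_interval(interval, MIN_SLEEP_INTERVAL, MAX_SLEEP_INTERVAL);
}
```

With these assumed constants, a 600-second interval becomes 1200 seconds after a crawl with a 0% difference, 300 seconds after a 50% difference, and stays at 600 seconds after a 10% difference. The dead band between the two thresholds is there to keep the interval from oscillating when the site changes at a steady rate.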
© Stack Overflow or respective owner