Command-line HTTP crawler for Windows?
Posted
by Pekka
on Super User
Published on 2010-05-24T16:33:34Z
Can anybody recommend a web site crawler that can be invoked and configured entirely from the command line?
This would need to run in a Windows environment.
I don't need it to save the data, follow stylesheet links, and so on. I only need the crawler to start with one page, parse it, and follow all the links on the same domain, so that in the end every page on the site has been requested once.
Background: I'm setting up a web site that is frequently uploaded from an office location. Because it combines data from various sources, it has several levels of caching. I don't want the first user who visits the site after a fresh upload to have to wait until each page has been generated and saved in the cache.
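To make the requirement concrete, here is a minimal sketch of the crawl described above, written in Python using only the standard library. It is an illustration of the behaviour I'm after, not a recommendation of a specific tool. The names `LinkParser`, `fetch`, `crawl` and `main` are made up for this example, and the sketch assumes that "same domain" means "same host and port as the start URL" and that only `<a href>` links need to be followed.

```python
import sys
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse
from urllib.request import urlopen


class LinkParser(HTMLParser):
    """Collects the href value of every <a> tag (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def fetch(url):
    """Request a page and return its body as text ('' if it is not HTML)."""
    with urlopen(url, timeout=30) as response:
        if "html" not in response.headers.get_content_type():
            return ""
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def crawl(start_url, fetch_page=fetch):
    """Request every same-host page reachable from start_url once.

    Returns the set of URLs that were requested. Nothing is saved to disk.
    """
    start_url = urldefrag(start_url).url
    host = urlparse(start_url).netloc  # assumption: same domain == same host
    seen = {start_url}
    queue = deque([start_url])
    while queue:
        url = queue.popleft()
        try:
            html = fetch_page(url)
        except (OSError, ValueError) as error:
            # URLError and HTTPError are OSError subclasses; ValueError
            # covers malformed URLs. Report the failure and keep going.
            print(f"failed: {url} ({error})", file=sys.stderr)
            continue
        parser = LinkParser()
        parser.feed(html)
        for href in parser.hrefs:
            # Resolve relative links and drop #fragments so that each page
            # is requested only once.
            link = urldefrag(urljoin(url, href)).url
            parts = urlparse(link)
            if (parts.scheme in ("http", "https")
                    and parts.netloc == host
                    and link not in seen):
                seen.add(link)
                queue.append(link)
    return seen


def main(argv):
    """Command-line entry point: expects exactly one argument, the start URL."""
    if len(argv) != 1:
        print("usage: python warm_cache.py START_URL")
        return
    pages = crawl(argv[0])
    print(f"requested {len(pages)} page(s)")


if __name__ == "__main__":
    main(sys.argv[1:])
```

Saved under a name such as `warm_cache.py` (the file name is arbitrary), this could be run from a Windows command prompt or a scheduled task as `python warm_cache.py http://example.com/` right after each upload. The sketch does not handle robots.txt, throttling, or links generated by JavaScript.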
© Super User or respective owner