Command-line HTTP crawler for Windows?
- by Pekka
Can anyone recommend a website crawler that can be launched and configured from the command line?
This would need to run in a Windows environment.
I don't need it to save the data, follow stylesheet links, etc. I only need the crawler to start at a given page, parse it, and follow every link on the same domain, so that by the end every page on the site has been requested once.
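This is not a specific tool recommendation. It is a minimal sketch of the crawl described above, assuming Python 3 is installed on the Windows machine and using only the standard library. It does a breadth-first walk that stays on the start URL's host and requests each URL once. The script and its names (`crawl`, `extract_links`, `--max-pages`) are hypothetical. It only follows `<a href>` links, ignores robots.txt, and does not run JavaScript.

```python
"""Hypothetical same-domain crawler sketch for warming a site's cache."""
import argparse
import urllib.request
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class LinkParser(HTMLParser):
    """Collects href values from <a> tags."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def extract_links(html, base_url):
    """Return absolute, fragment-free http(s) links on base_url's host."""
    parser = LinkParser()
    parser.feed(html)
    host = urlparse(base_url).netloc
    links = []
    for href in parser.hrefs:
        absolute, _fragment = urldefrag(urljoin(base_url, href))
        parsed = urlparse(absolute)
        if parsed.scheme in ("http", "https") and parsed.netloc == host:
            links.append(absolute)
    return links


def fetch(url, timeout=30):
    """Request a page and return its body as text (undecodable bytes are replaced)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def crawl(start_url, fetch_page=fetch, max_pages=None):
    """Breadth-first crawl that requests each reachable same-host URL once."""
    seen = {start_url}
    queue = deque([start_url])
    visited = []
    while queue and (max_pages is None or len(visited) < max_pages):
        url = queue.popleft()
        try:
            html = fetch_page(url)
        except Exception as exc:  # log the failure and keep crawling
            print(f"failed: {url} ({exc})")
            continue
        visited.append(url)
        for link in extract_links(html, url):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return visited


def main():
    parser = argparse.ArgumentParser(description="Request every page on one site once.")
    parser.add_argument("start_url")
    parser.add_argument("--max-pages", type=int, default=None)
    args = parser.parse_args()
    for url in crawl(args.start_url, max_pages=args.max_pages):
        print(url)


if __name__ == "__main__":
    main()
```

Saved as, say, `crawl_site.py`, it could be run from a batch file or scheduled task after each upload, e.g. `python crawl_site.py http://www.example.com/ --max-pages 500`. Because `fetch_page` is a parameter, the crawl logic can be tested without network access.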
Background: I'm setting up a website that is frequently uploaded from an office location. It combines data from various sources and has several levels of caching. I don't want the first user who visits the site after a fresh upload to have to wait while the page is generated and saved to the cache.