Force request to miss cache but still store the response
- by Tom Marthenal
I have a slow web app that I've placed Varnish in front of. All of the pages are static (they don't vary for a different user), but they need to be updated every 5 minutes so they contain recent data.
I have a simple script (wget --mirror) that crawls the entire website every 15 minutes. Each crawl takes about 5 minutes. The point of the crawl is to update every page in the Varnish cache so that a user never has to wait for the page to generate (since all pages have been generated recently thanks to the spider).
The timeline looks like this:
00:00:00: Cache flushed
00:00:00: Spider starts crawling to update cache with new pages
00:05:00: Spider finishes crawling, all pages are updated until 1:15
A request that comes in between 0:00:00 and 0:05:00 might hit a page that hasn't been updated yet, and will be forced to wait a few seconds for a response. This isn't acceptable.
What I'd like to do is, perhaps using some VCL magic, always foward requests from the spider to the backend, but still store the response in the cache. This way, a user will never have to wait for a page to generate since there is no 5-minute window in which parts of the cache are empty (except perhaps at server startup).
How can I do this?