How to cache pages using background jobs?
- by Alexandre
Definitions: resource = a collection of database records; regeneration = processing those records and outputting the corresponding HTML.
Current flow:
Receive the client request
Check whether the resource is in the cache
If it is not cached, or the cached copy has expired, regenerate it
Return the result
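For illustration, the current flow can be sketched as follows. This is a minimal Python sketch (the language is incidental), assuming an in-memory dictionary in place of the real cache; `PageCache` and the `regenerate` callback are hypothetical names, with `regenerate` standing in for the slow rendering step.

```python
import time
from typing import Callable, Dict, Tuple


class PageCache:
    """Hypothetical in-memory cache mirroring the current request-time flow."""

    def __init__(self, regenerate: Callable[[str], str], ttl_seconds: float):
        self._regenerate = regenerate  # turns a resource's records into HTML (slow)
        self._ttl = ttl_seconds        # how long a cached page stays valid
        self._store: Dict[str, Tuple[str, float]] = {}  # key -> (html, expires_at)

    def get(self, key: str) -> str:
        # Receive the request and look the resource up in the cache.
        entry = self._store.get(key)
        if entry is not None and entry[1] > time.monotonic():
            return entry[0]  # Fresh copy: return it directly.
        # Missing or expired: regenerate synchronously, blocking this request.
        html = self._regenerate(key)
        self._store[key] = (html, time.monotonic() + self._ttl)
        return html  # Return the result.
```

Because regeneration runs inside `get`, any request that hits a miss or an expired entry waits for the whole rebuild, and nothing stops two concurrent requests from rebuilding the same key.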
The problem is that the regeneration step can tie up a single server process for 10-15 seconds. If several users request the same resource while it is uncached or expired, several processes may end up regenerating exactly the same resource at the same time, each one tied up for 10-15 seconds.
Wouldn't it be preferable to have the frontend signal a background process, saying "Hey, regenerate this resource for me"?
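A minimal sketch of that idea, assuming a Python thread and a `queue.Queue` stand in for what would be a separate worker process in a PHP deployment; `RegenerationWorker`, `request`, and `wait_idle` are hypothetical names. The request handler only enqueues a key and returns immediately, while the worker does the slow rendering and writes the result to a shared store.

```python
import queue
import threading
from typing import Callable, Dict


class RegenerationWorker:
    """Hypothetical background worker: the frontend only enqueues resource keys."""

    def __init__(self, regenerate: Callable[[str], str], store: Dict[str, str]):
        self._regenerate = regenerate       # slow rendering step
        self._store = store                 # shared page cache: key -> html
        self._jobs: "queue.Queue[str]" = queue.Queue()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def request(self, key: str) -> None:
        """Called from a request handler: 'Hey, regenerate this resource for me'."""
        self._jobs.put(key)

    def _run(self) -> None:
        while True:
            key = self._jobs.get()
            try:
                self._store[key] = self._regenerate(key)
            except Exception:
                pass  # a real worker would log the failure and possibly retry
            finally:
                self._jobs.task_done()

    def wait_idle(self) -> None:
        """Block until every queued job has been processed (handy for tests)."""
        self._jobs.join()
```

As written, nothing prevents the same key from being queued several times.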
But then what would the frontend display to the user? A "Rebuilding..." message is not acceptable, so every resource would have to be in the cache ahead of time. That could be a problem, because the database would be almost duplicated on the filesystem (it is too big to fit in memory). Pre-caching everything is not ideal, but it seems like the only way out. Is there a way to avoid this?
That leaves one more problem: how do I keep two processes from requesting the regeneration of the same resource at the same time? The background process could already be regenerating a resource when a frontend asks for that same resource to be regenerated.
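One common way to handle this is to track which keys are already queued or being rebuilt, and to ignore new requests for them until the rebuild finishes. The sketch below assumes a single process, where a set guarded by `threading.Lock` makes the check-and-mark step atomic; across separate PHP processes the same step would have to be atomic in shared storage instead (for example, an insert against a unique key in the database, or an exclusive file lock). `DedupRegenerationQueue` is a hypothetical name.

```python
import queue
import threading
from typing import Callable, Dict, Set


class DedupRegenerationQueue:
    """Hypothetical queue that ignores requests for keys already pending or rebuilding."""

    def __init__(self, regenerate: Callable[[str], str], store: Dict[str, str]):
        self._regenerate = regenerate
        self._store = store                 # shared page cache: key -> html
        self._jobs: "queue.Queue[str]" = queue.Queue()
        self._in_flight: Set[str] = set()   # keys queued or currently regenerating
        self._lock = threading.Lock()       # makes check-and-add atomic
        threading.Thread(target=self._run, daemon=True).start()

    def request(self, key: str) -> bool:
        """Enqueue key unless it is already queued or in progress; True if enqueued."""
        with self._lock:
            if key in self._in_flight:
                return False
            self._in_flight.add(key)
        self._jobs.put(key)
        return True

    def _run(self) -> None:
        while True:
            key = self._jobs.get()
            try:
                self._store[key] = self._regenerate(key)
            except Exception:
                pass  # a real worker would log the failure and possibly retry
            finally:
                # Only now may a new request for this key be accepted.
                with self._lock:
                    self._in_flight.discard(key)
                self._jobs.task_done()

    def wait_idle(self) -> None:
        """Block until every queued job has been processed (handy for tests)."""
        self._jobs.join()
```

The key leaves the in-flight set only after regeneration finishes, so a frontend request that arrives mid-rebuild is dropped rather than triggering a second 10-15 second rebuild.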
I'm using PHP and the Zend Framework, in case someone wants to offer a platform-specific solution. Not that it matters much, though: I think this problem applies to any language or framework.
Thanks!