How can I best implement 'cache until further notice' with memcache in multiple tiers?
- by ajreal
Note: the term "client" here refers to a client server, not the client's browser.
Before cache workflow
1. client makes an HTTP request -->
2. server processes it -->
3. server stores the parsed results in memcache for next use (cached indefinitely) -->
4. server returns the results to the client -->
5. client gets the results and stores them in its local memcache with a TTL
After cache workflow
1. another client makes an HTTP request -->
2. server finds the results in memcache and returns them to the client -->
3. client gets the results and stores them in its local memcache with a TTL
TTL = time to live
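The two workflows above can be sketched roughly as follows. This is only a minimal in-process illustration: `TTLCache` is a hypothetical stand-in for a memcache instance (not a real memcache client), and `process`, `handle_request` and `client_fetch` are names made up for the example.

```python
import time


class TTLCache:
    """Hypothetical in-memory stand-in for one memcache instance."""

    def __init__(self, clock=time.monotonic):
        self._data = {}
        self._clock = clock

    def set(self, key, value, ttl=None):
        # ttl=None means "cache indefinitely"
        expires_at = None if ttl is None else self._clock() + ttl
        self._data[key] = (value, expires_at)

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if expires_at is not None and self._clock() >= expires_at:
            del self._data[key]  # expired, treat as a miss
            return None
        return value


server_cache = TTLCache()
process_calls = []  # records how often the server had to do real work


def process(key):
    """Placeholder for the expensive server-side processing and parsing."""
    process_calls.append(key)
    return f"parsed:{key}"


def handle_request(key):
    """Server side: reuse the memcache entry, otherwise process and store it."""
    result = server_cache.get(key)
    if result is None:
        result = process(key)
        server_cache.set(key, result)  # no TTL: cached indefinitely
    return result


def client_fetch(client_cache, key, ttl=60):
    """Client side: local memcache with a TTL in front of the HTTP request."""
    result = client_cache.get(key)
    if result is None:
        result = handle_request(key)  # stands in for the HTTP request
        client_cache.set(key, result, ttl=ttl)
    return result


client_a, client_b = TTLCache(), TTLCache()
client_fetch(client_a, "news")  # "Before cache" path: server processes and stores
client_fetch(client_b, "news")  # "After cache" path: served from server memcache
print(process_calls)  # ['news']
```

The server does the expensive work once, but each client still falls back to a request whenever its own TTL runs out, whether or not the data changed.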
It is possible for me to know when the data was updated,
and to expire the relevant memcache entries accordingly.
However, there are pitfalls with the client-side cache TTL:
Any data update before the TTL expires is not picked up by the client memcache.
Conversely, when there is no update, the client memcache still expires after the TTL.
The first request (or concurrent requests) after the TTL expires is slowed down, as it needs to repeat the "Before cache workflow".
If a client requires several HTTP requests for a single web page,
performance could be very bad.
The ideal solution would be for the client to cache indefinitely until further notice.
Here are three proposals for the further notice:
Proposal 1 : Make use of HTTP headers (current implementation)
1. client sends an HTTP request with a last-modified-time header
2. server checks whether the data's last modified time = the last cache time and, if so, returns status 304
3. client decides on further processing based on the response
GOOD?
----
- saves some parsing for the client
- less data transfer
BAD?
----
- firing an HTTP request is still slow
- the server still needs to process lots of requests
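A minimal sketch of the Proposal 1 exchange, with the HTTP layer replaced by a plain function call. `server_store`, `server_get` and `client_get` are hypothetical names, timestamps are plain integers, and the server answers 304 whenever its data is not newer than the time the client sent (an assumption that mirrors how If-Modified-Since usually behaves).

```python
# Hypothetical server-side state: each key has a body and a last modified time.
server_store = {"news": {"body": "v1", "last_modified": 100}}


def server_get(key, if_modified_since=None):
    """Return (status, body, last_modified), mimicking a conditional GET."""
    record = server_store[key]
    if if_modified_since is not None and record["last_modified"] <= if_modified_since:
        return 304, None, record["last_modified"]  # nothing new, no body sent
    return 200, record["body"], record["last_modified"]


local_cache = {}  # key -> (body, last_modified), kept with no TTL


def client_get(key):
    """Always asks the server, but only re-parses when the data changed."""
    cached = local_cache.get(key)
    since = cached[1] if cached else None
    status, body, last_modified = server_get(key, since)
    if status == 304:
        return cached[0]  # reuse the already parsed local copy
    local_cache[key] = (body, last_modified)
    return body
```

Every call to `client_get` still costs one round trip, which is the drawback listed above; only the body transfer and the parsing are saved.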
Proposal 2 : Always issue an HTTP request to check the last modified time of all data groups
1. client fires an HTTP request
2. server returns the last modified time for all data groups
3. client compares its local last cache time with the result
4. if a data group's last cache time < server last modified time,
then request that data group only
GOOD?
----
- only fetches what is not up to date
- fewer requests for the server
BAD?
----
- every web page requires an HTTP request
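Proposal 2 might look something like the sketch below. The names (`server_groups`, `server_last_modified`, `server_fetch`, `refresh_stale_groups`) are hypothetical, the two server functions stand in for HTTP endpoints, and the client is assumed to record the server's last modified time as its own last cache time.

```python
# Hypothetical server-side data groups with their last modified times.
server_groups = {
    "news": {"body": "news v1", "last_modified": 100},
    "prices": {"body": "prices v1", "last_modified": 100},
}


def server_last_modified():
    """One cheap request: the last modified time of every data group."""
    return {name: group["last_modified"] for name, group in server_groups.items()}


def server_fetch(name):
    """One full request for a single data group."""
    group = server_groups[name]
    return group["body"], group["last_modified"]


group_cache = {}  # name -> (body, last_cache_time), kept with no TTL


def refresh_stale_groups():
    """Refetch only the groups that are missing or older than the server copy."""
    refreshed = []
    for name, modified in server_last_modified().items():
        cached = group_cache.get(name)
        if cached is None or cached[1] < modified:
            body, fetched_modified = server_fetch(name)
            group_cache[name] = (body, fetched_modified)
            refreshed.append(name)
    return refreshed
```

Calling `refresh_stale_groups()` once per page load gives the behaviour described: one small request every time, plus one full request per group that actually changed.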
Proposal 3 : Tell the client when new data is available (Push)
1. server notices a change in a data group
2. server notifies clients of the change
3. this prompts clients to fetch the data again
4. clients reset their local memcache after the data is parsed
GOOD?
----
- lets the cache behave like a true cache
BAD?
----
- invites race conditions
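The push idea can be sketched in a single process as below. `Server` and `Client` are hypothetical classes, and the notification is a direct callback; a real setup would deliver it over the network through some job or message system, which is exactly where the race conditions would need care.

```python
class Server:
    """Hypothetical origin server that pushes change notices to subscribers."""

    def __init__(self):
        self.groups = {}
        self.subscribers = []

    def subscribe(self, callback):
        self.subscribers.append(callback)

    def update(self, name, body):
        # Steps 1 and 2: notice the change, then notify every client.
        self.groups[name] = body
        for notify in self.subscribers:
            notify(name)

    def fetch(self, name):
        return self.groups[name]


class Client:
    """Hypothetical client server with a local cache that has no TTL."""

    def __init__(self, server):
        self.server = server
        self.cache = {}
        server.subscribe(self.on_change)

    def on_change(self, name):
        # Steps 3 and 4: fetch the data again, then reset the local entry.
        self.cache[name] = self.server.fetch(name)

    def get(self, name):
        # Normal reads never expire; they only miss on first use.
        if name not in self.cache:
            self.cache[name] = self.server.fetch(name)
        return self.cache[name]
```

For example, after `server.update("news", "v2")` every subscribed client already holds `"v2"` in its local cache, so a page load needs no HTTP request at all.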
My preference is proposal 3,
and something like Gearman could be ideal:
when there is a change, the Gearman server sends the task to multiple clients (workers).
Am I crazy?
(I know my first question is a bit crazy)