How can I best implement 'cache until further notice' with memcache in multiple tiers?

Posted by ajreal on Programmers
Published on 2011-09-09T14:54:56Z Indexed on 2012/06/08 22:47 UTC

The term "client" used here does not refer to the user's browser, but to a client server.

Before cache workflow

1. client makes an HTTP request -->
2. server processes it -->
3. server stores the parsed results in memcache for next use (cached indefinitely) -->
4. server returns the results to the client -->
5. client gets the result and stores it in its local memcache with a TTL

After cache workflow

1. another client makes an HTTP request -->
2. server finds the entry in memcache and returns the cached results to the client -->
3. client gets the result and stores it in its local memcache with a TTL

TTL = time to live
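
The two workflows above can be sketched roughly as follows. This is only an illustration: `TTLCache` is a hypothetical in-memory stand-in for a memcache instance, and `server_handle` and `client_fetch` are made-up names, with a plain function call standing in for the HTTP request.

```python
import time


class TTLCache:
    """Tiny in-memory stand-in for a memcache instance (hypothetical helper)."""

    def __init__(self, clock=time.monotonic):
        self._data = {}
        self._clock = clock

    def set(self, key, value, ttl=None):
        # ttl=None means "cache indefinitely"
        expires = None if ttl is None else self._clock() + ttl
        self._data[key] = (value, expires)

    def get(self, key):
        item = self._data.get(key)
        if item is None:
            return None
        value, expires = item
        if expires is not None and self._clock() >= expires:
            del self._data[key]  # entry outlived its TTL
            return None
        return value


def server_handle(server_cache, key, compute):
    """Server tier: return the cached result, or compute it and cache it indefinitely."""
    result = server_cache.get(key)
    if result is None:
        result = compute(key)
        server_cache.set(key, result)
    return result


def client_fetch(client_cache, server_cache, key, compute, ttl=60):
    """Client tier: serve from the local cache, else ask the server and cache with a TTL."""
    result = client_cache.get(key)
    if result is None:
        # This call stands in for the HTTP request to the server.
        result = server_handle(server_cache, key, compute)
        client_cache.set(key, result, ttl=ttl)
    return result
```

Once the client TTL runs out, the next `client_fetch` goes back to the server even if nothing changed.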

It is possible for me to know when the data was updated,
and to expire the relevant memcache entries accordingly.

However, there are pitfalls with the client-side cache TTL:

  1. Any data update made before the TTL expires is not picked up by the client memcache.
  2. Conversely, when there is no update, the client memcache still expires after the TTL.
  3. The first request (or concurrent requests) after the TTL expires will be slowed down, as it needs to repeat the "Before cache workflow".

When the client requires several HTTP requests for a single web page,
performance could be very bad.

The ideal solution would be for the client to cache indefinitely until further notice.

Here are three proposals for the "further notice":

Proposal 1 : Make use of HTTP headers (current implementation)

1. client sends an HTTP request with a last modified time header
2. server checks whether the data's last modified time equals the last cache time and, if so, returns status 304
3. client decides on further processing based on the response header

GOOD?
----
- saves the client some parsing
- less data transfer

BAD?
----
- firing an HTTP request is still slow
- the server still needs to process lots of requests
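
A minimal sketch of Proposal 1, loosely modelled on HTTP's `If-Modified-Since` / `304 Not Modified` exchange. The function names and the `(last_modified, body)` tuples are assumptions made for illustration, and a plain function call again stands in for the HTTP round trip. Note that it uses "not modified since" (`<=`) rather than strict equality of the two timestamps.

```python
def handle_request(store, key, if_modified_since=None):
    """Server side: `store` maps key -> (last_modified, body)."""
    last_modified, body = store[key]
    if if_modified_since is not None and last_modified <= if_modified_since:
        return 304, None, last_modified  # nothing new, send no body
    return 200, body, last_modified


def client_get(local, store, key):
    """Client side: `local` maps key -> (last_modified, body), kept with no TTL."""
    cached = local.get(key)
    since = cached[0] if cached else None
    status, body, last_modified = handle_request(store, key, since)
    if status == 304:
        return cached[1]  # reuse the locally cached, already parsed result
    local[key] = (last_modified, body)
    return body
```

The client never serves stale data here, but every lookup still costs one request, which is the drawback listed above.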

Proposal 2 : Regularly issue an HTTP request to check the last modified time of all data groups

1. client fires an HTTP request
2. server returns the last modified time for every data group
3. client compares its local last cache time with the result
4. if data group last cache time < server last modified time
   then request that data group only

GOOD?
----
- only fetches what is not up to date
- fewer requests for the server

BAD?
----
- every web page still requires an HTTP request
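
The comparison in steps 3 and 4 of Proposal 2 could look something like this. The function name and the two dictionaries (data group name to timestamp) are hypothetical; a group the client has never cached is treated as stale.

```python
def stale_groups(local_cached_at, server_modified_at):
    """Return the data groups whose local copy is older than the server's."""
    return sorted(
        group
        for group, modified in server_modified_at.items()
        # A group missing locally counts as infinitely old.
        if local_cached_at.get(group, float("-inf")) < modified
    )


local_cached_at = {"users": 100, "orders": 100}
server_modified_at = {"users": 90, "orders": 150, "prices": 10}
print(stale_groups(local_cached_at, server_modified_at))  # ['orders', 'prices']
```

Only the groups returned here would need a second request.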

Proposal 3 : Tell the client when new data is available (Push)

 1. the server notices a change in a data group
 2. it notifies the clients of the change
 3. it helps the clients to fetch the data again
 4. the client's local memcache is reset after the data is parsed

GOOD?
----
- lets the cache behave like a true cache

BAD?
----
- prone to race conditions
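
A rough sketch of Proposal 3, with one possible guard against the race condition: each notice carries a version number, and a client ignores notices that are not newer than what it already holds. `ClientCache`, `Notifier`, the version numbers, and the `fetch` callback are all assumptions for illustration. `Notifier` is only an in-process stand-in for a real push channel and does not use any actual Gearman API.

```python
class ClientCache:
    """Client-side cache that keeps entries until the server says otherwise."""

    def __init__(self):
        self._entries = {}  # group -> (version, data)

    def get(self, group):
        entry = self._entries.get(group)
        return entry[1] if entry else None

    def on_notice(self, group, version, fetch):
        """Handle a 'group changed' notice; `fetch(group)` returns (version, data)."""
        current = self._entries.get(group)
        if current and current[0] >= version:
            return False  # late or duplicate notice, keep what we have
        self._entries[group] = fetch(group)
        return True


class Notifier:
    """Hypothetical in-process stand-in for the push channel."""

    def __init__(self):
        self._clients = []

    def subscribe(self, client):
        self._clients.append(client)

    def publish(self, group, version, fetch):
        # Tell every subscribed client that `group` changed.
        return [c.on_notice(group, version, fetch) for c in self._clients]
```

With this guard, a notice for version 2 that arrives after version 3 was already fetched is simply dropped instead of overwriting newer data.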

My preference is for proposal 3,
and something like Gearman could be ideal:
whenever there is a change, the Gearman server sends the task to multiple clients (workers).

Am I crazy?
(I know my first question is a bit crazy.)

