How to calculate real-time stats?
Posted
by
Diego Jancic
on Stack Overflow
See other posts from Stack Overflow
or by Diego Jancic
Published on 2012-06-01T20:30:18Z
Indexed on
2012/06/02
4:40 UTC
Read the original article
Hit count: 303
I have a site with millions of users (well, actually it doesn't have any yet, but let's imagine), and I want to calculate some stats like "log-ins in the past hour".
The problem is similar to the one described here: http://highscalability.com/blog/2008/4/19/how-to-build-a-real-time-analytics-system.html
The simplest approach would be to do a select like this:
select count(distinct user_id)
from logs
where date>='20120601 1200' and date <='20120601 1300'
(of course other conditions could apply for the stats, like log-ins per country) Of course this would be really slow, mainly if it has millions (or even thousands) of rows, and I want to query this every time a page is displayed.
How would you summarize the data? What should go to the (mem)cache?
EDIT: I'm looking for a way to de-normalize the data, or to keep the cache up-to-date. For example I could increment an in-memory variable every time someone logs in, but that would help to know the total amount of logins, not the "logins in the last hour". Hope it's more clear now.
© Stack Overflow or respective owner