Data architecture for event log metrics?
- by elliot42
My service generates a large, continuous stream of user events, and we would like to run queries like "count occurrences of event type T since date D."
We are trying to make two basic decisions:
1. What to store: every event, or only aggregates?
   (Event-log style) log every event and count them later, vs.
   (Time-series style) store a single aggregated "count of event type T for date D" for every day
2. Where to store the data:
   In a relational database (particularly MySQL)
   In a non-relational (NoSQL) database
   In flat log files (collected centrally over the network via syslog-ng)
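To make the "what to store" trade-off concrete, here is a minimal in-memory sketch of both styles. The sample events, the function names, and the (day, event type) tuple shape are all made up for illustration; a real system would keep this data in one of the stores listed above rather than in Python objects.

```python
from collections import Counter
from datetime import date

# Hypothetical sample events as (day, event_type) pairs; values are illustrative only.
events = [
    (date(2012, 1, 1), "login"),
    (date(2012, 1, 1), "signup"),
    (date(2012, 1, 2), "login"),
    (date(2012, 1, 3), "login"),
]


def count_from_log(log, event_type, since):
    """Event-log style: keep every raw event and count at query time."""
    return sum(1 for day, kind in log if kind == event_type and day >= since)


def build_daily_counts(log):
    """Time-series style: one counter per (day, event_type) pair."""
    return Counter(log)


def count_from_aggregates(daily, event_type, since):
    """Sum the per-day counters for one event type from `since` onward."""
    return sum(
        n for (day, kind), n in daily.items()
        if kind == event_type and day >= since
    )


daily_counts = build_daily_counts(events)
since = date(2012, 1, 2)
log_count = count_from_log(events, "login", since)
agg_count = count_from_aggregates(daily_counts, "login", since)
```

Both paths give the same answer for this query. The difference is that the raw log can still answer questions nobody thought of in advance, while the aggregates stay small but only answer the questions they were built for.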
What is standard practice / where can I read more about comparing the different types of systems?
Additional details:
The total event stream is large, potentially hundreds of thousands of entries per day
But our current need is only to count certain types of events within it
We don't necessarily need real-time access to the raw data or aggregation results
IMHO, "log all events to files, crawl them at a later time to filter and aggregate the stream" is a pretty standard UNIX Way, but my Rails-y compatriots seem to think that nothing is real unless it's in MySQL.
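For reference, the file-crawling approach I have in mind would look roughly like the sketch below. It assumes a hypothetical line format of an ISO 8601 timestamp, a space, then the event type, followed by arbitrary extra fields; our actual syslog-ng output may look different, and `count_events` is just an illustrative name.

```python
import io
from datetime import date, datetime


def count_events(lines, event_type, since):
    """Count lines whose event type matches and whose date is on or after `since`."""
    total = 0
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        try:
            when = datetime.fromisoformat(fields[0])
        except ValueError:
            continue  # skip lines that do not start with a parseable timestamp
        if fields[1] == event_type and when.date() >= since:
            total += 1
    return total


# Stand-in for a real log file; in practice this would be open("events.log").
sample_log = io.StringIO(
    "2012-01-01T09:00:00 login user=1\n"
    "2012-01-02T10:15:00 login user=2\n"
    "2012-01-02T10:20:00 signup user=3\n"
    "\n"
    "2012-01-03T08:00:00 login user=1\n"
)
logins_since = count_events(sample_log, "login", date(2012, 1, 2))
```

Since the function only iterates over lines, it streams through a file without loading it into memory, which should be fine for a few hundred thousand entries per day when real-time results are not required.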