Hi,
I'm looking at building a system for managing and reporting stats on web page performance. I'll be collecting a lot more stats than are available in the standard log formats (approximately 20 metrics), but compared with most types of database application the base data structure will be very simple. My problem is that I'll be accumulating a lot of data: in the region of 100,000 records (i.e. sets of metrics) per hour.
Of course, resources are very limited!
So that it's possible to interact sensibly with the data, I'd need to consolidate each metric into one-minute bins broken down by URL, then consolidate anything more than 1 day old into 10-minute bins, and anything more than 1 week old into hourly bins.
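For what it's worth, the consolidation scheme above could be sketched roughly like this. It is only an illustration in Python: the names `Agg`, `ingest`, `compact` and the in-memory dict are hypothetical and don't come from any particular tool. Each bin keeps count/sum/min/max rather than a precomputed mean, so 1-minute bins can be merged exactly into 10-minute and hourly bins and then thrown away; it assumes compaction runs periodically and measures a bin's age from its start time.

```python
from dataclasses import dataclass

DAY = 24 * 3600
WEEK = 7 * DAY

# Retention policy from the post: (max age in seconds, bin width in seconds).
TIERS = [(DAY, 60), (WEEK, 600), (float("inf"), 3600)]


@dataclass
class Agg:
    """Mergeable summary of one metric in one bin (hypothetical helper).

    Storing count/sum/min/max instead of a mean lets coarser bins be built
    exactly from finer ones, even when the finer bins hold different counts.
    """
    count: int = 0
    total: float = 0.0
    lo: float = float("inf")
    hi: float = float("-inf")

    def add(self, value: float) -> None:
        self.count += 1
        self.total += value
        self.lo = min(self.lo, value)
        self.hi = max(self.hi, value)

    def merge(self, other: "Agg") -> None:
        self.count += other.count
        self.total += other.total
        self.lo = min(self.lo, other.lo)
        self.hi = max(self.hi, other.hi)

    @property
    def mean(self) -> float:
        return self.total / self.count if self.count else 0.0


def width_for_age(age: float) -> int:
    """Bin width (seconds) that data of the given age should be stored at."""
    for max_age, width in TIERS:
        if age <= max_age:
            return width
    return TIERS[-1][1]  # not reached: the last tier is unbounded


def ingest(bins: dict, url: str, ts: float, metrics: dict) -> None:
    """Add one raw record (one set of metrics) to its 1-minute bins.

    Keys are (url, metric name, bin width, bin start).
    """
    start = int(ts) - int(ts) % 60
    for name, value in metrics.items():
        bins.setdefault((url, name, 60, start), Agg()).add(value)


def compact(bins: dict, now: float) -> None:
    """Roll bins that have aged past a tier boundary into coarser bins,
    dropping the finer-grained bins as they are merged."""
    for key in list(bins):
        url, name, width, start = key
        target = width_for_age(now - start)
        if target > width:
            coarse_start = start - start % target
            agg = bins.pop(key)
            bins.setdefault((url, name, target, coarse_start), Agg()).merge(agg)
```

Because the three widths nest (60 divides 600, which divides 3600), each fine bin falls entirely inside one coarser bin, so nothing is double-counted when it is rolled up.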
At the front end, I want to provide a view (preferably as plots) of the last hour of data, with the facility for users to drill up/down through defined hierarchies of URLs (which do not always map directly to the hierarchy expressed in the URL path) and to view different time frames.
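To make the drill-down requirement concrete, here is a rough Python sketch, assuming a hypothetical rule table that maps URLs to hierarchy nodes independently of their paths. The rules, `classify` and `drill` are all made up for illustration. `drill` aggregates binned rows for one metric over a time window and returns the mean for each child of the chosen node; drilling up is just calling it with a shorter parent.

```python
import re
from collections import defaultdict

# Hypothetical hierarchy rules: (regex matched against the URL path, node path).
# First match wins; the node path need not mirror the URL path.
RULES = [
    (re.compile(r"/(cart|checkout)(/|$)"), ("site", "shop", "checkout")),
    (re.compile(r"/p/\d+"), ("site", "shop", "product")),
    (re.compile(r"/(help|faq)(/|$)"), ("site", "support")),
]
DEFAULT_NODE = ("site", "other")


def classify(url: str) -> tuple:
    """Map a URL to its node in the (hypothetical) reporting hierarchy."""
    for pattern, node in RULES:
        if pattern.match(url):
            return node
    return DEFAULT_NODE


def drill(rows, parent: tuple, t_from: int, t_to: int) -> dict:
    """Mean of one metric for each child of `parent`, over bins in [t_from, t_to).

    rows: iterable of (url, bin_start, count, total) for a single metric.
    """
    depth = len(parent)
    sums = defaultdict(lambda: [0, 0.0])  # child -> [count, total]
    for url, start, count, total in rows:
        if not (t_from <= start < t_to):
            continue
        node = classify(url)
        # Keep only URLs strictly below `parent` in the hierarchy.
        if node[:depth] != parent or len(node) <= depth:
            continue
        child = node[depth]
        sums[child][0] += count
        sums[child][1] += total
    return {child: total / count for child, (count, total) in sums.items() if count}
```

Keeping the URL-to-hierarchy mapping as data, separate from the stored bins, means the hierarchy can be redefined without reprocessing the consolidated data.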
Rather than coding all this myself and using a relational database, I was wondering if there were tools available which would facilitate both the management of the data and the reporting.
I had a look at Mondrian; however, I can't tell from the documentation I've read whether it's possible to drop the more granular information while maintaining the consolidated views of the data.
RRDTool looks promising in terms of managing the data consolidation, but seems to be rather limited in terms of querying the dataset as a multi-dimensional/relational database.
What else should I be looking at?