We have 30+ Apache httpd servers, and are looking to perform analysis on the logs both for historical trending and near-"real-time" monitoring/alerting. I'm mainly interested in things like error rates (4xx/5xx), response time, overall request rate, etc., but it would also be very useful to pull out more compute-intensive statistics like unique client IPs and user agents per unit of time.
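To make the parsing/analysis side concrete, here's a rough sketch of the kind of per-minute aggregation I mean, run against a single access log (it assumes the stock combined LogFormat; response time would additionally need something like `%D` added to the format, and field positions will differ with a custom format):

```python
# Rough sketch: per-minute stats from a combined-format access log on stdin.
import re
import sys
from collections import defaultdict

# Stock "combined" LogFormat; a custom LogFormat needs a different regex.
LINE_RE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) \S+ '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

buckets = defaultdict(lambda: {"total": 0, "errors": 0,
                               "ips": set(), "agents": set()})

for line in sys.stdin:
    m = LINE_RE.match(line)
    if not m:
        continue  # skip lines that don't parse
    # "10/Oct/2011:13:55:36 -0700" truncated to the minute for bucketing
    b = buckets[m.group("ts")[:17]]
    b["total"] += 1
    if m.group("status")[0] in "45":   # count 4xx/5xx as errors
        b["errors"] += 1
    b["ips"].add(m.group("ip"))
    b["agents"].add(m.group("agent"))

for minute in sorted(buckets):
    b = buckets[minute]
    print("%s reqs=%d err=%.1f%% uniq_ips=%d uniq_uas=%d" % (
        minute, b["total"], 100.0 * b["errors"] / b["total"],
        len(b["ips"]), len(b["agents"])))
```

Run as `python stats.py < access_log` (the script name is just illustrative), this is the shape of per-interval data I'd want the real system to produce continuously.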
I'm leaning towards building this as a centralized collector/server/storage system, and am also considering the possibility of storing non-Apache logs (e.g. general syslog, firewall logs, etc.) in the same system.
Obviously a large part of this will probably have to be custom (at least the glue between the pieces and the parsing/analysis we do), but I haven't been able to find much information from people who have done something like this, at least at shops smaller than Google/Facebook/etc., who can throw their log data into a hundred-node compute cluster and run Map/Reduce on it.
The main things I'm looking for are:
- All open source
- Some way of collecting logs from the Apache machines that isn't too resource-intensive and transports them relatively quickly over the network (see the shipper sketch after this list)
- Some way of storing them (NoSQL? key-value store?) on the backend for a given amount of time, then rolling them up into historical averages (see the roll-up sketch after this list)
- In the middle of all this, a way of graphing in near-real-time (probably with some statistical analysis on it) and, hopefully, alerting off those graphs.
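For the collection piece, the lightweight shipper I'm picturing is basically a tail-and-forward loop; a minimal sketch, assuming a plain TCP listener on the collector side (the host, port, and log path below are placeholders):

```python
# Rough sketch: follow the access log and forward new lines to a
# central collector over TCP. Not production-ready: no reconnect
# logic, no log-rotation handling, no local buffering.
import os
import socket
import time

LOG_PATH = "/var/log/httpd/access_log"      # placeholder path
COLLECTOR = ("loghost.example.com", 5140)   # placeholder host/port

sock = socket.create_connection(COLLECTOR)
with open(LOG_PATH) as f:
    f.seek(0, os.SEEK_END)    # start at the end, like tail -f
    while True:
        line = f.readline()
        if not line:
            time.sleep(0.5)   # nothing new yet; poll cheaply
            continue
        sock.sendall(line.encode("utf-8"))
```

A real version would obviously need reconnects, handling of log rotation (tail -F semantics), and some buffering so a collector outage doesn't drop lines.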
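And on the storage side, what I mean by "rolling them up into historical averages" is roughly this, with the backend left abstract as plain dicts standing in for whatever store ends up underneath (keys and names are just illustrative):

```python
# Rough sketch: collapse per-minute counters into hourly averages so
# the raw per-minute data can be expired after its retention window.
from collections import defaultdict

def roll_up(raw):
    """raw maps (host, 'YYYY-MM-DD HH:MM') -> request count."""
    hourly = defaultdict(list)
    for (host, minute), count in raw.items():
        hour = minute[:13]                 # 'YYYY-MM-DD HH'
        hourly[(host, hour)].append(count)
    # one averaged row per (host, hour) replaces up to 60 raw rows
    return {key: sum(v) / float(len(v)) for key, v in hourly.items()}

raw = {("web01", "2011-10-10 13:55"): 420,
       ("web01", "2011-10-10 13:56"): 380}
print(roll_up(raw))   # {('web01', '2011-10-10 13'): 400.0}
```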
Any suggestions/pointers/ideas for either "products"/projects or descriptions of how other people do this would be greatly appreciated. Unfortunately, we're not exactly a new-age-y devops shop: lots of old stuff, heterogeneous infrastructure, and strained boxes.