Monitoring / metric collection for system collectives that change a lot in time (a.k.a. cloud)
- by Florin Andrei
When your server fleet doesn't change a lot in time, like when you're using bare-metal hosting, classic monitoring and metric collection solutions (Nagios, Munin) work well.
But if the number of systems varies a lot in time, and may in fact vary rapidly, classic software is more difficult to setup and use. E.g., trying to make Nagios (monitoring) keep up with a rapidly evolving cloud infrastructure can be cumbersome. Same for Munin (metric collection). It's not just the configuration, but the way the information is conveyed to the user, or displayed, is inadequate for the cloud.
What are some possible alternatives that work well with the cloud? The goals are to collect and display metrics (analog to Munin), and generate alerts when certain metrics go out of bounds or when certain services are unavailable (analog to Nagios), and do everything in a cloud-friendly manner.
Some cloud providers offer monitoring / metric collection as services, but not always, and if you use more than one provider you don't want to become too dependent of just one vendor. So provider-independent solutions are required.
EDIT: I am asking this question in a general fashion - not limited to any given cloud infrastructure (like OpenStack), but in the general case of using arbitrary cloud providers.