In Solaris, how to monitor and auto-respond to critical events
- by mamcx
I have a website that fails at random. It runs on OpenSolaris, hosted at Joyent.
I have a monitoring service that alerts me when the site is down, but I want an "insider" tool on the machine itself that tells me why it happened.
Is it because CPU usage is too high? Not enough memory? Which process failed? Is it possible to get a backtrace of the failure?
Everything runs under the Solaris Service Management Facility (SMF). The web server is Cherokee, the database is MySQL, and the application is written in Python/Django.
I want the simplest possible setup to monitor this and respond automatically, i.e. restart the web server or the Django process when it fails.
I prefer a low-overhead tool. I don't need the fancy features that some monitoring tools have: no need for graphs or SMS alerts. I only want to know what failed, restart it if possible (maybe up to n times), and have a log somewhere that I can check later.
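
To make that last requirement concrete, here is a minimal sketch of the behavior described: check a service, try to recover it up to n times, and log what happened. It is only an illustration under stated assumptions, not a recommended tool. It assumes Python 3, that the services are managed by SMF, and that `svcs` and `svcadm` are on the PATH. SMF already restarts a failed service on its own and places it in the `maintenance` state when it keeps failing, so the sketch only clears that state. The FMRI and the log path in the comments are hypothetical, and the function names are made up for this example.

```python
import logging
import subprocess

log = logging.getLogger("watchdog")


def service_state(fmri, run=subprocess.check_output):
    """Return the SMF state of a service, e.g. 'online' or 'maintenance'."""
    # -H omits the header line, -o state prints only the state column.
    out = run(["svcs", "-H", "-o", "state", fmri])
    return out.decode().strip()


def check_and_recover(fmri, attempts, max_attempts=3,
                      run=subprocess.check_output):
    """Clear a service stuck in maintenance, at most max_attempts times.

    attempts is a dict that counts recovery attempts per FMRI.
    Returns 'ok', 'cleared', 'gave_up' or 'skipped'.
    """
    state = service_state(fmri, run)
    if state == "online":
        attempts[fmri] = 0  # healthy again, reset the counter
        return "ok"
    if state != "maintenance":
        # offline, disabled, degraded...: log only, leave it to SMF or the admin
        log.warning("%s is in state %s, not touching it", fmri, state)
        return "skipped"
    if attempts.get(fmri, 0) >= max_attempts:
        log.error("%s still in maintenance after %d attempts, giving up",
                  fmri, max_attempts)
        return "gave_up"
    attempts[fmri] = attempts.get(fmri, 0) + 1
    log.warning("%s in maintenance, clearing (attempt %d of %d)",
                fmri, attempts[fmri], max_attempts)
    run(["svcadm", "clear", fmri])  # lets SMF try to start the service again
    return "cleared"


def main(fmris, logfile):
    """One pass over all services; meant to be called periodically."""
    logging.basicConfig(filename=logfile, level=logging.INFO,
                        format="%(asctime)s %(levelname)s %(message)s")
    attempts = {}
    for fmri in fmris:
        check_and_recover(fmri, attempts)

# Hypothetical usage (the FMRI and path are placeholders, not real names):
# main(["svc:/network/cherokee:default"], "/var/log/watchdog.log")
```

Because `main` starts with an empty counter on every call, a real setup would need to keep `attempts` between runs (for example in a long-running loop) for the n-times limit to have any effect. The sketch also does not answer the "why" part of the question; it records only the SMF state.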