Hi everybody,
We have a web server that we're about to launch a number of applications onto. They will all share database and memcached servers, but each application has it's own mySQL database and all memcached keys per application, is prefixed.
Possible scenario:
If a memcached server in our cluster goes boom, we want someone (operative system admin) to be automatically contacted by email/iphone push notification or in any other appropriate way.
If we we're about to install 150 identical applications for our customers on our servers, and a memcached server dies - all 150 applications will individually find this out and contact our system admin, which most certainly is going to think about getting a new job where he or she isn't about to be woken up by getting 150 messages sent 4:15 in the morning.
Possible solution:
One idea is to set up an external server for error handling that gets a $_POST or cURL request sent, and handles storage of the error message depending on the seriousness of the actual error message. It would of course check upon receiving the error call, that if the same memcached server have already been reported as offline, there would be no need to spam the system admin with additional reminders...
The questions:
What's a good approach on how to handle errors?
How does the big guys in the industry handle this?
Thanks!