I'm involved in maintaining an Ubuntu VPS which runs our django websites (nginx/apache/mod_wsgi) and we've been having some memory spikes which have either caused the database to die, or induced kernel panic when the memory management system can't find any killable processes. I'm working on fixing the memory spikes, but I'm wondering whether there's anything I can do to better deal with the problem if it occurs again.
Are there any tools I could use to detect the memory spikes and then, say, kill the offending process and email the server admin to fix it up? Killing off one website so that the server can remain operational is certainly preferable to the whole thing falling over.
Also, we were charged $600 for after-hours service because we had to get the hosting company to restart the server - is this standard practice among hosting companies? Another provider I work with provides a panel with which I can stop and start the server myself, and given that a restart was all that was needed, $600 seems mightily excessive. (That's NZD, it's around $445 USD)