Should we design programs to randomly kill themselves?
- by jimbojw
In a nutshell, should we design death into our programs, processes, and threads at a low level, for the good of the overall system?
Failures happen. Processes die. We plan for disaster and occasionally recover from it. But we rarely design and implement unpredictable program death. Instead, we hope our services stay up for as long as we care to keep them running.
A macro-level example of this concept is Netflix's Chaos Monkey, which randomly terminates AWS instances in some scenarios. Netflix claims that this has helped them discover problems and build more redundant systems.
What I'm talking about is lower level. The idea is for traditionally long-running processes to randomly exit. This should force redundancy into the design and ultimately produce more resilient systems.
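To make the idea concrete, here is a minimal sketch in Python. It is only a simulation under stated assumptions: the names `RandomDeath`, `run_worker`, and `supervise` are hypothetical, the "death" is an exception rather than a real process exit, and the job queue and results live outside the worker so that nothing is lost when it dies. In a real system the worker would presumably exit the process, and something external would restart it.

```python
import random
from collections import deque


class RandomDeath(Exception):
    """Hypothetical signal: the worker chose to terminate itself."""


def run_worker(queue, results, death_probability, rng):
    """Drain jobs from the queue, randomly dying between jobs."""
    while queue:
        # Roll the dice before taking a job, so no job is lost mid-flight.
        if rng.random() < death_probability:
            raise RandomDeath()
        job = queue.popleft()
        results.append(job * 2)  # stand-in for real work


def supervise(jobs, death_probability=0.2, seed=None):
    """Restart the worker whenever it dies, until every job is done.

    Assumes death_probability < 1; at 1.0 the worker never makes progress.
    """
    rng = random.Random(seed)
    queue = deque(jobs)  # state lives outside the worker
    results = []
    restarts = 0
    while queue:
        try:
            run_worker(queue, results, death_probability, rng)
        except RandomDeath:
            restarts += 1  # a real supervisor would spawn a fresh process
    return results, restarts


if __name__ == "__main__":
    results, restarts = supervise(range(10), death_probability=0.3, seed=1)
    print(f"finished {len(results)} jobs after {restarts} restarts")
```

The point of the sketch is that the worker can only be allowed to die at random because its state (the queue and the results) is held elsewhere and a supervisor is ready to restart it. That is the redundancy the design is forced to include.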
Does this concept already have a name? Is it already being used in the industry?