Chessin's principles of RAS design
- by user12608173
In late 2001 I developed an internal talk on designing hardware for easier error injection, prevention, diagnosis, and correction. (This talk became the basis for my paper on injecting errors for fun and profit.)
In that talk (but not in the paper), I articulated 10 principles of RAS design, which I list for you here:
1. Protect everything
2. Correct where you can
3. Detect where you can't
4. Where protection is not feasible (e.g., ALUs), duplicate and compare
5. Report everything; never throw away RAS information
6. Allow non-destructive inspection (logging/scrubbing)
7. Allow non-destructive alteration (injection) (that is, only change the bits you want changed, and leave everything else as is)
8. Allow observation of all the bits as they are (logging)
9. Allow alteration of any particular bit or combination of bits (injection)
10. Document everything
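To make a few of these concrete, here is a minimal software sketch of "correct where you can, detect where you can't" together with bit-precise injection. It is not taken from the talk or the paper: the choice of an extended Hamming (8,4) SECDED code, and the names `encode`, `decode`, and `inject`, are assumptions made purely for illustration. Real hardware would use wider codes and dedicated injection and logging registers.

```python
# Illustrative sketch only: an extended Hamming (8,4) SECDED code.
# Bit 0 of the codeword is an overall parity bit; bits 1..7 are a classic
# Hamming(7,4) layout with parity at positions 1, 2, 4 and data at 3, 5, 6, 7.
# All names here are hypothetical, chosen for this example.

DATA_POSITIONS = (3, 5, 6, 7)
PARITY_POSITIONS = (1, 2, 4)


def popcount(value):
    """Number of set bits in a non-negative integer."""
    return bin(value).count("1")


def encode(nibble):
    """Protect a 4-bit value by encoding it into an 8-bit codeword."""
    word = 0
    for i, pos in enumerate(DATA_POSITIONS):
        word |= ((nibble >> i) & 1) << pos
    # Each parity bit covers the positions whose index has that bit set.
    for p in PARITY_POSITIONS:
        parity = 0
        for pos in range(1, 8):
            if pos & p:
                parity ^= (word >> pos) & 1
        word |= parity << p
    # Overall parity makes the total number of set bits even.
    word |= popcount(word) & 1
    return word


def decode(word):
    """Return (nibble, status, syndrome); the full report is always returned.

    status is "ok", "corrected" (single-bit error fixed), or
    "uncorrectable" (double-bit error detected; nibble is not trustworthy).
    """
    syndrome = 0
    for pos in range(1, 8):
        if (word >> pos) & 1:
            syndrome ^= pos
    odd_overall = popcount(word & 0xFF) & 1

    if syndrome == 0 and not odd_overall:
        status = "ok"
    elif odd_overall:
        # Single-bit error: the syndrome names the bad position
        # (syndrome 0 means the overall parity bit itself flipped).
        word ^= 1 << syndrome
        status = "corrected"
    else:
        # Non-zero syndrome with even overall parity: two bits flipped.
        status = "uncorrectable"

    nibble = 0
    for i, pos in enumerate(DATA_POSITIONS):
        nibble |= ((word >> pos) & 1) << i
    return nibble, status, syndrome


def inject(word, mask):
    """Flip exactly the bits set in mask and leave every other bit as is."""
    return word ^ mask


if __name__ == "__main__":
    stored = encode(0b1011)
    print("clean      :", decode(stored))
    print("one flip   :", decode(inject(stored, 0b00001000)))
    print("two flips  :", decode(inject(stored, 0b00101000)))
```

Note that `decode` returns the status and the raw syndrome alongside the data rather than silently fixing the word, which is the "report everything" idea in miniature, and that `inject` takes an arbitrary mask so any single bit or combination of bits can be altered without disturbing the rest.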
Of course, it isn't always feasible to follow these rules completely, but I put them out there as a starting point.