Chessin's principles of RAS design

Posted by user12608173 on Oracle Blogs
Published on Fri, 12 Oct 2012 23:39:22 +0000

In late 2001 I developed an internal talk on designing hardware for easier error injection, prevention, diagnosis, and correction. (This talk became the basis for my paper on injecting errors for fun and profit.)

In that talk (but not in the paper), I articulated 10 principles of RAS design, which I list for you here:

  1. Protect everything
  2. Correct where you can
  3. Detect where you can't
  4. Where protection is not feasible (e.g., ALUs), duplicate and compare
  5. Report everything; never throw away RAS information
  6. Allow non-destructive inspection (logging/scrubbing)
  7. Allow non-destructive alteration (injection) (that is, only change the bits you want changed, and leave everything else as is)
  8. Allow observation of all the bits as they are (logging)
  9. Allow alteration of any particular bit or combination of bits (injection)
  10. Document everything

Of course, it isn't always feasible to follow every one of these rules, but I put them out there as a starting point.
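
To make a few of these principles concrete, here is a minimal software sketch. It is not taken from the talk or the paper. It models a 4-bit value protected by an extended Hamming (SECDED) code, an assumption chosen only because it is small enough to read in one sitting. The names `encode`, `inject`, `decode`, and the `log` list are hypothetical. The sketch touches principles 2 and 3 (correct single-bit errors, detect double-bit errors), 5 (every decode is reported), 7 and 9 (injection by XOR mask changes only the chosen bits), and 8 (the log keeps the raw word exactly as it was observed).

```python
def encode(nibble: int) -> int:
    """Encode 4 data bits into an 8-bit extended Hamming (SECDED) word."""
    d = [(nibble >> i) & 1 for i in range(4)]
    bits = [0] * 8                    # index 0: overall parity, 1..7: Hamming positions
    bits[3], bits[5], bits[6], bits[7] = d
    bits[1] = bits[3] ^ bits[5] ^ bits[7]   # covers positions 1, 3, 5, 7
    bits[2] = bits[3] ^ bits[6] ^ bits[7]   # covers positions 2, 3, 6, 7
    bits[4] = bits[5] ^ bits[6] ^ bits[7]   # covers positions 4, 5, 6, 7
    bits[0] = sum(bits[1:]) & 1             # make the parity of the whole word even
    return sum(b << i for i, b in enumerate(bits))


def inject(word: int, mask: int) -> int:
    """Flip exactly the bits set in mask and leave every other bit as is."""
    return word ^ mask


def decode(word: int, log: list):
    """Check a stored word without modifying it; return (status, data).

    status is "ok", "corrected", or "uncorrectable". Every call appends
    (status, syndrome, raw_word) to log, so nothing is thrown away.
    """
    bits = [(word >> i) & 1 for i in range(8)]
    syndrome = 0
    for pos in range(1, 8):
        if bits[pos]:
            syndrome ^= pos           # XOR of the positions of all set bits
    odd_parity = sum(bits) & 1        # 1 means an odd number of bits flipped

    if syndrome == 0 and not odd_parity:
        status = "ok"
    elif odd_parity:
        # Treated as a single-bit error: the syndrome names the flipped
        # position (0 means the overall parity bit itself).
        bits[syndrome] ^= 1
        status = "corrected"
    else:
        # Even parity with a nonzero syndrome: two bits flipped.
        status = "uncorrectable"

    log.append((status, syndrome, word))
    if status == "uncorrectable":
        return status, None
    data = bits[3] | (bits[5] << 1) | (bits[6] << 2) | (bits[7] << 3)
    return status, data
```

For example, `decode(inject(encode(0b1010), 1 << 5), log)` flips one chosen bit and should come back as a corrected `0b1010`, while a two-bit mask such as `(1 << 5) | (1 << 2)` should be reported as uncorrectable rather than silently miscorrected. Real hardware would expose injection and logging through registers rather than function calls; the point here is only that injection touches nothing except the requested bits and that inspection leaves the stored word untouched.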

