How best to handle end user notification in the event of system failure incl. email?

Posted by BrianLy on Server Fault
Published on 2010-06-07T14:14:54Z Indexed on 2010/06/07 14:23 UTC

I've been asked to research ways of handling end user notifications when systems such as email are experiencing problems. Perhaps an example will make this a little clearer.

We have a number of sites in different countries. Recently email was impacted at one of the sites, but it could just as easily have been a complete network outage. Information was provided by phone to local IT managers at the site, but onward communication was slower than some would have liked.

It seems like almost everyone at the site has a personal mobile phone which could receive text messages, and perhaps access a remote website with postings on the situation. However, managing and supporting a system to text people on these relatively infrequent occasions would be very costly to do internally.

What are other people doing to handle situations like this?

Some things I've thought of include:

  • Database of phone numbers to text. Seems costly and not very easy to maintain for an already stretched IT group. Is there an external service that would handle this?
  • Send a voicemail message to all phones on site.
  • Maintain an external website. This would not work in all situations (network failure), and there is a limit on the amount of info that can be posted externally. A site outage could be sensitive information in some situations. How could the site be password protected? Maybe OpenID/Facebook Connect would work.
  • Use a site like Yammer.com which is publicly accessible but only by people with a company email address. Anyone using this for IT outage notifications?

To me it looks like there is no single clear answer, and that each solution only covers some subset of users. To be comprehensive, a number of solutions would need to be combined.
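
To make the "combine several solutions" idea a bit more concrete, here is a rough Python sketch of a fallback chain: each person is tried on a preferred list of channels, and anyone who could not be reached on any of them is reported so they can be followed up by phone. Everything in it is hypothetical: `Contact`, `notify`, `broadcast`, and the channel names are made up for illustration, and the senders are stubs rather than calls to any real SMS or email service.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

# A channel is any function taking (address, message) and returning
# True when delivery succeeded. Real senders would wrap an SMS gateway,
# a mail server, etc.; those are assumptions outside this sketch.
Channel = Callable[[str, str], bool]


@dataclass
class Contact:
    name: str
    # Maps a channel name (e.g. "sms") to this person's address on it.
    addresses: Dict[str, str] = field(default_factory=dict)


def notify(contact: Contact, message: str,
           channels: Dict[str, Channel], order: List[str]) -> Optional[str]:
    """Try channels in order; return the first that succeeds, else None."""
    for channel_name in order:
        address = contact.addresses.get(channel_name)
        send = channels.get(channel_name)
        if address is None or send is None:
            continue  # this person has no address on this channel
        try:
            if send(address, message):
                return channel_name
        except Exception:
            # A broken channel (e.g. the mail server is down) should not
            # stop us from trying the next one.
            continue
    return None


def broadcast(contacts: List[Contact], message: str,
              channels: Dict[str, Channel],
              order: List[str]) -> Dict[str, Optional[str]]:
    """Notify everyone; map each name to the channel used, or None."""
    return {c.name: notify(c, message, channels, order) for c in contacts}


if __name__ == "__main__":
    def email_down(address: str, message: str) -> bool:
        raise ConnectionError("mail server unreachable")  # simulated outage

    def sms_stub(address: str, message: str) -> bool:
        print(f"SMS to {address}: {message}")
        return True

    staff = [
        Contact("alice", {"email": "alice@example.com", "sms": "+10000000001"}),
        Contact("bob", {"email": "bob@example.com"}),
    ]
    result = broadcast(staff, "Email is down at the site.",
                       {"email": email_down, "sms": sms_stub},
                       ["email", "sms"])
    print(result)
```

In this simulated outage, alice is reached by SMS and bob, who has only an email address on file, comes back as `None`. That unreachable list is probably the most useful output, since it shows who still needs a phone call or a notice on the external site.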

Any additional thoughts or recommendations? What worked or didn't work for your organization?

© Server Fault or respective owner
