Previously, I use to work for a small internet company that sells dental plans online. Our primary focus concerning disaster prevention and recovery is on our corporate website and private intranet site. We had a multiphase disaster recovery plan that includes data redundancy, load balancing, and off-site monitoring.
Data redundancy is a key aspect of our disaster recovery plan. The first phase of this is to replicate our data to multiple database servers and schedule daily backups of the databases that are stored off site. The next phase is the file replication of data amongst our web servers that are also backed up daily by our collocation. In addition to the files located on the server, files are also stored locally on development machines, and again backed up using version control software.
Load balancing is another key aspect of our disaster recovery plan. Load balancing offers many benefits for our system, better performance, load distribution and increased availability. With our servers behind a load balancer our system has the ability to accept multiple requests simultaneously because the load is split between multiple servers. Plus if one server is slow or experiencing a failure the traffic is diverted amongst the other servers connected to the load balancer allowing the server to get back online.
The final key to our disaster recovery plan is off-site monitoring that notifies all IT staff of any outages or errors on the main website encountered by the monitor. Messages are sent by email, voicemail, and SMS.
According to Disasterrecovery.org, disaster recovery planning is the way companies successfully manage crises with minimal cost and effort and maximum speed compared to others that are forced to make decision out of desperation when disasters occur.
In addition Sun Guard stated in 2009 that the first step in disaster recovery planning is to analyze company risks and factor in fixed costs for things like hardware, software, staffing and utilities, as well as indirect costs, such as floor space, power protection, physical and information security, and management. Also availability requirements need to be determined per application and system as well as the strategies for recovery.