What the best way to achieve RPO of zero and lowest possible RTO (less than 15 minutes) with SQL 2008 R2?
- by Adrian Hope-Bailie
We are running a payments (EFT transaction processing) application which is processing high volumes of transactions 24/7 and are currently investigating a better way of doing DB replication to our disaster recovery site.
Our current and previous strategies have included using both DoubleTake and Redgate to replicate data to a warm stand-by.
DoubleTake is the supported solution from the payments software vendor however their (DoubleTake's) support in South Africa is very poor. We had a few issues and simply couldn't ever resolve them so we had to give up on DoubleTake.
We have been using Redgate to manually read the data from the primary site (via queries) and write to the DR site but this is:
A bad solution
Getting the software vendor hot and bothered whenever we have support issues as it has a tendency to interfere with the payment application which is very DB intensive.
We recently upgraded the whole system to run on SQL 2008 R2 Enterprise which means we should probably be looking at using some of the built-in replication features.
The server has 2 fairly large databases with a mixture of tables containing highly volatile transactional data and pretty static configuration data.
Replication would be done over a WAN link to a separate physical site and needs to achieve the following objectives.
RPO: Zero loss - This is transactional data with financial impact so we can't lose anything.
RTO: Tending to zero - The business depends on our ability to process transactions every minute we are down we are losing money
I have looked at a few of the other questions/answers but none meet our case exactly:
SQL Server 2008 failover strategy - Log shipping or replication?
How to achieve the following RTO & RPO with logshipping only using SQL Server?
What is the best of two approaches to achieve DB Replication?
My current thinking is that we should use mirroring but I am concerned that for RPO:0 we will need to do delayed commits and this could impact the performance of the primary DB which is not an option.
Our current DR process is to:
Stop incoming traffic to the primary site and allow all in-flight transaction to complete.
Allow the replication to DR to complete.
Change network routing to route to DR site.
Start all applications and services on the secondary site (Ideally we can change this to a warmer stand-by whereby the applications are already running but not processing any transactions).
In other words the DR database needs to, as quickly as possible, catch up with primary and be ready for processing as the new primary. We would then need to be able to reverse this when we are ready to switch back.
Is there a better option than mirroring (should we be doing log-shipping too) and can anyone suggest other considerations that we should keep in mind?