This is Part 4 of a multi-part blog article where we are discussing various aspects of setting up Data Guard synchronous redo transport (SYNC). In Part 1 of this article, I debunked the myth that Data Guard SYNC is similar to a two-phase commit operation. In Part 2, I discussed the various ways that network latency may or may not impact a Data Guard SYNC configuration. In Part 3, I talked in details regarding why Data Guard SYNC is a good thing, and the distance implications you have to keep in mind.
In this final article of the series, I will talk about how you can nicely complement Data Guard SYNC with the ability to failover in seconds.
Wait - Did I Say “Seconds”?
Did I just say that some customers do Data Guard failover in seconds? Yes, Virginia, there is a Santa Claus.
Data Guard has an automatic failover capability, aptly called Fast-Start Failover. Initially available with Oracle Database 10g Release 2 for Data Guard SYNC transport mode (and enhanced in Oracle Database 11g to support Data Guard ASYNC transport mode), this capability, managed by Data Guard Broker, lets your Data Guard configuration automatically failover to a designated standby database. Yes, this means no human intervention is required to do the failover. This process is controlled by a low footprint Data Guard Broker client called Observer, which makes sure that the primary database and the designated standby database are behaving like good kids. If something bad were to happen to the primary database, the Observer, after a configurable threshold period, tells that standby, “Your time has come, you are the chosen one!” The standby dutifully follows the Observer directives by assuming the role of the new primary database. The DBA or the Sys Admin doesn’t need to be involved.
And - in case you are following this discussion very closely, and are wondering … “Hmmm … what if the old primary is not really dead, but just network isolated from the Observer or the standby - won’t this lead to a split-brain situation?” The answer is No - It Doesn’t. With respect to why-it-doesn’t, I am sure there are some smart DBAs in the audience who can explain the technical reasons. Otherwise - that will be the material for a future blog post.
So - this combination of SYNC and Fast-Start Failover is the nirvana of lights-out, integrated HA and DR, as practiced by some of our advanced customers. They have observed failover times (with no data loss) ranging from single-digit seconds to tens of seconds. With this, they support operations in industry verticals such as manufacturing, retail, telecom, Internet, etc. that have the most demanding availability requirements.
One of our leading customers with massive cloud deployment initiatives tells us that they know about server failures only after Data Guard has automatically completed the failover process and the app is back up and running! Needless to mention, Data Guard Broker has the integration hooks for interfaces such as JDBC and OCI, or even for custom apps, to ensure the application gets automatically rerouted to the new primary database after the database level failover completes.
Net Net?
To sum up this multi-part blog article, Data Guard with SYNC redo transport mode, plus Fast-Start Failover, gives you the ideal triple-combo - that is, it gives you the assurance that for critical outages, you can failover your Oracle databases:
very fast
without human intervention, and
without losing any data.
In short, it takes the element of risk out of critical IT operations. It does require you to be more careful with your network and systems planning, but as far as HA is concerned, the benefits outweigh the investment costs.
So, this is what we in the MAA Development Team believe in. What do you think? How has your deployment experience been? We look forward to hearing from you!