Clusterware 11gR2 – Setting up an Active/Passive failover configuration
Posted
by Gilles Haro
on Oracle Blogs
See other posts from Oracle Blogs
or by Gilles Haro
Published on Wed, 16 Feb 2011 10:20:00 +0000
Indexed on
2011/02/16
15:30 UTC
Read the original article
Hit count: 430
Oracle is providing a large range of interesting solutions to ensure High Availability of the database.
Dataguard, RAC or even both configurations (as recommended by Oracle for a Maximum Available Architecture - MAA) are the most frequently found and used solutions.
But, when it comes to protecting a system with an Active/Passive architecture with failover capabilities, people often thinks to other expensive third party cluster systems.
Oracle Clusterware technology, which comes along at no extra-cost with Oracle Database or Oracle Unbreakable Linux, is - in the knowing of most people - often linked to Oracle RAC and therefore, is seldom used to implement failover solutions.
Oracle Clusterware 11gR2 (a part of Oracle 11gR2 Grid Infrastructure) provides a comprehensive framework to setup automatic failover configurations.
It is actually possible to make "failover-able'", and then to protect, almost any kind of application (from the simple xclock to the most complex Application Server).
Quoting Oracle:
“Oracle Clusterware is a portable cluster software that allows clustering of single servers so that they cooperate as a single system. Oracle Clusterware also provides the required infrastructure for Oracle Real Application Clusters (RAC). In addition Oracle Clusterware enables the protection of any Oracle application or any other kind of application within a cluster.”
In the next couple of lines, I will try to present the different steps to achieve this goal : Have a fully operational 11gR2 database protected by automatic failover capabilities.
I assume you are fluent in installing Oracle Database 11gR2, Oracle Grid Infrastructure 11gR2 on a Linux system and that ASM is not a problem for you (as I am using it as a shared storage).
If not, please have a look at Oracle Documentation.
As often, I made my tests using an Oracle VirtualBox environment. The scripts are tested and functional on my system. Unfortunately, there can always be a typo or a mistake.
This blog entry does not replace a course around the Clusterware Framework.
I just hope it will let you see how powerful it is and that it will give you the whilst to go further with it...
Note : This entry has been revised (rev.2) following comments from Philip Newlan.
Prerequisite
- 2 Linux boxes (OELCluster01 and OELCluster02) at the same OS level. I used OEL 5 Update 5 with an Enterprise Kernel.
- Shared Storage (SAN). On my VirtualBox system, I used Openfiler to simulate the SAN
- Oracle 11gR2 Database (11.2.0.1)
- Oracle 11gR2 Grid Infrastructure (11.2.0.1)
Step 1 - Install the software
- Using asmlib, create 3 ASM disks (ASM_CRS, ASM_DTA and ASM_FRA)
- Install Grid Infrastructure for a cluster (OELCluster01 and OELCluster02 are the 2 nodes of the cluster)
- Use ASM_CRS to store Voting Disk and OCR.
- Use SCAN.
- Install Oracle Database Standalone binaries on both nodes.
- Use asmca to check/mount the disk groups on 2 nodes
- Use dbca to create and configure a database on the primary node
- Let's name it DB11G.
- Copy the pfile, password file to the second node.
- Create adump directoty on the second node.
Step 2 - Setup the resource to be protected
After its creation with dbca, the database is automatically protected by the Oracle Restart technology available with Grid Infrastructure. Consequently, it restarts automatically (if possible) after a crash (ex: kill -9 smon).
A database resource has been created for that in the Cluster Registry.
We can observe this with the command : crsctl status resource that shows and ora.dba11g.db entry.
Let's save the definition of this resource, for future use :
- mkdir -p /crs/11.2.0/HA_scripts
- chown oracle:oinstall /crs/11.2.0/HA_scripts
- crsctl status resource ora.db11g.db -p > /crs/11.2.0/HA_scripts/myResource.txt
Although very interesting, Oracle Restart is not cluster aware and cannot restart the database on any other node of the cluster. So, let's remove it from the OCR definitions, we don't need it !
- srvctl stop database -d DB11G
- srvctl remove database -d DB11G
Instead of it, we need to create a new resource of a more general type : cluster_resource. Here are the steps to achieve this :
- Create an action script : /crs/11.2.0/HA_scripts/my_ActivePassive_Cluster.sh
#!/bin/bash
export ORACLE_HOME=/oracle/product/11.2.0/dbhome_1
export ORACLE_SID=DB11Gcase $1 in
'start')
$ORACLE_HOME/bin/sqlplus /nolog <<EOF
connect / as sysdba
startup
EOF
RET=0
;;
'stop')
$ORACLE_HOME/bin/sqlplus /nolog <<EOF
connect / as sysdba
shutdown immediate
EOF
RET=0
;;'clean')
$ORACLE_HOME/bin/sqlplus /nolog <<EOF
connect / as sysdba
shutdown abort
##for i in `ps -ef | grep -i $ORACLE_SID | awk '{print $2}' ` ;do kill -9 $i; done
EOF
RET=0
;;'check')
ok=`ps -ef | grep smon | grep $ORACLE_SID | wc -l`
if [ $ok = 0 ]; then
RET=1
else
RET=0
fi
;;
'*')
RET=0
;;
esacif [ $RET -eq 0 ]; then
exit 0
else
exit 1
fi
This script must provide, at least, methods to start, stop, clean and check the database. It is self-explaining and contains nothing special. Just be aware that it must be runnable (+x), it runs as Oracle user (because of the ACL property - see later) and needs to know about the environment. Also make sure it exists on every node of the cluster. Moreover, as of 11.2, the clean method is mandatory. It must provide the “last gasp clean up”, for example, a shutdown abort or a kill –9 of all the remaining processes.
- chmod +x /crs/11.2.0/HA_scripts/my_ActivePassive_Cluster.sh
- scp /crs/11.2.0/HA_scripts/my_ActivePassive_Cluster.sh oracle@OELCluster02:/crs/11.2.0/HA_scripts
Create a new resource file, based on the information we got from previous myResource.txt .
Name it myNewResource.txt.
myResource.txt is shown below.
As we can see, it defines an ora.database.type resource, named ora.db11g.db.
A lot of properties are related to this type of resource and do not need to be used for a cluster_resource.
NAME=ora.db11g.db
TYPE=ora.database.type
ACL=owner:oracle:rwx,pgrp:oinstall:rwx,other::r--
ACTION_FAILURE_TEMPLATE=
ACTION_SCRIPT=
ACTIVE_PLACEMENT=1
AGENT_FILENAME=%CRS_HOME%/bin/oraagent%CRS_EXE_SUFFIX%
AUTO_START=restore
CARDINALITY=1
CHECK_INTERVAL=1
CHECK_TIMEOUT=600
CLUSTER_DATABASE=false
DB_UNIQUE_NAME=DB11G
DEFAULT_TEMPLATE=PROPERTY(RESOURCE_CLASS=database) PROPERTY(DB_UNIQUE_NAME= CONCAT(PARSE(%NAME%, ., 2), %USR_ORA_DOMAIN%, .)) ELEMENT(INSTANCE_NAME= %GEN_USR_ORA_INST_NAME%)
DEGREE=1
DESCRIPTION=Oracle Database resource
ENABLED=1
FAILOVER_DELAY=0
FAILURE_INTERVAL=60
FAILURE_THRESHOLD=1
GEN_AUDIT_FILE_DEST=/oracle/admin/DB11G/adump
GEN_USR_ORA_INST_NAME=
GEN_USR_ORA_INST_NAME@SERVERNAME(oelcluster01)=DB11G
HOSTING_MEMBERS=
INSTANCE_FAILOVER=0
LOAD=1
LOGGING_LEVEL=1
MANAGEMENT_POLICY=AUTOMATIC
NLS_LANG=
NOT_RESTARTING_TEMPLATE=
OFFLINE_CHECK_INTERVAL=0
ORACLE_HOME=/oracle/product/11.2.0/dbhome_1
PLACEMENT=restricted
PROFILE_CHANGE_TEMPLATE=
RESTART_ATTEMPTS=2
ROLE=PRIMARY
SCRIPT_TIMEOUT=60
SERVER_POOLS=ora.DB11G
SPFILE=+DTA/DB11G/spfileDB11G.ora
START_DEPENDENCIES=hard(ora.DTA.dg,ora.FRA.dg) weak(type:ora.listener.type,uniform:ora.ons,uniform:ora.eons) pullup(ora.DTA.dg,ora.FRA.dg)
START_TIMEOUT=600
STATE_CHANGE_TEMPLATE=
STOP_DEPENDENCIES=hard(intermediate:ora.asm,shutdown:ora.DTA.dg,shutdown:ora.FRA.dg)
STOP_TIMEOUT=600
UPTIME_THRESHOLD=1h
USR_ORA_DB_NAME=DB11G
USR_ORA_DOMAIN=haroland
USR_ORA_ENV=
USR_ORA_FLAGS=
USR_ORA_INST_NAME=DB11G
USR_ORA_OPEN_MODE=open
USR_ORA_OPI=false
USR_ORA_STOP_MODE=immediate
VERSION=11.2.0.1.0
I removed database type related entries from myResource.txt and modified some other to produce the following myNewResource.txt.
- Notice the NAME property that should not have the ora. prefix
- Notice the TYPE property that is not ora.database.type but cluster_resource.
- Notice the definition of ACTION_SCRIPT.
- Notice the HOSTING_MEMBERS that enumerates the members of the cluster (as returned by the olsnodes command).
NAME=DB11G.db
TYPE=cluster_resource
DESCRIPTION=Oracle Database resource
ACL=owner:oracle:rwx,pgrp:oinstall:rwx,other::r--
ACTION_SCRIPT=/crs/11.2.0/HA_scripts/my_ActivePassive_Cluster.sh
PLACEMENT=restricted
ACTIVE_PLACEMENT=0
AUTO_START=restore
CARDINALITY=1
CHECK_INTERVAL=10
DEGREE=1
ENABLED=1
HOSTING_MEMBERS=oelcluster01 oelcluster02
LOGGING_LEVEL=1
RESTART_ATTEMPTS=1
START_DEPENDENCIES=hard(ora.DTA.dg,ora.FRA.dg) weak(type:ora.listener.type,uniform:ora.ons,uniform:ora.eons) pullup(ora.DTA.dg,ora.FRA.dg)
START_TIMEOUT=600
STOP_DEPENDENCIES=hard(intermediate:ora.asm,shutdown:ora.DTA.dg,shutdown:ora.FRA.dg)
STOP_TIMEOUT=600
UPTIME_THRESHOLD=1h
Register the resource.
Take care of the resource type. It needs to be a cluster_resource and not a ora.database.type resource (Oracle recommendation) .
- crsctl add resource DB11G.db -type cluster_resource -file /crs/11.2.0/HA_scripts/myNewResource.txt
Step 3 - Start the resource
- crsctl start resource DB11G.db
This command launches the ACTION_SCRIPT with a start and a check parameter on the primary node of the cluster.
Step 4 - Test this
We will test the setup using 2 methods.
- crsctl relocate resource DB11G.db
This command calls the ACTION_SCRIPT (on the two nodes) to stop the database on the active node and start it on the other node.
Once done, we can revert back to the original node, but, this time we can use a more "MS$ like" method :Turn off the server on which the database is running.
After short delay, you should observe that the database is relocated on node 1.
Conclusion
Once the software installed and the standalone database created (which is a rather common and usual task), the steps to reach the objective are quite easy :
- Create an executable action script on every node of the cluster.
- Create a resource file.
- Create/Register the resource with OCRusing the resource file.
- Start the resource.
This solution is a very interesting alternative to licensable third party solutions.
References
- Clusterware 11gR2 documentation
- Oracle Clusterware Resource Reference
- Clusterware for Unbreakable Linux
- Using Oracle Clusterware to Protect A Single Instance Oracle Database 11gR1 (to have an idea of complexity)
- Oracle Clusterware on OTN
Gilles Haro
Technical Expert - Core Technology, Oracle Consulting
© Oracle Blogs or respective owner