Clusterware 11gR2 – Setting up an Active/Passive failover configuration
Posted
by Gilles Haro
on Oracle Blogs
See other posts from Oracle Blogs
or by Gilles Haro
Published on Mon, 10 Jan 2011 07:04:00 +0000
Indexed on
2011/01/10
15:57 UTC
Read the original article
Hit count: 383
Oracle provides many interesting ways to ensure High Availability. Dataguard configurations, RAC configurations or even both (as recommended for a Maximum Available Architecture - MAA) are the most frequently found.
But when it comes to protecting a system with an Active/Passive architecture with failover capabilities, one often thinks to expensive third party cluster systems.
Oracle Clusterware technology, which comes free with Oracle Database, is – in the knowing of most people - often linked to Oracle RAC and therefore, is rarely used to implement failover solutions.
11gR2 Clusterware – which is part of Oracle Grid Infrastructure - provides a comprehensive framework to setup automatic failover configurations.
It is actually possible to make “failover-able'” and, therefore to protect, almost every kind of application (from xclock to the more complex Application Server)
In the next couple of lines, I will try to present the different steps to achieve this goal : Have a fully operational 11gR2 database protected by automatic failover capabilities.
I assume you are fluent in installing Oracle Database 11gR2, Oracle Grid Infrastructure 11gR2 on a Linux system and that ASM is not a problem for you (as I am using it as a shared storage).
If not, please have a look at Oracle Documentation.
As often, I made my tests using an Oracle VirtualBox environment. The scripts are tested and functional. Unfortunately, there can always be a typo or a mistake.
This blog entry is not a course around the Clusterware Framework.
I just hope it will let you see how powerful it is and that it will give you the whilst to go further with it…
Prerequisite
- 2 Linux boxes (OELCluster01 and OELCluster02) at the same OS level. I used OEL 5 Update 5 with Enterprise Kernel.
- Shared Storage (SAN). On my VirtualBox system, I used Openfiler to simulate the SAN
- Oracle 11gR2 Database (11.2.0.1)
- Oracle 11gR2 Grid Infrastructure (11.2.0.1)
Step 1 – Install the software
- Using asmlib, create 3 ASM disks (ASM_CRS, ASM_DTA and ASM_FRA)
- Install Grid Infrastructure for a cluster (OELCluster01 and OELCluster02 are the 2 nodes of the cluster)
- Use ASM_CRS to store Voting Disk and OCR.
- Use SCAN.
- Install Oracle Database Standalone binaries on both nodes.
- Use asmca to check/mount the disk groups on 2 nodes
- Use dbca to create and configure a database on the primary node
- Let’s name it DB11G.
- Copy the pfile, password file to the second node.
- Create adump directoty on the second node.
Step 2 - Setup the resource to be protected
After its creation with dbca, the database is automatically protected by the Oracle Restart technology available with Grid Infrastructure. Consequently, it restarts automatically (if possible) after a crash (ex: kill –9 smon).
A database resource has been created for that in the Cluster Registry.
We can observe this with the command : crsctl status resource that shows and ora.dba11g.db entry.
Let’s save the definition of this resource, for future use :
- mkdir –p /crs/11.2.0/HA_scripts
- chown oracle:oinstall /crs/11.2.0/HA_scripts
- crsctl status resource ora.db11g.db -p > /crs/11.2.0/HA_scripts/myResource.txt
Although very interesting, Oracle Restart is not cluster aware and cannot restart the database on any other node of the cluster. So, let’s remove it from the OCR definitions, we don’t need it !
- srvctl stop database -d DB11G
- srvctl remove database -d DB11G
Instead of it, we need to create a new resource of a more general type : cluster_resource. Here are the steps to achieve this :
- Create an action script : /crs/11.2.0/HA_scripts/my_ActivePassive_Cluster.sh
#!/bin/bash
export ORACLE_HOME=/oracle/product/11.2.0/dbhome_1
export ORACLE_SID=DB11Gcase $1 in
'start')
$ORACLE_HOME/bin/sqlplus /nolog <<EOF
connect / as sysdba
startup
EOF
RET=0
;;
'stop')
$ORACLE_HOME/bin/sqlplus /nolog <<EOF
connect / as sysdba
shutdown immediate
EOF
RET=0
;;
'check')
ok=`ps -ef | grep smon | grep $ORACLE_SID | wc -l`
if [ $ok = 0 ]; then
RET=1
else
RET=0
fi
;;
'*')
RET=0
;;
esacif [ $RET -eq 0 ]; then
exit 0
else
exit 1
fi
This script must provide, at least, methods to start, stop and check the database. It is self-explaining and contains nothing special. Just be aware that it is run as Oracle user (because of the ACL property – see later) and needs to know about the environment. It also needs to be present on every node of the cluster.
- chmod +x /crs/11.2.0/HA_scripts/my_ActivePassive_Cluster.sh
- scp /crs/11.2.0/HA_scripts/my_ActivePassive_Cluster.sh oracle@OELCluster02:/crs/11.2.0/HA_scripts
Create a new resource file, based on the information we got from previous myResource.txt .
Name it myNewResource.txt.
myResource.txt is shown below.
As we can see, it defines an ora.database.type resource, named ora.db11g.db.
A lot of properties are related to this type of resource and do not need to be used for a cluster_resource.
NAME=ora.db11g.db
TYPE=ora.database.type
ACL=owner:oracle:rwx,pgrp:oinstall:rwx,other::r--
ACTION_FAILURE_TEMPLATE=
ACTION_SCRIPT=
ACTIVE_PLACEMENT=1
AGENT_FILENAME=%CRS_HOME%/bin/oraagent%CRS_EXE_SUFFIX%
AUTO_START=restore
CARDINALITY=1
CHECK_INTERVAL=1
CHECK_TIMEOUT=600
CLUSTER_DATABASE=false
DB_UNIQUE_NAME=DB11G
DEFAULT_TEMPLATE=PROPERTY(RESOURCE_CLASS=database) PROPERTY(DB_UNIQUE_NAME= CONCAT(PARSE(%NAME%, ., 2), %USR_ORA_DOMAIN%, .)) ELEMENT(INSTANCE_NAME= %GEN_USR_ORA_INST_NAME%)
DEGREE=1
DESCRIPTION=Oracle Database resource
ENABLED=1
FAILOVER_DELAY=0
FAILURE_INTERVAL=60
FAILURE_THRESHOLD=1
GEN_AUDIT_FILE_DEST=/oracle/admin/DB11G/adump
GEN_USR_ORA_INST_NAME=
GEN_USR_ORA_INST_NAME@SERVERNAME(oelcluster01)=DB11G
HOSTING_MEMBERS=
INSTANCE_FAILOVER=0
LOAD=1
LOGGING_LEVEL=1
MANAGEMENT_POLICY=AUTOMATIC
NLS_LANG=
NOT_RESTARTING_TEMPLATE=
OFFLINE_CHECK_INTERVAL=0
ORACLE_HOME=/oracle/product/11.2.0/dbhome_1
PLACEMENT=restricted
PROFILE_CHANGE_TEMPLATE=
RESTART_ATTEMPTS=2
ROLE=PRIMARY
SCRIPT_TIMEOUT=60
SERVER_POOLS=ora.DB11G
SPFILE=+DTA/DB11G/spfileDB11G.ora
START_DEPENDENCIES=hard(ora.DTA.dg,ora.FRA.dg) weak(type:ora.listener.type,uniform:ora.ons,uniform:ora.eons) pullup(ora.DTA.dg,ora.FRA.dg)
START_TIMEOUT=600
STATE_CHANGE_TEMPLATE=
STOP_DEPENDENCIES=hard(intermediate:ora.asm,shutdown:ora.DTA.dg,shutdown:ora.FRA.dg)
STOP_TIMEOUT=600
UPTIME_THRESHOLD=1h
USR_ORA_DB_NAME=DB11G
USR_ORA_DOMAIN=haroland
USR_ORA_ENV=
USR_ORA_FLAGS=
USR_ORA_INST_NAME=DB11G
USR_ORA_OPEN_MODE=open
USR_ORA_OPI=false
USR_ORA_STOP_MODE=immediate
VERSION=11.2.0.1.0
I removed database type related entries from myResource.txt and modified some other to produce the following myNewResource.txt.
- Notice the NAME property that should not have the ora. prefix
- Notice the TYPE property that is not ora.database.type but cluster_resource.
- Notice the definition of ACTION_SCRIPT.
- Notice the HOSTING_MEMBERS that enumerates the members of the cluster (as returned by the olsnodes command).
NAME=DB11G.db
TYPE=cluster_resource
DESCRIPTION=Oracle Database resource
ACL=owner:oracle:rwx,pgrp:oinstall:rwx,other::r--
ACTION_SCRIPT=/crs/11.2.0/HA_scripts/my_ActivePassive_Cluster.sh
PLACEMENT=restricted
ACTIVE_PLACEMENT=0
AUTO_START=restore
CARDINALITY=1
CHECK_INTERVAL=10
DEGREE=1
ENABLED=1
HOSTING_MEMBERS=oelcluster01 oelcluster02
LOGGING_LEVEL=1
RESTART_ATTEMPTS=1
START_DEPENDENCIES=hard(ora.DTA.dg,ora.FRA.dg) weak(type:ora.listener.type,uniform:ora.ons,uniform:ora.eons) pullup(ora.DTA.dg,ora.FRA.dg)
START_TIMEOUT=600
STOP_DEPENDENCIES=hard(intermediate:ora.asm,shutdown:ora.DTA.dg,shutdown:ora.FRA.dg)
STOP_TIMEOUT=600
UPTIME_THRESHOLD=1h
Register the resource.
Take care of the resource type. It needs to be a cluster_resource and not a ora.database.type resource (Oracle recommendation) .
- crsctl add resource DB11G.db -type cluster_resource -file /crs/11.2.0/HA_scripts/myNewResource.txt
Step 3 - Start the resource
- crsctl start resource DB11G.db
This command launches the ACTION_SCRIPT with a start and a check parameter on the primary node of the cluster.
Step 4 - Test this
We will test the setup using 2 methods.
- crsctl relocate resource DB11G.db
This command calls the ACTION_SCRIPT (on the two nodes) to stop the database on the active node and start it on the other node.
Once done, we can revert back to the original node, but, this time we can use a more “MS$ like” method :Turn off the server on which the database is running.
After short delay, you should observe that the database is relocated on node 1.
Conclusion
Once the software installed and the standalone database created (which is a rather common and usual task), the steps to reach the objective are quite easy :
- Create an executable action script on every node of the cluster.
- Create a resource file.
- Create/Register the resource with OCRusing the resource file.
- Start the resource.
This solution is a very interesting alternative to licensable third party solutions.
References
Clusterware 11gR2 documentation
Oracle Clusterware Resource Reference
Gilles Haro
Technical Expert - Core Technology, Oracle Consulting
© Oracle Blogs or respective owner