Cluster Node Recovery Using Second Node in Solaris Cluster
Posted by Onur Bingul on Oracle Blogs
Published on Wed, 31 Oct 2012 11:25:07 +0000
Node 0a is the cluster node that has crashed and can no longer boot.
Node 0b is the surviving cluster node, in production with services active.
Both nodes have their boot disk mirrored via SDS/SVM.
There are several ways to clone the boot disk from node 0b:
- copy it over the network using the ufsdump command piped to ufsrestore
- insert the new disk locally on node 0b and create a third mirror with SDS
- insert the new disk locally on node 0b and copy with the dd command
This procedure uses the dd command (in my experience this is the best option).
Bear in mind that the examples were run on Sun Fire V240 systems, which have internal SCSI disks. With Fibre Channel (FC) internal disks you must pay attention to the unique identifier, or World Wide Name (WWN), associated with each FC disk (in that case see infodoc #40133 in order to recreate the device tree correctly).
Procedure:
On node 0b the boot disk is c1t0d0 (c1t1d0 mirror) and this is the VTOC:
*                            First     Sector       Last
* Partition  Tag  Flags     Sector      Count     Sector  Mount Directory
      0        2    00           0    2106432    2106431
      1        3    01     2106432   74630784   76737215
      2        5    00           0  143349312  143349311
      4        7    00    76737216   50340672  127077887
      5        4    00   127077888   14683968  141761855
      6        0    00   141761856    1058304  142820159
      7        0    00   142820160     529152  143349311
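Before cloning, it can be worth sanity-checking the slice table. The following is a minimal sketch, assuming the table has been saved to a file in the column order shown above (partition, tag, flags, first sector, sector count, last sector). The function name check_vtoc is hypothetical and not part of Solaris; it only checks that first sector + sector count - 1 equals the last sector for each slice.

```shell
# check_vtoc FILE: hypothetical helper, not a Solaris command.
# Reads prtvtoc-style rows and reports every slice whose
# first sector + sector count - 1 does not equal its last sector.
# Usage (file name is an example): check_vtoc /var/tmp/VTOC_c1t0d0
check_vtoc() {
    awk '
        /^\*/ || NF < 6 { next }      # skip header/comment lines and short lines
        ($4 + $5 - 1) != $6 {
            printf "slice %s: expected last sector %d, found %d\n", $1, $4 + $5 - 1, $6
            bad = 1
        }
        END { exit (bad ? 1 : 0) }    # non-zero exit status if any slice is inconsistent
    ' "$1"
}
```

For slice 0 in the table above the check is 0 + 2106432 - 1 = 2106431, which matches the last-sector column.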
We will insert the new disk on node 0b and it will be seen as c1t2d0.
1) On node 0b we make a copy via dd from disk c1t0d0s2 to disk c1t2d0s2
# dd if=/dev/rdsk/c1t0d0s2 of=/dev/rdsk/c1t2d0s2 bs=8192k
Copying a 72 GB disk takes approximately 45 minutes.
Note: as an alternative, to make an identical copy of root over the network, follow Document ID 47498,
"Sun[TM] Cluster 3.0: How to Rebuild a node with Veritas Volume Manager".
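The 45-minute figure in step 1 can be related to the size of slice 2 with a little arithmetic. This is only a rough sketch: the helper name dd_minutes is hypothetical, 512-byte sectors are assumed, and the rate of 27 MB/s is not a measured value but simply what the 45-minute figure implies for this disk.

```shell
# dd_minutes SECTORS MB_PER_SEC: hypothetical helper.
# Estimates whole minutes needed to copy SECTORS sectors,
# assuming 512-byte sectors and a sustained rate in MB/s (10^6 bytes).
dd_minutes() {
    sectors=$1
    mb_per_sec=$2
    bytes=$((sectors * 512))
    echo $((bytes / (mb_per_sec * 1000000 * 60)))
}

# Slice 2 of the boot disk above has 143349312 sectors.
dd_minutes 143349312 27    # prints 45
```

On different hardware, plug in the sector count of your own slice 2 and a rate you have actually observed.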
2) Perform an fsck on disk c1t2d0 data slices:
1. fsck -o f /dev/rdsk/c1t2d0s0 (root)
2. fsck -o f /dev/rdsk/c1t2d0s4 (/var)
3. fsck -o f /dev/rdsk/c1t2d0s5 (/usr)
4. fsck -o f /dev/rdsk/c1t2d0s6 (/globaldevices)
3) Mount the root file system of the copy so that the files holding the node name can be edited:
# mount /dev/dsk/c1t2d0s0 /mnt
Change the hostname from 0b to 0a:
# cd /mnt/etc
# vi hosts
# vi hostname.bge0
# vi hostname.bge2
# vi nodename
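If you prefer not to edit every file by hand, the rename in step 3 can be sketched as below. The helper name set_node_identity and the host names node0b and node0a are hypothetical placeholders. The sketch deliberately leaves the hosts file alone, because it normally lists both cluster nodes and a blind substitution would rename the wrong entry; edit that file manually. It also assumes the old name contains no characters that are special to sed.

```shell
# set_node_identity ETC_DIR OLD NEW: hypothetical helper.
# Replaces OLD with NEW in nodename and in every hostname.* file
# under ETC_DIR. The hosts file is intentionally not touched.
set_node_identity() {
    etc_dir=$1 old=$2 new=$3
    tmp="$etc_dir/.identity.$$"
    for f in "$etc_dir"/nodename "$etc_dir"/hostname.*; do
        [ -f "$f" ] || continue
        # Write back with cat so the file keeps its owner and mode.
        sed "s/$old/$new/g" "$f" > "$tmp" && cat "$tmp" > "$f"
    done
    rm -f "$tmp"
}

# Example (names are placeholders): set_node_identity /mnt/etc node0b node0a
```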
4) Change /mnt/etc/vfstab from its current contents:
/dev/md/dsk/d201 - - swap - no -
/dev/md/dsk/d200 /dev/md/rdsk/d200 / ufs 1 no -
/dev/md/dsk/d205 /dev/md/rdsk/d205 /usr ufs 1 no logging
/dev/md/dsk/d204 /dev/md/rdsk/d204 /var ufs 1 no logging
#/dev/md/dsk/d206 /dev/md/rdsk/d206 /globaldevices ufs 2 yes logging
swap - /tmp tmpfs - yes -
/dev/md/dsk/d206 /dev/md/rdsk/d206 /global/.devices/node@2 ufs 2 no global
to this (unencapsulate disk from SDS/SVM):
/dev/dsk/c1t0d0s1 - - swap - no -
/dev/dsk/c1t0d0s0 /dev/rdsk/c1t0d0s0 / ufs 1 no -
/dev/dsk/c1t0d0s5 /dev/rdsk/c1t0d0s5 /usr ufs 1 no logging
/dev/dsk/c1t0d0s4 /dev/rdsk/c1t0d0s4 /var ufs 1 no logging
#/dev/md/dsk/d206 /dev/md/rdsk/d206 /globaldevices ufs 2 yes logging
swap - /tmp tmpfs - yes -
/dev/dsk/c1t0d0s6 /dev/rdsk/c1t0d0s6 /global/.devices/node@1 ufs 2 no global
It is important that the global devices entry in the new vfstab points to the physical partition on the disk (in our case slice 6) and uses the nodeid of node 0a (node@1 instead of node@2).
Be careful with the name you use for the new disk. Here we use c1t0d0 because the disk will be inserted as target 0 in node 0a.
This may differ in your configuration.
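The vfstab change in step 4 follows a regular pattern, so it can be sketched as a filter. This is a sketch under the assumptions of this example only: the metadevices are named d20N where N is the slice number, the target disk is c1t0d0, and the nodeid changes from 2 to 1. The function name unencap_vfstab is hypothetical. Commented lines are passed through unchanged, as in the listing above.

```shell
# unencap_vfstab DISK OLD_ID NEW_ID: hypothetical filter (stdin to stdout).
# Rewrites /dev/md/{dsk,rdsk}/d20N to /dev/{dsk,rdsk}/<DISK>sN and
# node@OLD_ID to node@NEW_ID. Lines starting with # are left alone.
unencap_vfstab() {
    disk=$1 old_id=$2 new_id=$3
    sed -e '/^#/b' \
        -e "s|/dev/md/dsk/d20\([0-9]\)|/dev/dsk/${disk}s\1|" \
        -e "s|/dev/md/rdsk/d20\([0-9]\)|/dev/rdsk/${disk}s\1|" \
        -e "s|node@${old_id}|node@${new_id}|"
}

# Example: write the result to a new file and review it before replacing vfstab.
# unencap_vfstab c1t0d0 2 1 < /mnt/etc/vfstab > /mnt/etc/vfstab.new
```

Writing to a separate file first makes it easy to diff the result against the original before putting it in place.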
5) Remove the following entry from /mnt/etc/system (part of the unencapsulation procedure):
rootdev:/pseudo/md@0:0,200,blk
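A non-interactive way to do step 5 is sketched below. The helper name strip_rootdev and the .orig backup suffix are hypothetical choices; the sketch removes only lines that begin with rootdev: and keeps a copy of the original file.

```shell
# strip_rootdev FILE: hypothetical helper.
# Saves FILE as FILE.orig, then rewrites FILE without any rootdev: line.
strip_rootdev() {
    cp "$1" "$1.orig" || return 1
    sed '/^rootdev:/d' "$1.orig" > "$1"
}

# Example: strip_rootdev /mnt/etc/system
```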
6) Correct the link shared -> ../../global/.devices/node@2/dev/md/shared in order to point to the nodeid of node 0a (in our case nodeid 1):
# cd /mnt/dev/md
Current state (node 0b has nodeid 2):
lrwxrwxrwx 1 root root 42 Mar 10 2005 shared -> ../../global/.devices/node@2/dev/md/shared
# rm shared
# ln -s ../../global/.devices/node@1/dev/md/shared shared
New state (nodeid 1 for node 0a):
lrwxrwxrwx 1 root root 42 Mar 10 2005 shared -> ../../global/.devices/node@1/dev/md/shared
7) Change nodeid (in our case from 2 to 1):
# cd /mnt/etc/cluster
# vi nodeid
8) Change the file /mnt/etc/path_to_inst in order to reflect the correct nodeid for node 0a:
# cd /mnt/etc
# vi path_to_inst
Change entries from node@2 to node@1 with the vi command ":%s/node@2/node@1/g"
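Steps 6 to 8 all replace one nodeid with another, so they can be sketched as a single helper. The function name retarget_nodeid is hypothetical, and the sketch assumes that the nodeid file contains nothing but the numeric ID. It keeps a copy of path_to_inst before rewriting it.

```shell
# retarget_nodeid ROOT OLD NEW: hypothetical helper covering steps 6 to 8.
# ROOT is the mount point of the copied root file system (for example /mnt).
retarget_nodeid() {
    root=$1 old=$2 new=$3
    # Step 6: repoint the shared link at the new nodeid.
    rm -f "$root/dev/md/shared" || return 1
    ln -s "../../global/.devices/node@$new/dev/md/shared" "$root/dev/md/shared" || return 1
    # Step 7: assumes the nodeid file holds only the numeric ID.
    echo "$new" > "$root/etc/cluster/nodeid" || return 1
    # Step 8: same substitution as the vi command, with a backup copy.
    cp "$root/etc/path_to_inst" "$root/etc/path_to_inst.orig" || return 1
    sed "s/node@$old/node@$new/g" "$root/etc/path_to_inst.orig" > "$root/etc/path_to_inst"
}

# Example: retarget_nodeid /mnt 2 1
```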
9) Write the bootblock to the disk... just in case:
# /usr/sbin/installboot /usr/platform/sun4u/lib/fs/ufs/bootblk /dev/rdsk/c1t2d0s0
The disk is now ready to be inserted in node 0a in order to boot the node.
10) Boot node 0a with the command "boot -sx" (single-user, non-cluster mode). This is because we need to make some changes to the CCR files in order to recreate the DID environment.
11) Modify cluster ccr:
# cd /etc/cluster/ccr
# rm did_instances
# rm did_instances.bak
# vi directory - remove the did_instances line.
# /usr/cluster/lib/sc/ccradm -i /etc/cluster/ccr/directory
# grep ccr_gennum /etc/cluster/ccr/directory
ccr_gennum -1
# /usr/cluster/lib/sc/ccradm -i /etc/cluster/ccr/infrastructure
# grep ccr_gennum /etc/cluster/ccr/infrastructure
ccr_gennum -1
12) Bring node 0a down to the ok prompt again and then issue the command "boot -r".
The node will now join the cluster, and you can verify that it works with the scstat and metaset commands. The next step is to encapsulate the boot disk in SDS/SVM and create the mirrors.
In our case the metadevice names on node 0b start from d200, so on node 0a we create metadevices starting from d100. This is just an example; you can use different names.
The important thing to remember is that the boot disk metadevices must have different names on each node.
13) Remove the metadevices pointing to the boot and mirror disks (inherited from node 0b):
# metaclear -r -f d200
# metaclear -r -f d201
# metaclear -r -f d204
# metaclear -r -f d205
# metaclear -r -f d206
Verify with metastat that no metadevices remain for the boot and mirror disks.
14) Encapsulate the boot disk:
# metainit -f d110 1 1 c1t0d0s0
# metainit d100 -m d110
# metaroot d100
15) Reboot node 0a.
16) Create the metadevices for the remaining slices on the boot disk:
# metainit -f d111 1 1 c1t0d0s1
# metainit d101 -m d111
# metainit -f d114 1 1 c1t0d0s4
# metainit d104 -m d114
# metainit -f d115 1 1 c1t0d0s5
# metainit d105 -m d115
# metainit -f d116 1 1 c1t0d0s6
# metainit d106 -m d116
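The commands in step 16 follow a naming pattern: for slice N the submirror is d11N and the one-way mirror is d10N. As a sketch of that pattern, the hypothetical helper below only prints the commands for a given disk and list of slices so they can be reviewed before being run; it does not execute anything. The d10N/d11N scheme is the one used in this example and may differ on your system.

```shell
# print_submirror_cmds DISK SLICE...: hypothetical helper.
# Prints (does not run) the metainit commands for each slice,
# using the d11N submirror / d10N mirror naming from this example.
print_submirror_cmds() {
    disk=$1
    shift
    for s in "$@"; do
        echo "metainit -f d11$s 1 1 ${disk}s$s"
        echo "metainit d10$s -m d11$s"
    done
}

# Prints the eight commands of step 16.
print_submirror_cmds c1t0d0 1 4 5 6
```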
17) Edit the vfstab in order to specify the metadevices just created:
old:
/dev/dsk/c1t0d0s1 - - swap - no -
/dev/md/dsk/d100 /dev/md/rdsk/d100 / ufs 1 no -
/dev/dsk/c1t0d0s5 /dev/rdsk/c1t0d0s5 /usr ufs 1 no logging
/dev/dsk/c1t0d0s4 /dev/rdsk/c1t0d0s4 /var ufs 1 no logging
#/dev/md/dsk/d206 /dev/md/rdsk/d206 /globaldevices ufs 2 yes logging
swap - /tmp tmpfs - yes -
/dev/dsk/c1t0d0s6 /dev/rdsk/c1t0d0s6 /global/.devices/node@1 ufs 2 no global
new:
/dev/md/dsk/d101 - - swap - no -
/dev/md/dsk/d100 /dev/md/rdsk/d100 / ufs 1 no -
/dev/md/dsk/d105 /dev/md/rdsk/d105 /usr ufs 1 no logging
/dev/md/dsk/d104 /dev/md/rdsk/d104 /var ufs 1 no logging
#/dev/md/dsk/d106 /dev/md/rdsk/d106 /globaldevices ufs 2 yes logging
swap - /tmp tmpfs - yes -
/dev/md/dsk/d106 /dev/md/rdsk/d106 /global/.devices/node@1 ufs 2 no global
18) Reboot node 0a in order to check new SDS/SVM boot configuration.
19) Label the mirror disk c1t1d0 with the VTOC of boot disk c1t0d0:
# prtvtoc /dev/dsk/c1t0d0s2 > /var/tmp/VTOC_c1t0d0
# fmthard -s /var/tmp/VTOC_c1t0d0 /dev/rdsk/c1t1d0s2
20) Put DB replica on slice 7 of disk c1t1d0:
# metadb -a -c 3 /dev/dsk/c1t1d0s7
21) Create the metadevices for mirror disk c1t1d0 and attach each one as the second side of its mirror:
# metainit d120 1 1 c1t1d0s0
# metattach d100 d120
# metainit d121 1 1 c1t1d0s1
# metattach d101 d121
# metainit d124 1 1 c1t1d0s4
# metattach d104 d124
# metainit d125 1 1 c1t1d0s5
# metattach d105 d125
# metainit d126 1 1 c1t1d0s6
# metattach d106 d126
© Oracle Blogs or respective owner