What is bondib1 used for on SPARC SuperCluster with InfiniBand, Solaris 11 networking & Oracle RAC?

Posted by user12620111 on Oracle Blogs
Published on Thu, 12 Apr 2012 17:02:46 -0500

A co-worker asked the following question about a SPARC SuperCluster InfiniBand network:

> On the database nodes, the RAC nodes communicate over the cluster_interconnect. This is the
> 192.168.10.0 network on bondib0 (according to the NETWORKS setting in
> ./crs/install/crsconfig_params).
> What is bondib1 used for? Is it an HA counterpart in case bondib0 dies?

This is my response:

Summary: bondib1 is currently only being used for outbound cluster interconnect traffic.

Details:

bondib0 is the cluster_interconnect:

$ oifcfg getif           
bondeth0  10.129.184.0  global  public
bondib0  192.168.10.0  global  cluster_interconnect
ipmpapp0  192.168.30.0  global  public


bondib0 and bondib1 are on 192.168.10.1 and 192.168.10.2 respectively.

# ipadm show-addr | grep bondi
bondib0/v4static  static   ok           192.168.10.1/24
bondib1/v4static  static   ok           192.168.10.2/24
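
As a quick sanity check, the two addresses above can be confirmed to share one subnet. This is a minimal sketch using Python's standard `ipaddress` module; the variable names are illustrative and not part of any Solaris or Oracle tooling.

```python
import ipaddress

# Addresses copied from the `ipadm show-addr` output above.
bondib0 = ipaddress.ip_interface("192.168.10.1/24")
bondib1 = ipaddress.ip_interface("192.168.10.2/24")

# Both interfaces resolve to the same network, 192.168.10.0/24.
same_subnet = bondib0.network == bondib1.network
print(bondib0.network, bondib1.network, same_subnet)
```

Because both interfaces are attached to the same subnet, the host has two equally valid ways to reach any other address on 192.168.10.0.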


The hostnames tied to these IP addresses are node1-priv1 and node1-priv2:

# grep 192.168.10 /etc/hosts
192.168.10.1    node1-priv1.us.oracle.com   node1-priv1
192.168.10.2    node1-priv2.us.oracle.com   node1-priv2

For the 4-node RAC interconnect:

  • Each node has 2 private IP addresses on the 192.168.10.0 network.
  • Each IP address has an active InfiniBand link and a failover InfiniBand link.
  • Thus, the 4-node RAC interconnect is using a total of 8 IP addresses and 16 InfiniBand links.
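
The totals in the list above follow from simple multiplication. A small sketch of the arithmetic, with variable names chosen only for illustration:

```python
nodes = 4          # RAC nodes in the cluster
ips_per_node = 2   # one address on bondib0, one on bondib1
links_per_ip = 2   # one active and one failover InfiniBand link

total_ips = nodes * ips_per_node
total_links = total_ips * links_per_ip
print(total_ips, total_links)
```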

bondib1 isn't being used for the Virtual IP (VIP):

$ srvctl config vip -n node1
VIP exists: /node1-ib-vip/192.168.30.25/192.168.30.0/255.255.255.0/ipmpapp0, hosting node node1
VIP exists: /node1-vip/10.55.184.15/10.55.184.0/255.255.255.0/bondeth0, hosting node node1


bondib1 is on bondib1_0 and fails over to bondib1_1:

# ipmpstat -g
GROUP       GROUPNAME   STATE     FDT       INTERFACES
ipmpapp0    ipmpapp0    ok        --        ipmpapp_0 (ipmpapp_1)
bondeth0    bondeth0    degraded  --        net2 [net5]
bondib1     bondib1     ok        --        bondib1_0 (bondib1_1)
bondib0     bondib0     ok        --        bondib0_0 (bondib0_1)


bondib1_0 goes over net24

# dladm show-link | grep bond
LINK                CLASS     MTU    STATE    OVER
bondib0_0           part      65520  up       net21
bondib0_1           part      65520  up       net22
bondib1_0           part      65520  up       net24
bondib1_1           part      65520  up       net23


net24 is IB Partition FFFF

# dladm show-ib
LINK         HCAGUID         PORTGUID        PORT STATE  PKEYS
net24        21280001A1868A  21280001A1868C  2    up     FFFF
net22        21280001CEBBDE  21280001CEBBE0  2    up     FFFF,8503
net23        21280001A1868A  21280001A1868B  1    up     FFFF,8503
net21        21280001CEBBDE  21280001CEBBDF  1    up     FFFF


net24 is on Express Module 9, port 2:

# dladm show-phys -L
LINK              DEVICE       LOC
net21             ibp4         PCI-EM1/PORT1
net22             ibp5         PCI-EM1/PORT2
net23             ibp6         PCI-EM9/PORT1
net24             ibp7         PCI-EM9/PORT2
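
The chain from IPMP group to physical port can be followed mechanically. The sketch below hard-codes the values from the `ipmpstat -g`, `dladm show-link`, and `dladm show-phys -L` output above into plain dictionaries; the `trace` helper is hypothetical and exists only for this illustration.

```python
# IPMP group -> (active interface, standby interface), from `ipmpstat -g`.
ipmp_groups = {
    "bondib0": ("bondib0_0", "bondib0_1"),
    "bondib1": ("bondib1_0", "bondib1_1"),
}

# IB partition link -> underlying datalink, from `dladm show-link`.
link_over = {
    "bondib0_0": "net21",
    "bondib0_1": "net22",
    "bondib1_0": "net24",
    "bondib1_1": "net23",
}

# Datalink -> physical location, from `dladm show-phys -L`.
phys_loc = {
    "net21": "PCI-EM1/PORT1",
    "net22": "PCI-EM1/PORT2",
    "net23": "PCI-EM9/PORT1",
    "net24": "PCI-EM9/PORT2",
}

def trace(group):
    """Return (active location, standby location) for an IPMP group."""
    active, standby = ipmp_groups[group]
    return phys_loc[link_over[active]], phys_loc[link_over[standby]]

for group in sorted(ipmp_groups):
    print(group, trace(group))
```

Read this way, the data shows each group's active and failover links on the two ports of one Express Module, with bondib0 on EM1 and bondib1 on EM9.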


Outbound traffic on the 192.168.10.0 network will be multiplexed between bondib0 and bondib1:

# netstat -rn

Routing Table: IPv4
  Destination           Gateway           Flags  Ref     Use     Interface
-------------------- -------------------- ----- ----- ---------- ---------
192.168.10.0         192.168.10.2         U        16    6551834 bondib1  
192.168.10.0         192.168.10.1         U         9    5708924 bondib0  
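
To put a number on the split, the two route entries can be compared by their Use column. This sketch assumes that Use is a reasonable proxy for how heavily each route has been used; the exact meaning of that counter depends on the Solaris release, so treat the percentages as indicative only.

```python
# The two route entries copied from the `netstat -rn` output above.
routes = """\
192.168.10.0         192.168.10.2         U        16    6551834 bondib1
192.168.10.0         192.168.10.1         U         9    5708924 bondib0
"""

use = {}
for line in routes.splitlines():
    # Columns: Destination, Gateway, Flags, Ref, Use, Interface
    dest, gateway, flags, ref, used, iface = line.split()
    use[iface] = int(used)

total = sum(use.values())
share = {iface: round(100 * count / total, 1) for iface, count in use.items()}
print(share)
```

By this measure, the two routes are used in nearly equal proportion, which is consistent with outbound traffic being spread across both interfaces.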


bondib0 carries roughly three times as much traffic as bondib1:

# /bin/time snoop -I bondib0 -c 100 > /dev/null
Using device ipnet/bondib0 (promiscuous mode)
100 packets captured

real        4.3
user        0.0
sys         0.0


(100 packets in 4.3 seconds = 23.3 pkts/sec)

# /bin/time snoop -I bondib1 -c 100 > /dev/null
Using device ipnet/bondib1 (promiscuous mode)
100 packets captured

real       13.3
user        0.0
sys         0.0


(100 packets in 13.3 seconds = 7.5 pkts/sec)
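
The two rates above, and the ratio between them, can be reproduced as follows. The function name is illustrative; the inputs are the packet counts and `real` times from the two `snoop` runs.

```python
def packets_per_second(packets, seconds):
    """Average packet rate over a capture window."""
    return packets / seconds

rate_bondib0 = packets_per_second(100, 4.3)
rate_bondib1 = packets_per_second(100, 13.3)
ratio = rate_bondib0 / rate_bondib1

print(round(rate_bondib0, 1), round(rate_bondib1, 1), round(ratio, 1))
```

These are single short samples, so the ratio is a rough indication rather than a steady-state measurement.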

About half of the packets on bondib0 are outbound (from self). The remaining packets come from the other three nodes in the cluster:

# snoop -I bondib0 -c 100 | awk '{print $1}' | sort | uniq -c
Using device ipnet/bondib0 (promiscuous mode)
100 packets captured
  49 node1-priv1.us.oracle.com
  24 node2-priv1.us.oracle.com
  14 node3-priv1.us.oracle.com
  13 node4-priv1.us.oracle.com

100% of the packets on bondib1 are outbound (from self), but the headers in the packets indicate that they are from the IP address associated with bondib0:

# snoop -I bondib1 -c 100 | awk '{print $1}' | sort | uniq -c
Using device ipnet/bondib1 (promiscuous mode)
100 packets captured
 100 node1-priv1.us.oracle.com

The destinations of the bondib1 outbound packets are split evenly between node3 and node4:

# snoop -I bondib1 -c 100 | awk '{print $3}' | sort | uniq -c
Using device ipnet/bondib1 (promiscuous mode)
100 packets captured
  51 node3-priv1.us.oracle.com
  49 node4-priv1.us.oracle.com
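
For readers less familiar with the `awk '{print $N}' | sort | uniq -c` idiom used above, here is a rough Python equivalent. The three sample lines are made up for illustration and only mimic the `source -> destination` layout of snoop summary lines; they are not captured output.

```python
from collections import Counter

# Hypothetical snoop-style summary lines (source, "->", destination, ...).
sample = """\
node1-priv1.us.oracle.com -> node3-priv1.us.oracle.com UDP
node1-priv1.us.oracle.com -> node4-priv1.us.oracle.com UDP
node1-priv1.us.oracle.com -> node3-priv1.us.oracle.com UDP
"""

def count_field(lines, index):
    """Count occurrences of the whitespace-separated field at `index`."""
    counts = Counter()
    for line in lines:
        fields = line.split()
        if len(fields) > index:
            counts[fields[index]] += 1
    return counts

lines = sample.splitlines()
sources = count_field(lines, 0)       # same field as awk's $1
destinations = count_field(lines, 2)  # same field as awk's $3

for host, count in sorted(destinations.items()):
    print(count, host)
```

Index 0 corresponds to awk's `$1` (the source) and index 2 to `$3` (the destination), because awk numbers fields from 1 and the arrow occupies `$2`.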

Conclusion: bondib1 is currently only being used for outbound cluster interconnect traffic.

