Cluster Health Monitor(????CHM)???Oracle?????,?????????????(CPU????SWAP????I/O?????)??????CHM?????????? ??????????????????????Hang?????(Eviction)????????????????,??????CHM????????????????????,????????????? CHM???????????: 11.2.0.2 ?????? Oracle Grid Infrastructure for Linux (???Linux Itanium) ?Solaris (Sparc 64 ? x86-64) 11.2.0.3 ????? Oracle Grid Infrastructure for AIX ? Windows (???Windows Itanium)? ????,???????????CHM?????(ora.crf)???: $ crsctl stat res -t -init -------------------------------------------------------------------------------- NAME TARGET STATE SERVER STATE_DETAILS Cluster Resources ora.crf ONLINE ONLINE rac1 CHM????????: 1). System Monitor Service(osysmond):?????????????,osysmond????????????????cluster logger service,???????????????????CHM????? $ ps -ef|grep osysmond root 7984 1 0 Jun05 ? 01:16:14 /u01/app/11.2.0/grid/bin/osysmond.bin 2). Cluster Logger Service(ologgerd):???????,ologgerd ???????(master),???????(standby)??ologgerd???????????????,?????????? ???: $ ps -ef|grep ologgerd root 8257 1 0 Jun05 ? 00:38:26 /u01/app/11.2.0/grid/bin/ologgerd -M -d /u01/app/11.2.0/grid/crf/db/rac2 ???: $ ps -ef|grep ologgerd root 8353 1 0 Jun05 ? 00:18:47 /u01/app/11.2.0/grid/bin/ologgerd -m rac2 -r -d /u01/app/11.2.0/grid/crf/db/rac1 CHM Repository:?????????,?????,????Grid Infrastructure home ? ,??1 GB ?????,???????????0.5GB???? ?????OCLUMON??????????????????(??????3????)? ??????????????: $ oclumon manage -get reppath CHM Repository Path = /u01/app/11.2.0/grid/crf/db/rac2 Done $ oclumon manage -get repsize CHM Repository Size = 68082 <====???? Done ????: $ oclumon manage -repos reploc /shared/oracle/chm ????: $ oclumon manage -repos resize 68083 <==?3600(??) ? 259200(3?)?? rac1 --> retention check successful New retention is 68083 and will use 1073750609 bytes of disk space CRS-9115-Cluster Health Monitor repository size change completed on all nodes. Done ??CHM???????????: 1. ?????Grid_home/bin/diagcollection.pl: 1). ??,??cluster logger service????: $ oclumon manage -get master Master = rac2
2).?root??????rac2???????: # <Grid_home>/bin/diagcollection.pl -collect -chmos -incidenttime inc_time -incidentduration duration inc_time?????????????,???MM/DD/YYYY24HH:MM:SS, duration?????????????????? ??:# diagcollection.pl -collect -crshome /u01/app/11.2.0/grid -chmoshome /u01/app/11.2.0/grid -chmos -incidenttime 06/15/201215:30:00 -incidentduration 00:05 3).????????,CHM?????????chmosData_rac2_20120615_1537.tar.gz? 2. ??????CHM?????????oclumon: $oclumon dumpnodeview [[-allnodes] | [-n node1 node2] [-last "duration"] | [-s "time_stamp" -e "time_stamp"] [-v] [-warning]] [-h] -s??????,-e?????? $ oclumon dumpnodeview -allnodes -v -s "2012-06-15 07:40:00" -e "2012-06-15 07:57:00" > /tmp/chm1.txt $ oclumon dumpnodeview -n node1 node2 node3 -last "12:00:00" >/tmp/chm1.txt $ oclumon dumpnodeview -allnodes -last "00:15:00" >/tmp/chm1.txt ???/tmp/chm1.txt??????:----------------------------------------Node: rac1 Clock: '06-15-12 07.40.01' SerialNo:168880----------------------------------------SYSTEM:#cpus: 1 cpu: 17.96 cpuq: 5 physmemfree: 32240 physmemtotal: 2065856 mcache: 1064024 swapfree: 3988376 swaptotal: 4192956 ior: 57 iow: 59 ios: 10 swpin: 0 swpout: 0 pgin: 57 pgout: 59 netr: 65.767 netw: 34.871 procs: 183 rtprocs: 10 #fds: 4902 #sysfdlimit: 6815744 #disks: 4 #nics: 3 nicErrors: 0TOP CONSUMERS:topcpu: 'mrtg(32385) 64.70' topprivmem: 'ologgerd(8353) 84068' topshm: 'oracle(8760) 329452' topfd: 'ohasd.bin(6627) 720' topthread: 'crsd.bin(8235) 44'PROCESSES:name: 'mrtg' pid: 32385 #procfdlimit: 65536 cpuusage: 64.70 privmem: 1160 shm: 1584 #fd: 5 #threads: 1 priority: 20 nice: 0name: 'oracle' pid: 32381 #procfdlimit: 65536 cpuusage: 0.29 privmem: 1456 shm: 12444 #fd: 32 #threads: 1 priority: 15 nice: 0...name: 'oracle' pid: 8756 #procfdlimit: 65536 cpuusage: 0.0 privmem: 2892 shm: 24356 #fd: 47 #threads: 1 priority: 16 nice: 0----------------------------------------Node: rac2 Clock: '06-15-12 07.40.02' SerialNo:168878----------------------------------------SYSTEM:#cpus: 1 cpu: 40.72 cpuq: 8 physmemfree: 34072 physmemtotal: 2065856 mcache: 1005636 swapfree: 3991808 swaptotal: 4192956 ior: 54 iow: 104 ios: 11 swpin: 0 swpout: 0 pgin: 54 pgout: 104 netr: 77.817 netw: 33.008 procs: 178 rtprocs: 10 #fds: 4948 #sysfdlimit: 6815744 #disks: 4 #nics: 4 nicErrors: 0TOP CONSUMERS:topcpu: 'orarootagent.bi(8490) 1.59' topprivmem: 'ologgerd(8257) 83108' topshm: 'oracle(8873) 324868' topfd: 'ohasd.bin(6744) 720' topthread: 'crsd.bin(8362) 47'PROCESSES:name: 'oracle' pid: 9040 #procfdlimit: 65536 cpuusage: 0.19 privmem: 6040 shm: 121712 #fd: 33 #threads: 1 priority: 16 nice: 0... ??CHM?????,???Oracle????: http://docs.oracle.com/cd/E11882_01/rac.112/e16794/troubleshoot.htm#CWADD92242 Oracle® Clusterware Administration and Deployment Guide 11g Release 2 (11.2) Part Number E16794-17 ?? My Oracle Support??: Cluster Health Monitor (CHM) FAQ (Doc ID 1328466.1)