Troubleshooting Node Reboots
- by Allen Gao
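This article briefly introduces the common causes of node reboots in a RAC environment. It applies to 10gR2 and 11gR1.

First, a brief introduction to the CRS processes that can reboot a node:

1. ocssd: its main functions are node monitoring (Node Monitoring) and group management (Group Management), and it is one of the core CRS processes. Node monitoring checks the health of the nodes in the cluster through the network heartbeat and the disk heartbeat. A node that keeps missing heartbeats is evicted from the cluster, that is, the node reboots. A reboot triggered by group management is called node kill escalation (11gR1 and later only). The reboot must complete within the reboot time (3 seconds by default).

Network heartbeat: every second, ocssd.bin sends a network heartbeat over the private network to every node in the cluster to confirm that each node is alive. If a node misses network heartbeats for longer than the threshold, misscount (30 seconds by default, 600 seconds if vendor clusterware is installed), the cluster votes through the voting disks and the master node evicts the node that lost the heartbeat, i.e., the node reboots. In a two-node cluster this is a split-brain situation, and the node with the lower node number survives, even if that node is the one with the network problem.

Disk heartbeat: every second, ocssd.bin registers the node's status in every voting disk (Voting File); this is the disk heartbeat. If a node misses disk heartbeats for longer than the threshold, disk timeout (200 seconds by default), the node reboots itself to keep the cluster consistent. CRS needs only [N/2]+1 voting disks to be available, where N is the number of voting disks (usually an odd number).

2. oclsomon: monitors the health of ocssd. If it detects a problem with ocssd.bin, it reboots the node.

3. oprocd: runs only on Linux and Unix, and only when no vendor clusterware is installed. It detects hangs of the node itself and reboots the node if it hangs.

Note: all of the above processes are started by the init.cssd script.

The following information is needed to diagnose a node reboot:

1. OS logs
2. <CRS home>/log/<node name>/cssd/ocssd.log
3. oprocd.log (/etc/oracle/oprocd/*.log.* or /var/opt/oracle/oprocd/*.log.*)
4. <CRS home>/log/<node name>/cssd/oclsomon/oclsomon.log
5. Oracle OSWatcher reports

The common causes are described below.

1. Reboots caused by ocssd

If errors like the following appear in ocssd.log, the reboot was caused by lost network heartbeats. Check the network-related information, such as the OS logs and the OSW reports (traceroute output), to confirm whether the private network (cluster interconnect) has a problem. For example:

    [ CSSD]2012-03-02 23:56:18.749 [3086] >WARNING: clssnmPollingThread: node <node_name> at 50% heartbeat fatal, eviction in 14.494 seconds
    [ CSSD]2012-03-02 23:56:25.749 [3086] >WARNING: clssnmPollingThread: node <node_name> at 75% heartbeat fatal, eviction in 7.494 seconds
    [ CSSD]2012-03-02 23:56:32.749 [3086] >WARNING: clssnmPollingThread: node <node_name> at 90% heartbeat fatal, eviction in 0.494 seconds
    [CSSD]2012-03-02 23:56:33.243 [3086] >TRACE: clssnmPollingThread: Eviction started for node <node_name>, flags 0x040d, state 3, wt4c 0
    [CSSD]2012-03-02 23:56:33.243 [3086] >TRACE: clssnmDiscHelper: <node_name>, node(4) connection failed, con (1128a5530), probe(0)
    [CSSD]2012-03-02 23:56:33.243 [3086] >TRACE: clssnmDiscHelper: node 4 clean up, con (1128a5530), init state 5, cur state 5
    [CSSD]2012-03-02 23:56:33.243 [3600] >TRACE: clssnmDoSyncUpdate: Initiating sync 196446491
    [CSSD]2012-03-02 23:56:33.243 [3600] >TRACE: clssnmDoSyncUpdate: diskTimeout set to (27000)ms

Note: the messages above come from the ocssd.log of a surviving node, which names the node being evicted (<node_name>).

If errors like the following appear in ocssd.log, the reboot was caused by lost disk heartbeats. Check the OS logs and the OSWatcher reports (iostat output) to confirm whether there is an I/O problem. For example:

    2010-08-13 18:34:37.423: [ CSSD][150477728]clssnmvDiskOpen: Opening /dev/sdb8
    2010-08-13 18:34:37.423: [ CLSF][150477728]Opened hdl:0xf4336530 for dev:/dev/sdb8:
    2010-08-13 18:34:37.429: [ SKGFD][150477728]ERROR: -9(Error 27072, OS Error (Linux Error: 5: Input/output error
    Additional information: 4
    Additional information: 720913
    Additional information: -1))
    2010-08-13 18:34:37.429: [ CSSD][150477728](:CSSNM00060: )clssnmvReadBlocks: read failed at offset 17 of /dev/sdb8
    2010-08-13 18:34:38.205: [ CSSD][4110736288](:CSSNM00058: )clssnmvDiskCheck: No I/O completions for 200880 ms for voting file /dev/sdb8)
    2010-08-13 18:34:38.206: [ CSSD][4110736288](:CSSNM00018: )clssnmvDiskCheck: Aborting, 0 of 1 configured voting disks available, need 1
    2010-08-13 18:34:38.206: [ CSSD][4110736288]###################################
    2010-08-13 18:34:38.206: [ CSSD][4110736288]clssscExit: CSSD aborting from thread clssnmvDiskPingMonitorThread
    2010-08-13 18:34:38.206: [ CSSD][4110736288]###################################

2. Reboots caused by oclsomon

If errors appear in oclsomon.log, the reboot was caused by an ocssd hang. Because ocssd runs with real-time (RT) priority, the likely cause is OS resource (for example, CPU) contention. Check the OS logs and the OSW reports (vmstat and top output) to confirm.

3. Reboots caused by oprocd

If an error like the following appears in the oprocd log, the reboot was triggered by oprocd:

    Dec 21 16:15:30.369857 | LASTGASP | AlarmHandler: timeout(2312 msec) exceeds interval(1000 msec)+margin(500 msec). Rebooting NOW.

The message means oprocd woke up 2312 ms after it went to sleep, which exceeds the allowed interval + margin (1500 ms). Because oprocd can reboot the node when the system time is changed, check the NTP configuration (NTP should slew the clock rather than step it) and set diagwait=13 to enlarge the margin. Note that changing diagwait requires stopping CRS on all nodes. If the reboot was not caused by a time change, check the OSWatcher reports (vmstat and top output) to see whether resource contention before the reboot kept oprocd from being scheduled in time.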
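To tie the three log signatures above together, here is a minimal Python sketch of a log-triage helper. It is not an Oracle tool: the function names (`classify`, `triage`, `required_voting_disks`, `margin_ms_for_diagwait`, `would_reboot`) are hypothetical, and the regexes only cover the exact message wording quoted in this article. The formula margin = (diagwait - reboottime) seconds is an assumption based on Oracle's description of diagwait; with the default reboot time of 3 seconds, diagwait=13 would give a 10000 ms margin.

```python
import re
from dataclasses import dataclass
from typing import Iterable, List, Optional

# Hypothetical signature patterns, copied from the log excerpts quoted above.
# Real ocssd.log / oprocd log wording differs between versions, so these
# regexes are assumptions for illustration, not a complete catalogue.
HEARTBEAT_RE = re.compile(r"node (\S+?)\s*at (\d+)% heartbeat fatal")
VOTING_RE = re.compile(r"(\d+) of (\d+) configured voting disks available, need (\d+)")
LASTGASP_RE = re.compile(
    r"LASTGASP.*timeout\((\d+) msec\) exceeds "
    r"interval\((\d+) msec\)\+margin\((\d+) msec\)"
)


@dataclass
class Finding:
    cause: str   # which mechanism most likely triggered the reboot
    detail: str  # short human-readable summary of the evidence


def required_voting_disks(configured: int) -> int:
    """Majority rule from the text: [N/2] + 1 voting disks must be usable."""
    return configured // 2 + 1


def margin_ms_for_diagwait(diagwait_s: int, reboottime_s: int = 3) -> int:
    """ASSUMPTION: oprocd margin = (diagwait - reboottime) seconds, in ms."""
    return (diagwait_s - reboottime_s) * 1000


def would_reboot(timeout_ms: int, interval_ms: int, margin_ms: int) -> bool:
    """oprocd reboots when it wakes up later than interval + margin."""
    return timeout_ms > interval_ms + margin_ms


def classify(line: str) -> Optional[Finding]:
    """Map a single log line to a likely eviction cause, or None."""
    m = HEARTBEAT_RE.search(line)
    if m:
        node, pct = m.group(1), m.group(2)
        return Finding("network heartbeat", f"{node} at {pct}% of misscount")
    m = VOTING_RE.search(line)
    if m:
        available, configured, need = map(int, m.groups())
        ok = available >= required_voting_disks(configured)
        return Finding("disk heartbeat",
                       f"{available}/{configured} voting disks, need {need}, ok={ok}")
    m = LASTGASP_RE.search(line)
    if m:
        timeout, interval, margin = map(int, m.groups())
        reboot = would_reboot(timeout, interval, margin)
        return Finding("oprocd",
                       f"timeout {timeout} ms vs limit {interval + margin} ms, reboot={reboot}")
    return None


def triage(lines: Iterable[str]) -> List[Finding]:
    """Collect findings from any iterable of log lines (e.g. an open file)."""
    return [f for f in map(classify, lines) if f is not None]


if __name__ == "__main__":
    sample = [
        "[ CSSD]2012-03-02 23:56:18.749 [3086] >WARNING: clssnmPollingThread: "
        "node <node_name> at 50% heartbeat fatal, eviction in 14.494 seconds",
        "2010-08-13 18:34:38.206: [ CSSD][4110736288](:CSSNM00018: )clssnmvDiskCheck: "
        "Aborting, 0 of 1 configured voting disks available, need 1",
        "Dec 21 16:15:30.369857 | LASTGASP | AlarmHandler: timeout(2312 msec) "
        "exceeds interval(1000 msec)+margin(500 msec). Rebooting NOW.",
    ]
    for finding in triage(sample):
        print(finding.cause, "->", finding.detail)
```

For the LASTGASP line quoted above, oprocd woke up after 2312 ms against a limit of 1500 ms, so it rebooted the node; under the assumed diagwait=13 margin the limit would be 11000 ms and the same delay would not trigger a reboot.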
For more information, see the following MOS notes:
Note 265769.1: Troubleshooting 10g and 11.1 Clusterware Reboots
Note 1050693.1: Troubleshooting 11.2 Clusterware Node Evictions (Reboots)