In a RAC cluster, the instances on one server were abruptly evicted and rejoined the cluster about 10 minutes later. Below is an excerpt from the database alert log:
Wed Jun 27 00:34:55 2012
Errors in file /q001/product/ora_base/diag/rdbms/ttpoltq/ttpoltq1/trace/ttpoltq1_ora_21759.trc (incident=139278):
ORA-00603: ORACLE server session terminated by fatal error
ORA-27504: IPC error creating OSD context
ORA-27300: OS system dependent operation:if_not_found failed with status: 0
ORA-27301: OS failure message: Error 0
ORA-27302: failure occurred at: skgxpvaddr9
ORA-27303: additional information: requested interface 169.254.25.210 not found. Check output from ifconfig command
.............
ORA-29740: evicted by instance number 3, group incarnation 60
LMON (ospid: 20361): terminating the instance due to error 29740
Instance terminated by LMON, pid = 20361
From the IPC errors above (ORA-27504 through ORA-27303, "requested interface 169.254.25.210 not found") we know something must have been wrong with the interconnect.
On checking, the interfaces on the 10.0.0.* network were working fine.
The GPnP log file also showed nothing abnormal.
So the issue must have been caused by HAIP.
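The reasoning here rests on the address in ORA-27303: HAIP assigns addresses from the link-local 169.254.0.0/16 range, while the physical private network in this cluster is 10.0.0.*. A minimal sketch of that check, using Python's standard `ipaddress` module; the helper name `is_haip_candidate` and the sample address `10.0.0.1` are illustrative assumptions, not taken from the cluster:

```python
import ipaddress

def is_haip_candidate(addr: str) -> bool:
    """Return True if addr lies in the link-local range 169.254.0.0/16.

    HAIP addresses come from this range, so a "requested interface not
    found" error for such an address points at HAIP rather than at the
    physical private NIC.
    """
    return ipaddress.ip_address(addr).is_link_local

# Address reported by ORA-27303 in the alert log above.
print(is_haip_candidate("169.254.25.210"))  # True

# Hypothetical address on the 10.0.0.* private network (assumption).
print(is_haip_candidate("10.0.0.1"))        # False
```

A link-local address in the error, combined with healthy physical interfaces, is what narrows the problem down to HAIP.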
Let's check the CSSD log file:
oracle $ cat ocssd.log|egrep -i 'haip|disk'
2012-06-27 00:31:44.417: [ CSSD][1117505856]clssscMonitorThreads clssnmvDiskPingThread not scheduled for 7910 msecs
2012-06-27 00:31:52.433: [ CSSD][1117505856]clssscMonitorThreads clssnmvDiskPingThread not scheduled for 15920 msecs
2012-06-27 00:32:00.449: [ CSSD][1117505856]clssscMonitorThreads clssnmvDiskPingThread not scheduled for 23940 msecs
2012-06-27 00:32:08.170: [ CSSD][1105717568]clssscMonitorThreads clssnmvDiskPingThread not scheduled for 31660 msecs
2012-06-27 00:32:16.481: [ CSSD][1117505856]clssscMonitorThreads clssnmvDiskPingThread not scheduled for 39970 msecs
2012-06-27 00:32:24.497: [ CSSD][1117505856]clssscMonitorThreads clssnmvDiskPingThread not scheduled for 47980 msecs
2012-06-27 00:32:32.513: [ CSSD][1117505856]clssscMonitorThreads clssnmvDiskPingThread not scheduled for 56000 msecs
2012-06-27 00:32:40.945: [ CSSD][1117505856]clssscMonitorThreads clssnmvDiskPingThread not scheduled for 64430 msecs
2012-06-27 00:32:49.236: [ CSSD][1105717568]clssscMonitorThreads clssnmvDiskPingThread not scheduled for 72720 msecs
2012-06-27 00:32:58.565: [ CSSD][1117505856]clssscMonitorThreads clssnmvDiskPingThread not scheduled for 82040 msecs
2012-06-27 00:33:06.581: [ CSSD][1117505856]clssscMonitorThreads clssnmvDiskPingThread not scheduled for 90060 msecs
2012-06-27 00:33:14.597: [ CSSD][1117505856]clssscMonitorThreads clssnmvDiskPingThread not scheduled for 98070 msecs
2012-06-27 00:33:15.539: [ CSSD][1109621056]clssnmvDiskCheck: (ORCL:ASM_DISK01) No I/O completed after 50% maximum time, 200000 ms, will be considered unusable in 99990 ms
2012-06-27 00:33:22.614: [ CSSD][1117505856]clssscMonitorThreads clssnmvDiskPingThread not scheduled for 106090 msecs
2012-06-27 00:33:30.629: [ CSSD][1117505856]clssscMonitorThreads clssnmvDiskPingThread not scheduled for 114100 msecs
2012-06-27 00:33:34.069: [ CSSD][1105717568]clssgmFenceClient: fencing client (0x19278b0), member 1 in group haip.cluster_interconnect, no share, death fence 1, SAGE fence 0
2012-06-27 00:33:34.069: [ CSSD][1105717568]clssgmUnreferenceMember: global grock haip.cluster_interconnect member 1 refcount is 1
2012-06-27 00:33:34.069: [ CSSD][1093089600]clssgmTermMember: Terminating member 1 (0x18afc30) in grock haip.cluster_interconnect
2012-06-27 00:33:34.069: [ CSSD][1093089600]clssgmUnreferenceMember: global grock haip.cluster_interconnect member 1 refcount is 0
2012-06-27 00:33:34.070: [ CSSD][1111198016]clssgmRemoveMember: grock haip.cluster_interconnect, member number 1 (0x18afc30) node number 1 state 0x0 grock type 2
2012-06-27 00:33:34.070: [ CSSD][1111198016]clssgmGrantLocks: 1-> new master (3/3) group haip.cluster_interconnect
2012-06-27 00:33:34.072: [ CSSD][1111198016]clssgmBroadcastGrockRcfgCmpl: RPC(0x6c30034) of grock(haip.cluster_interconnect) is waiting for at least another ack from node(2)
2012-06-27 00:33:34.072: [ CSSD][1111198016]clssgmBroadcastGrockRcfgCmpl: RPC(0x6c30034) of grock(haip.cluster_interconnect) received all acks, grock update sequence(109)
2012-06-27 00:33:34.242: [ CSSD][1105717568]clssgmJoinGrock: global grock haip.cluster_interconnect new client 0x2aaab0611a70 with con 0x12622491, requested num 1, flags 0x4000100
2012-06-27 00:33:34.242: [ CSSD][1105717568]clssgmAddGrockMember: adding member to grock haip.cluster_interconnect
2012-06-27 00:33:34.242: [ CSSD][1111198016]clssgmAddMember: Adding fencing for member 1, group haip.cluster_interconnect, death 1, SAGE 0
2012-06-27 00:33:34.242: [ CSSD][1111198016]clssgmAddMember: member (1/0x2aaab051c000) added. pbsz(0) prsz(0) flags 0x0 to grock (0x2aaab00e3190/haip.cluster_interconnect)
2012-06-27 00:33:34.242: [ CSSD][1111198016]clssgmCommonAddMember: global group grock haip.cluster_interconnect member(1/Local) node(1) flags 0x0 0x423b6300
2012-06-27 00:33:34.245: [ CSSD][1111198016]clssgmBroadcastGrockRcfgCmpl: RPC(0x6c70034) of grock(haip.cluster_interconnect) is waiting for at least another ack from node(2)
2012-06-27 00:33:34.245: [ CSSD][1111198016]clssgmBroadcastGrockRcfgCmpl: RPC(0x6c70034) of grock(haip.cluster_interconnect) received all acks, grock update sequence(110)
2012-06-27 00:33:34.562: [ CSSD][1105717568]clssgmExitGrock: client 1 (0x2aaab0611a70), grock haip.cluster_interconnect, member 1
2012-06-27 00:33:34.562: [ CSSD][1105717568]clssgmUnregisterPrimary: Unregistering member 1 (0x2aaab051c000) in global grock haip.cluster_interconnect
2012-06-27 00:33:34.562: [ CSSD][1105717568]clssgmUnreferenceMember: global grock haip.cluster_interconnect member 1 refcount is 0
2012-06-27 00:33:34.562: [ CSSD][1111198016]clssgmRemoveMember: grock haip.cluster_interconnect, member number 1 (0x2aaab051c000) node number 1 state 0x4 grock type 2
2012-06-27 00:33:34.564: [ CSSD][1111198016]clssgmBroadcastGrockRcfgCmpl: RPC(0x6c80034) of grock(haip.cluster_interconnect) is waiting for at least another ack from node(2)
2012-06-27 00:33:34.565: [ CSSD][1111198016]clssgmBroadcastGrockRcfgCmpl: RPC(0x6c80034) of grock(haip.cluster_interconnect) received all acks, grock update sequence(111)
2012-06-27 00:33:34.627: [ CSSD][1105717568]clssgmJoinGrock: global grock haip.cluster_interconnect new client 0x2aaab04022b0 with con 0x1262256c, requested num 1, flags 0x4000100
2012-06-27 00:33:34.627: [ CSSD][1105717568]clssgmAddGrockMember: adding member to grock haip.cluster_interconnect
2012-06-27 00:33:34.627: [ CSSD][1111198016]clssgmAddMember: Adding fencing for member 1, group haip.cluster_interconnect, death 1, SAGE 0
2012-06-27 00:33:34.627: [ CSSD][1111198016]clssgmAddMember: member (1/0x2aaab04c0f80) added. pbsz(0) prsz(0) flags 0x0 to grock (0x2aaab00e3190/haip.cluster_interconnect)
2012-06-27 00:33:34.627: [ CSSD][1111198016]clssgmCommonAddMember: global group grock haip.cluster_interconnect member(1/Local) node(1) flags 0x0 0x423b6300
2012-06-27 00:33:34.629: [ CSSD][1111198016]clssgmBroadcastGrockRcfgCmpl: RPC(0x6cb0034) of grock(haip.cluster_interconnect) is waiting for at least another ack from node(2)
2012-06-27 00:33:34.630: [ CSSD][1111198016]clssgmBroadcastGrockRcfgCmpl: RPC(0x6cb0034) of grock(haip.cluster_interconnect) received all acks, grock update sequence(112)
2012-06-27 00:33:39.647: [ CSSD][1117505856]clssscMonitorThreads clssnmvDiskPingThread not scheduled for 123120 msecs
2012-06-27 00:33:47.663: [ CSSD][1117505856]clssscMonitorThreads clssnmvDiskPingThread not scheduled for 131130 msecs
2012-06-27 00:33:55.680: [ CSSD][1117505856]clssscMonitorThreads clssnmvDiskPingThread not scheduled for 139150 msecs
2012-06-27 00:34:04.115: [ CSSD][1117505856]clssscMonitorThreads clssnmvDiskPingThread not scheduled for 147580 msecs
2012-06-27 00:34:05.658: [ CSSD][1109621056]clssnmvDiskCheck: (ORCL:ASM_DISK01) No I/O completed after 75% maximum time, 200000 ms, will be considered unusable in 49880 ms
2012-06-27 00:34:11.712: [ CSSD][1117505856]clssscMonitorThreads clssnmvDiskPingThread not scheduled for 155180 msecs
2012-06-27 00:34:19.728: [ CSSD][1117505856]clssscMonitorThreads clssnmvDiskPingThread not scheduled for 163190 msecs
2012-06-27 00:34:28.314: [ CSSD][1105717568]clssscMonitorThreads clssnmvDiskPingThread not scheduled for 171770 msecs
2012-06-27 00:34:35.732: [ CSSD][1109621056]clssnmvDiskCheck: (ORCL:ASM_DISK01) No I/O completed after 90% maximum time, 200000 ms, will be considered unusable in 19810 ms
2012-06-27 00:34:36.762: [ CSSD][1117505856]clssscMonitorThreads clssnmvDiskPingThread not scheduled for 180220 msecs
2012-06-27 00:34:43.044: [ CSSD][1111198016]clssgmRemoveMember: grock haip.cluster_interconnect, member number 3 (0x2aaaac27b800) node number 3 state 0x0 grock type 2
2012-06-27 00:34:43.044: [ CSSD][1111198016]clssgmGrantLocks: 3-> new master (2/2) group haip.cluster_interconnect
2012-06-27 00:34:43.046: [ CSSD][1111198016]clssgmBroadcastGrockRcfgCmpl: RPC(0x7200034) of grock(haip.cluster_interconnect) is waiting for at least another ack from node(2)
2012-06-27 00:34:43.047: [ CSSD][1111198016]clssgmBroadcastGrockRcfgCmpl: RPC(0x7200034) of grock(haip.cluster_interconnect) received all acks, grock update sequence(113)
2012-06-27 00:34:43.291: [ CSSD][1111198016]clssgmAddMember: member (3/0x2aaaac5844a0) added. pbsz(0) prsz(0) flags 0x0 to grock (0x2aaab00e3190/haip.cluster_interconnect)
2012-06-27 00:34:43.291: [ CSSD][1111198016]clssgmCommonAddMember: global group grock haip.cluster_interconnect member(3/Remote) node(3) flags 0x0 0xac5844a0
2012-06-27 00:34:43.293: [ CSSD][1111198016]clssgmBroadcastGrockRcfgCmpl: RPC(0x7400034) of grock(haip.cluster_interconnect) is waiting for at least another ack from node(2)
2012-06-27 00:34:43.293: [ CSSD][1111198016]clssgmBroadcastGrockRcfgCmpl: RPC(0x7400034) of grock(haip.cluster_interconnect) received all acks, grock update sequence(114)
2012-06-27 00:34:43.636: [ CSSD][1111198016]clssgmRemoveMember: grock haip.cluster_interconnect, member number 3 (0x2aaaac5844a0) node number 3 state 0x0 grock type 2
2012-06-27 00:34:43.638: [ CSSD][1111198016]clssgmBroadcastGrockRcfgCmpl: RPC(0x74f0034) of grock(haip.cluster_interconnect) is waiting for at least another ack from node(3)
2012-06-27 00:34:43.639: [ CSSD][1111198016]clssgmBroadcastGrockRcfgCmpl: RPC(0x74f0034) of grock(haip.cluster_interconnect) received all acks, grock update sequence(115)
2012-06-27 00:34:43.730: [ CSSD][1111198016]clssgmAddMember: member (3/0x1dba9d0) added. pbsz(0) prsz(0) flags 0x0 to grock (0x2aaab00e3190/haip.cluster_interconnect)
2012-06-27 00:34:43.731: [ CSSD][1111198016]clssgmCommonAddMember: global group grock haip.cluster_interconnect member(3/Remote) node(3) flags 0x0 0x1dba9d0
2012-06-27 00:34:43.733: [ CSSD][1111198016]clssgmBroadcastGrockRcfgCmpl: RPC(0x74a0034) of grock(haip.cluster_interconnect) is waiting for at least another ack from node(2)
2012-06-27 00:34:43.733: [ CSSD][1111198016]clssgmBroadcastGrockRcfgCmpl: RPC(0x74a0034) of grock(haip.cluster_interconnect) received all acks, grock update sequence(116)
2012-06-27 00:34:45.457: [ CSSD][1105717568]clssgmExitGrock: client 1 (0x2aaab04022b0), grock haip.cluster_interconnect, member 1
2012-06-27 00:34:45.457: [ CSSD][1105717568]clssgmUnregisterPrimary: Unregistering member 1 (0x2aaab04c0f80) in global grock haip.cluster_interconnect
2012-06-27 00:34:45.457: [ CSSD][1105717568]clssgmUnreferenceMember: global grock haip.cluster_interconnect member 1 refcount is 0
2012-06-27 00:34:45.457: [ CSSD][1111198016]clssgmRemoveMember: grock haip.cluster_interconnect, member number 1 (0x2aaab04c0f80) node number 1 state 0x4 grock type 2
2012-06-27 00:34:45.459: [ CSSD][1111198016]clssgmBroadcastGrockRcfgCmpl: RPC(0x7640034) of grock(haip.cluster_interconnect) is waiting for at least another ack from node(2)
2012-06-27 00:34:45.459: [ CSSD][1111198016]clssgmBroadcastGrockRcfgCmpl: RPC(0x7640034) of grock(haip.cluster_interconnect) received all acks, grock update sequence(117)
2012-06-27 00:43:28.566: [ CSSD][1105717568]clssgmJoinGrock: global grock haip.cluster_interconnect new client 0x2aaab43103e0 with con 0x126435d1, requested num 1, flags 0x4000100
2012-06-27 00:43:28.566: [ CSSD][1105717568]clssgmAddGrockMember: adding member to grock haip.cluster_interconnect
2012-06-27 00:43:28.566: [ CSSD][1111198016]clssgmAddMember: Adding fencing for member 1, group haip.cluster_interconnect, death 1, SAGE 0
2012-06-27 00:43:28.566: [ CSSD][1111198016]clssgmAddMember: member (1/0x2aaab46fbd20) added. pbsz(0) prsz(0) flags 0x0 to grock (0x2aaab00e3190/haip.cluster_interconnect)
2012-06-27 00:43:28.566: [ CSSD][1111198016]clssgmCommonAddMember: global group grock haip.cluster_interconnect member(1/Local) node(1) flags 0x0 0x423b6300
2012-06-27 00:43:28.568: [ CSSD][1111198016]clssgmBroadcastGrockRcfgCmpl: RPC(0x2aa0035) of grock(haip.cluster_interconnect) is waiting for at least another ack from node(2)
2012-06-27 00:43:28.569: [ CSSD][1111198016]clssgmBroadcastGrockRcfgCmpl: RPC(0x2aa0035) of grock(haip.cluster_interconnect) received all acks, grock update sequence(118)
2012-06-27 00:43:44.470: [ CSSD][1105717568]clssgmExecuteClientRequest: VOTEDISKQUERY recvd from proc 32 (0x2aaab4269a70)
2012-06-27 00:43:45.958: [ CSSD][1105717568]clssgmExecuteClientRequest: VOTEDISKQUERY recvd from proc 32 (0x2aaab4269a70)
2012-06-27 00:43:48.770: [ CSSD][1105717568]clssgmExecuteClientRequest: VOTEDISKQUERY recvd from proc 34 (0x2aaab05c4340)
2012-06-27 00:43:48.771: [ CSSD][1105717568]clssgmExecuteClientRequest: VOTEDISKQUERY recvd from proc 34 (0x2aaab05c4340)
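The first sign of trouble is the voting disk ping thread stalling:
2012-06-27 00:31:44.417: [ CSSD][1117505856]clssscMonitorThreads clssnmvDiskPingThread not scheduled for 7910 msecs
The long disk I/O timeout is 200 seconds in 11gR2 (the 200000 ms reported in the clssnmvDiskCheck messages), so from this point CSSD was counting down toward declaring the voting disk unusable.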
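The countdown can be made concrete by adding the "will be considered unusable in N ms" value to the timestamp of each clssnmvDiskCheck message. The sketch below does that for the three such messages in the log above; the regular expression and the helper name `projected_deadline` are assumptions written for this particular log format, not an Oracle tool:

```python
import re
from datetime import datetime, timedelta

# Matches the timestamp and the remaining-time value of a clssnmvDiskCheck line.
DISK_CHECK = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}): .*clssnmvDiskCheck: .*"
    r"will be considered unusable in (?P<left>\d+) ms"
)

def projected_deadline(line):
    """Return the time at which the disk would be declared unusable, or None."""
    m = DISK_CHECK.search(line)
    if m is None:
        return None
    ts = datetime.strptime(m.group("ts"), "%Y-%m-%d %H:%M:%S.%f")
    return ts + timedelta(milliseconds=int(m.group("left")))

# The three clssnmvDiskCheck lines copied from the ocssd.log output above.
SAMPLE_LINES = """
2012-06-27 00:33:15.539: [ CSSD][1109621056]clssnmvDiskCheck: (ORCL:ASM_DISK01) No I/O completed after 50% maximum time, 200000 ms, will be considered unusable in 99990 ms
2012-06-27 00:34:05.658: [ CSSD][1109621056]clssnmvDiskCheck: (ORCL:ASM_DISK01) No I/O completed after 75% maximum time, 200000 ms, will be considered unusable in 49880 ms
2012-06-27 00:34:35.732: [ CSSD][1109621056]clssnmvDiskCheck: (ORCL:ASM_DISK01) No I/O completed after 90% maximum time, 200000 ms, will be considered unusable in 19810 ms
""".strip().splitlines()

for line in SAMPLE_LINES:
    deadline = projected_deadline(line)
    print(deadline.strftime("%H:%M:%S.%f")[:-3])
# 00:34:55.529
# 00:34:55.538
# 00:34:55.542
```

All three messages project a deadline of about 00:34:55, the same second as the ORA-00603 entry in the alert log.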
During the countdown, the cluster checked HAIP and concluded that it had already died:
2012-06-27 00:33:34.069: [ CSSD][1105717568]clssgmFenceClient: fencing client (0x19278b0), member 1 in group haip.cluster_interconnect, no share, death fence 1, SAGE fence 0
The cluster then started cleaning up HAIP and tried multiple times to restart it.
Later the voting disk came back online, so the disk timeout countdown stopped. HAIP was brought back online, all instances were restarted, and everything returned to normal.
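The HAIP restart attempts on the local node can be pulled out of ocssd.log by looking only at the fence, join and exit calls for the haip.cluster_interconnect group. The following is a rough sketch under the assumption that the log format matches the lines shown earlier; the regular expression and the helper `haip_events` are illustrative, and the sample lines are copied from the output above:

```python
import re
from collections import Counter

# Timestamp, CSSD tag, thread id, then one of the three calls of interest.
EVENT = re.compile(
    r"^\d{4}-\d{2}-\d{2} (?P<time>\d{2}:\d{2}:\d{2})\.\d{3}: "
    r"\[\s*CSSD\]\[\d+\](?P<func>clssgm(?:FenceClient|JoinGrock|ExitGrock)):"
    r".*haip\.cluster_interconnect"
)

def haip_events(lines):
    """Return (time, function) pairs for HAIP fence/join/exit log lines."""
    events = []
    for line in lines:
        m = EVENT.search(line)
        if m:
            events.append((m.group("time"), m.group("func")))
    return events

SAMPLE_LINES = """
2012-06-27 00:33:34.069: [ CSSD][1105717568]clssgmFenceClient: fencing client (0x19278b0), member 1 in group haip.cluster_interconnect, no share, death fence 1, SAGE fence 0
2012-06-27 00:33:34.242: [ CSSD][1105717568]clssgmJoinGrock: global grock haip.cluster_interconnect new client 0x2aaab0611a70 with con 0x12622491, requested num 1, flags 0x4000100
2012-06-27 00:33:34.562: [ CSSD][1105717568]clssgmExitGrock: client 1 (0x2aaab0611a70), grock haip.cluster_interconnect, member 1
2012-06-27 00:33:34.627: [ CSSD][1105717568]clssgmJoinGrock: global grock haip.cluster_interconnect new client 0x2aaab04022b0 with con 0x1262256c, requested num 1, flags 0x4000100
2012-06-27 00:34:45.457: [ CSSD][1105717568]clssgmExitGrock: client 1 (0x2aaab04022b0), grock haip.cluster_interconnect, member 1
2012-06-27 00:43:28.566: [ CSSD][1105717568]clssgmJoinGrock: global grock haip.cluster_interconnect new client 0x2aaab43103e0 with con 0x126435d1, requested num 1, flags 0x4000100
""".strip().splitlines()

events = haip_events(SAMPLE_LINES)
for time, func in events:
    print(time, func)

counts = Counter(func for _, func in events)
print(counts["clssgmJoinGrock"], "joins,", counts["clssgmExitGrock"], "exits")
# 3 joins, 2 exits
```

The first two joins are each followed by an exit; only the join at 00:43:28 is not followed by another exit in this excerpt, which matches the instances coming back roughly ten minutes after the eviction.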