July 5, 2012

Instance evicted due to HAIP failure

In a RAC cluster, one node's instances were abruptly evicted and rejoined the cluster about 10 minutes later. Below is an excerpt from the database alert log:

Wed Jun 27 00:34:55 2012
Errors in file /q001/product/ora_base/diag/rdbms/ttpoltq/ttpoltq1/trace/ttpoltq1_ora_21759.trc  (incident=139278):
ORA-00603: ORACLE server session terminated by fatal error
ORA-27504: IPC error creating OSD context
ORA-27300: OS system dependent operation:if_not_found failed with status: 0
ORA-27301: OS failure message: Error 0
ORA-27302: failure occurred at: skgxpvaddr9
ORA-27303: additional information: requested interface 169.254.25.210 not found. Check output from ifconfig command
.............
ORA-29740: evicted by instance number 3, group incarnation 60
LMON (ospid: 20361): terminating the instance due to error 29740
Instance terminated by LMON, pid = 20361

From the ORA-27303 message (the requested interface 169.254.25.210, a HAIP address, was not found) we know there must be something wrong with the interconnect.

After checking, the physical interfaces on the 10.0.0.* interconnect network turned out to be working fine, and the GPnP log file also showed no abnormal information.
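
For reference, those checks boil down to something like the following (a rough sketch run as the Grid Infrastructure owner; exact interface names differ per system):

oracle $ /sbin/ifconfig -a | grep -B1 '169.254'   # HAIP link-local aliases, if any are plumbed
oracle $ oifcfg getif                             # networks registered as public / cluster_interconnect
oracle $ oifcfg iflist -p -n                      # interfaces and subnets the clusterware actually sees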

So the issue must have been caused by HAIP itself.
Let's check the CSSD log file:

oracle $ cat ocssd.log|egrep -i 'haip|disk'
2012-06-27 00:31:44.417: [    CSSD][1117505856]clssscMonitorThreads clssnmvDiskPingThread not scheduled for 7910 msecs
2012-06-27 00:31:52.433: [    CSSD][1117505856]clssscMonitorThreads clssnmvDiskPingThread not scheduled for 15920 msecs
2012-06-27 00:32:00.449: [    CSSD][1117505856]clssscMonitorThreads clssnmvDiskPingThread not scheduled for 23940 msecs
2012-06-27 00:32:08.170: [    CSSD][1105717568]clssscMonitorThreads clssnmvDiskPingThread not scheduled for 31660 msecs
2012-06-27 00:32:16.481: [    CSSD][1117505856]clssscMonitorThreads clssnmvDiskPingThread not scheduled for 39970 msecs
2012-06-27 00:32:24.497: [    CSSD][1117505856]clssscMonitorThreads clssnmvDiskPingThread not scheduled for 47980 msecs
2012-06-27 00:32:32.513: [    CSSD][1117505856]clssscMonitorThreads clssnmvDiskPingThread not scheduled for 56000 msecs
2012-06-27 00:32:40.945: [    CSSD][1117505856]clssscMonitorThreads clssnmvDiskPingThread not scheduled for 64430 msecs
2012-06-27 00:32:49.236: [    CSSD][1105717568]clssscMonitorThreads clssnmvDiskPingThread not scheduled for 72720 msecs
2012-06-27 00:32:58.565: [    CSSD][1117505856]clssscMonitorThreads clssnmvDiskPingThread not scheduled for 82040 msecs
2012-06-27 00:33:06.581: [    CSSD][1117505856]clssscMonitorThreads clssnmvDiskPingThread not scheduled for 90060 msecs
2012-06-27 00:33:14.597: [    CSSD][1117505856]clssscMonitorThreads clssnmvDiskPingThread not scheduled for 98070 msecs
2012-06-27 00:33:15.539: [    CSSD][1109621056]clssnmvDiskCheck: (ORCL:ASM_DISK01) No I/O completed after 50% maximum time, 200000 ms, will be considered unusable in 99990 ms
2012-06-27 00:33:22.614: [    CSSD][1117505856]clssscMonitorThreads clssnmvDiskPingThread not scheduled for 106090 msecs
2012-06-27 00:33:30.629: [    CSSD][1117505856]clssscMonitorThreads clssnmvDiskPingThread not scheduled for 114100 msecs
2012-06-27 00:33:34.069: [    CSSD][1105717568]clssgmFenceClient: fencing client (0x19278b0), member 1 in group haip.cluster_interconnect, no share, death fence 1, SAGE fence 0
2012-06-27 00:33:34.069: [    CSSD][1105717568]clssgmUnreferenceMember: global grock haip.cluster_interconnect member 1 refcount is 1
2012-06-27 00:33:34.069: [    CSSD][1093089600]clssgmTermMember: Terminating member 1 (0x18afc30) in grock haip.cluster_interconnect
2012-06-27 00:33:34.069: [    CSSD][1093089600]clssgmUnreferenceMember: global grock haip.cluster_interconnect member 1 refcount is 0
2012-06-27 00:33:34.070: [    CSSD][1111198016]clssgmRemoveMember: grock haip.cluster_interconnect, member number 1 (0x18afc30) node number 1 state 0x0 grock type 2
2012-06-27 00:33:34.070: [    CSSD][1111198016]clssgmGrantLocks: 1-> new master (3/3) group haip.cluster_interconnect
2012-06-27 00:33:34.072: [    CSSD][1111198016]clssgmBroadcastGrockRcfgCmpl: RPC(0x6c30034) of grock(haip.cluster_interconnect) is waiting for at least another ack from node(2)
2012-06-27 00:33:34.072: [    CSSD][1111198016]clssgmBroadcastGrockRcfgCmpl: RPC(0x6c30034) of grock(haip.cluster_interconnect) received all acks, grock update sequence(109)
2012-06-27 00:33:34.242: [    CSSD][1105717568]clssgmJoinGrock: global grock haip.cluster_interconnect new client 0x2aaab0611a70 with con 0x12622491, requested num 1, flags 0x4000100
2012-06-27 00:33:34.242: [    CSSD][1105717568]clssgmAddGrockMember: adding member to grock haip.cluster_interconnect
2012-06-27 00:33:34.242: [    CSSD][1111198016]clssgmAddMember: Adding fencing for member 1, group haip.cluster_interconnect, death 1, SAGE 0
2012-06-27 00:33:34.242: [    CSSD][1111198016]clssgmAddMember: member (1/0x2aaab051c000) added. pbsz(0) prsz(0) flags 0x0 to grock (0x2aaab00e3190/haip.cluster_interconnect)
2012-06-27 00:33:34.242: [    CSSD][1111198016]clssgmCommonAddMember: global group grock haip.cluster_interconnect member(1/Local) node(1) flags 0x0 0x423b6300
2012-06-27 00:33:34.245: [    CSSD][1111198016]clssgmBroadcastGrockRcfgCmpl: RPC(0x6c70034) of grock(haip.cluster_interconnect) is waiting for at least another ack from node(2)
2012-06-27 00:33:34.245: [    CSSD][1111198016]clssgmBroadcastGrockRcfgCmpl: RPC(0x6c70034) of grock(haip.cluster_interconnect) received all acks, grock update sequence(110)
2012-06-27 00:33:34.562: [    CSSD][1105717568]clssgmExitGrock: client 1 (0x2aaab0611a70), grock haip.cluster_interconnect, member 1
2012-06-27 00:33:34.562: [    CSSD][1105717568]clssgmUnregisterPrimary: Unregistering member 1 (0x2aaab051c000) in global grock haip.cluster_interconnect
2012-06-27 00:33:34.562: [    CSSD][1105717568]clssgmUnreferenceMember: global grock haip.cluster_interconnect member 1 refcount is 0
2012-06-27 00:33:34.562: [    CSSD][1111198016]clssgmRemoveMember: grock haip.cluster_interconnect, member number 1 (0x2aaab051c000) node number 1 state 0x4 grock type 2
2012-06-27 00:33:34.564: [    CSSD][1111198016]clssgmBroadcastGrockRcfgCmpl: RPC(0x6c80034) of grock(haip.cluster_interconnect) is waiting for at least another ack from node(2)
2012-06-27 00:33:34.565: [    CSSD][1111198016]clssgmBroadcastGrockRcfgCmpl: RPC(0x6c80034) of grock(haip.cluster_interconnect) received all acks, grock update sequence(111)
2012-06-27 00:33:34.627: [    CSSD][1105717568]clssgmJoinGrock: global grock haip.cluster_interconnect new client 0x2aaab04022b0 with con 0x1262256c, requested num 1, flags 0x4000100
2012-06-27 00:33:34.627: [    CSSD][1105717568]clssgmAddGrockMember: adding member to grock haip.cluster_interconnect
2012-06-27 00:33:34.627: [    CSSD][1111198016]clssgmAddMember: Adding fencing for member 1, group haip.cluster_interconnect, death 1, SAGE 0
2012-06-27 00:33:34.627: [    CSSD][1111198016]clssgmAddMember: member (1/0x2aaab04c0f80) added. pbsz(0) prsz(0) flags 0x0 to grock (0x2aaab00e3190/haip.cluster_interconnect)
2012-06-27 00:33:34.627: [    CSSD][1111198016]clssgmCommonAddMember: global group grock haip.cluster_interconnect member(1/Local) node(1) flags 0x0 0x423b6300
2012-06-27 00:33:34.629: [    CSSD][1111198016]clssgmBroadcastGrockRcfgCmpl: RPC(0x6cb0034) of grock(haip.cluster_interconnect) is waiting for at least another ack from node(2)
2012-06-27 00:33:34.630: [    CSSD][1111198016]clssgmBroadcastGrockRcfgCmpl: RPC(0x6cb0034) of grock(haip.cluster_interconnect) received all acks, grock update sequence(112)
2012-06-27 00:33:39.647: [    CSSD][1117505856]clssscMonitorThreads clssnmvDiskPingThread not scheduled for 123120 msecs
2012-06-27 00:33:47.663: [    CSSD][1117505856]clssscMonitorThreads clssnmvDiskPingThread not scheduled for 131130 msecs
2012-06-27 00:33:55.680: [    CSSD][1117505856]clssscMonitorThreads clssnmvDiskPingThread not scheduled for 139150 msecs
2012-06-27 00:34:04.115: [    CSSD][1117505856]clssscMonitorThreads clssnmvDiskPingThread not scheduled for 147580 msecs
2012-06-27 00:34:05.658: [    CSSD][1109621056]clssnmvDiskCheck: (ORCL:ASM_DISK01) No I/O completed after 75% maximum time, 200000 ms, will be considered unusable in 49880 ms
2012-06-27 00:34:11.712: [    CSSD][1117505856]clssscMonitorThreads clssnmvDiskPingThread not scheduled for 155180 msecs
2012-06-27 00:34:19.728: [    CSSD][1117505856]clssscMonitorThreads clssnmvDiskPingThread not scheduled for 163190 msecs
2012-06-27 00:34:28.314: [    CSSD][1105717568]clssscMonitorThreads clssnmvDiskPingThread not scheduled for 171770 msecs
2012-06-27 00:34:35.732: [    CSSD][1109621056]clssnmvDiskCheck: (ORCL:ASM_DISK01) No I/O completed after 90% maximum time, 200000 ms, will be considered unusable in 19810 ms
2012-06-27 00:34:36.762: [    CSSD][1117505856]clssscMonitorThreads clssnmvDiskPingThread not scheduled for 180220 msecs
2012-06-27 00:34:43.044: [    CSSD][1111198016]clssgmRemoveMember: grock haip.cluster_interconnect, member number 3 (0x2aaaac27b800) node number 3 state 0x0 grock type 2
2012-06-27 00:34:43.044: [    CSSD][1111198016]clssgmGrantLocks: 3-> new master (2/2) group haip.cluster_interconnect
2012-06-27 00:34:43.046: [    CSSD][1111198016]clssgmBroadcastGrockRcfgCmpl: RPC(0x7200034) of grock(haip.cluster_interconnect) is waiting for at least another ack from node(2)
2012-06-27 00:34:43.047: [    CSSD][1111198016]clssgmBroadcastGrockRcfgCmpl: RPC(0x7200034) of grock(haip.cluster_interconnect) received all acks, grock update sequence(113)
2012-06-27 00:34:43.291: [    CSSD][1111198016]clssgmAddMember: member (3/0x2aaaac5844a0) added. pbsz(0) prsz(0) flags 0x0 to grock (0x2aaab00e3190/haip.cluster_interconnect)
2012-06-27 00:34:43.291: [    CSSD][1111198016]clssgmCommonAddMember: global group grock haip.cluster_interconnect member(3/Remote) node(3) flags 0x0 0xac5844a0
2012-06-27 00:34:43.293: [    CSSD][1111198016]clssgmBroadcastGrockRcfgCmpl: RPC(0x7400034) of grock(haip.cluster_interconnect) is waiting for at least another ack from node(2)
2012-06-27 00:34:43.293: [    CSSD][1111198016]clssgmBroadcastGrockRcfgCmpl: RPC(0x7400034) of grock(haip.cluster_interconnect) received all acks, grock update sequence(114)
2012-06-27 00:34:43.636: [    CSSD][1111198016]clssgmRemoveMember: grock haip.cluster_interconnect, member number 3 (0x2aaaac5844a0) node number 3 state 0x0 grock type 2
2012-06-27 00:34:43.638: [    CSSD][1111198016]clssgmBroadcastGrockRcfgCmpl: RPC(0x74f0034) of grock(haip.cluster_interconnect) is waiting for at least another ack from node(3)
2012-06-27 00:34:43.639: [    CSSD][1111198016]clssgmBroadcastGrockRcfgCmpl: RPC(0x74f0034) of grock(haip.cluster_interconnect) received all acks, grock update sequence(115)
2012-06-27 00:34:43.730: [    CSSD][1111198016]clssgmAddMember: member (3/0x1dba9d0) added. pbsz(0) prsz(0) flags 0x0 to grock (0x2aaab00e3190/haip.cluster_interconnect)
2012-06-27 00:34:43.731: [    CSSD][1111198016]clssgmCommonAddMember: global group grock haip.cluster_interconnect member(3/Remote) node(3) flags 0x0 0x1dba9d0
2012-06-27 00:34:43.733: [    CSSD][1111198016]clssgmBroadcastGrockRcfgCmpl: RPC(0x74a0034) of grock(haip.cluster_interconnect) is waiting for at least another ack from node(2)
2012-06-27 00:34:43.733: [    CSSD][1111198016]clssgmBroadcastGrockRcfgCmpl: RPC(0x74a0034) of grock(haip.cluster_interconnect) received all acks, grock update sequence(116)
2012-06-27 00:34:45.457: [    CSSD][1105717568]clssgmExitGrock: client 1 (0x2aaab04022b0), grock haip.cluster_interconnect, member 1
2012-06-27 00:34:45.457: [    CSSD][1105717568]clssgmUnregisterPrimary: Unregistering member 1 (0x2aaab04c0f80) in global grock haip.cluster_interconnect
2012-06-27 00:34:45.457: [    CSSD][1105717568]clssgmUnreferenceMember: global grock haip.cluster_interconnect member 1 refcount is 0
2012-06-27 00:34:45.457: [    CSSD][1111198016]clssgmRemoveMember: grock haip.cluster_interconnect, member number 1 (0x2aaab04c0f80) node number 1 state 0x4 grock type 2
2012-06-27 00:34:45.459: [    CSSD][1111198016]clssgmBroadcastGrockRcfgCmpl: RPC(0x7640034) of grock(haip.cluster_interconnect) is waiting for at least another ack from node(2)
2012-06-27 00:34:45.459: [    CSSD][1111198016]clssgmBroadcastGrockRcfgCmpl: RPC(0x7640034) of grock(haip.cluster_interconnect) received all acks, grock update sequence(117)
2012-06-27 00:43:28.566: [    CSSD][1105717568]clssgmJoinGrock: global grock haip.cluster_interconnect new client 0x2aaab43103e0 with con 0x126435d1, requested num 1, flags 0x4000100
2012-06-27 00:43:28.566: [    CSSD][1105717568]clssgmAddGrockMember: adding member to grock haip.cluster_interconnect
2012-06-27 00:43:28.566: [    CSSD][1111198016]clssgmAddMember: Adding fencing for member 1, group haip.cluster_interconnect, death 1, SAGE 0
2012-06-27 00:43:28.566: [    CSSD][1111198016]clssgmAddMember: member (1/0x2aaab46fbd20) added. pbsz(0) prsz(0) flags 0x0 to grock (0x2aaab00e3190/haip.cluster_interconnect)
2012-06-27 00:43:28.566: [    CSSD][1111198016]clssgmCommonAddMember: global group grock haip.cluster_interconnect member(1/Local) node(1) flags 0x0 0x423b6300
2012-06-27 00:43:28.568: [    CSSD][1111198016]clssgmBroadcastGrockRcfgCmpl: RPC(0x2aa0035) of grock(haip.cluster_interconnect) is waiting for at least another ack from node(2)
2012-06-27 00:43:28.569: [    CSSD][1111198016]clssgmBroadcastGrockRcfgCmpl: RPC(0x2aa0035) of grock(haip.cluster_interconnect) received all acks, grock update sequence(118)
2012-06-27 00:43:44.470: [    CSSD][1105717568]clssgmExecuteClientRequest: VOTEDISKQUERY recvd from proc 32 (0x2aaab4269a70)
2012-06-27 00:43:45.958: [    CSSD][1105717568]clssgmExecuteClientRequest: VOTEDISKQUERY recvd from proc 32 (0x2aaab4269a70)
2012-06-27 00:43:48.770: [    CSSD][1105717568]clssgmExecuteClientRequest: VOTEDISKQUERY recvd from proc 34 (0x2aaab05c4340)
2012-06-27 00:43:48.771: [    CSSD][1105717568]clssgmExecuteClientRequest: VOTEDISKQUERY recvd from proc 34 (0x2aaab05c4340)
From the above we can see that the voting disk ping thread was not being scheduled, and CSSD started counting down the voting disk timeout:
2012-06-27 00:31:44.417: [ CSSD][1117505856]clssscMonitorThreads clssnmvDiskPingThread not scheduled for 7910 msecs

The long disk I/O timeout (the CSS disktimeout) is 200 seconds in 11gR2.
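
On a running cluster that threshold can be confirmed with crsctl (run from the Grid home; a default 11gR2 install reports 200 seconds):

oracle $ crsctl get css disktimeout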
During that countdown, CSSD also checked on HAIP, decided it had already died, and fenced its client:
2012-06-27 00:33:34.069: [ CSSD][1105717568]clssgmFenceClient: fencing client (0x19278b0), member 1 in group haip.cluster_interconnect, no share, death fence 1, SAGE fence 0

The cluster then started to clean up HAIP and tried multiple times to restart it, as the repeated join/exit entries for grock haip.cluster_interconnect above show.
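
While this is going on, the state of the HAIP resource itself can be watched from another session, for example (assuming the standard 11gR2 resource name, run as root or the Grid owner):

oracle $ crsctl stat res ora.cluster_interconnect.haip -init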

Later the voting disk came back online, so the disk timeout countdown stopped; HAIP was brought back up as well, the instances were restarted, and everything returned to normal.
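
Once things are back, the interconnect address actually in use by each instance can be cross-checked against the ifconfig output, e.g. with a quick query (illustrative only):

oracle $ sqlplus -S / as sysdba <<'EOF'
select inst_id, name, ip_address, is_public
  from gv$cluster_interconnects
 order by inst_id;
EOF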
