On a 3-node RAC, during a PSU patching activity, the cluster on the first node failed to start up after patching.
Below are the errors in alertnode1.log:
2012-09-22 03:01:39.894
[ohasd(28563)]CRS-2112:The OLR service started on node cintrnpdb001.
2012-09-22 03:01:39.917
[ohasd(28563)]CRS-1301:Oracle High Availability Service started on node cintrnpdb001.
2012-09-22 03:01:39.918
[ohasd(28563)]CRS-8017:location: /etc/oracle/lastgasp has 2 reboot advisory log files, 0 were announced and 0 errors occurred
2012-09-22 03:01:53.765
[gpnpd(28771)]CRS-2328:GPNPD started on node cintrnpdb001.
2012-09-22 03:01:56.228
[cssd(28831)]CRS-1713:CSSD daemon is started in clustered mode
2012-09-22 03:11:55.298
[/prod/product/oracle/11.2.0.2/grid/bin/cssdagent(28819)]CRS-5818:Aborted command 'start for resource: ora.cssd 1 1' for resource 'ora.cssd'. Details at (:CRSAGF00113:) {0:0:2} in /prod/product/oracle/11.2.0.2/grid/log/cintrnpdb001/agent/ohasd/oracssdagent_root/oracssdagent_root.log.
2012-09-22 03:11:55.299
[cssd(28831)]CRS-1656:The CSS daemon is terminating due to a fatal error; Details at (:CSSSC00012:) in /prod/product/oracle/11.2.0.2/grid/log/cintrnpdb001/cssd/ocssd.log
2012-09-22 03:11:55.299
[cssd(28831)]CRS-1603:CSSD on node cintrnpdb001 shutdown by user.
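In the alert log above, CSSD is started at 03:01:56 and the agent aborts the start at 03:11:55, roughly ten minutes later. As a rough illustration only, the Python sketch below computes that gap from two of the entries. The helper names `parse_alert_entries` and `seconds_between` are made up for this example, and it assumes the alert log always alternates a timestamp line with a message line, as in the excerpt.

```python
from datetime import datetime

# Two entries copied from the alert log excerpt above (second message shortened).
ALERT_LOG = """\
2012-09-22 03:01:56.228
[cssd(28831)]CRS-1713:CSSD daemon is started in clustered mode
2012-09-22 03:11:55.298
[/prod/product/oracle/11.2.0.2/grid/bin/cssdagent(28819)]CRS-5818:Aborted command 'start for resource: ora.cssd 1 1' for resource 'ora.cssd'.
"""


def parse_alert_entries(text):
    """Pair each timestamp line with the message line that follows it.

    Assumption: the log strictly alternates timestamp line / message line.
    """
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    entries = []
    for ts_line, msg in zip(lines[0::2], lines[1::2]):
        ts = datetime.strptime(ts_line, "%Y-%m-%d %H:%M:%S.%f")
        entries.append((ts, msg))
    return entries


def seconds_between(entries, first_code, second_code):
    """Seconds from the first entry containing first_code to the first containing second_code."""
    start = next(ts for ts, msg in entries if first_code in msg)
    end = next(ts for ts, msg in entries if second_code in msg)
    return (end - start).total_seconds()


entries = parse_alert_entries(ALERT_LOG)
elapsed = seconds_between(entries, "CRS-1713", "CRS-5818")
print(f"CSSD start was aborted after {elapsed:.1f} seconds")
```

A gap this close to a round ten minutes suggests the start simply waited until it timed out, rather than failing quickly on a hard error.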
During this whole time, ocssd.log kept throwing the error below:
2012-09-22 03:00:00.546: [ GPNP][2692574496]clsgpnpm_newWiredMsg: [at clsgpnpm.c:741] Msg-reply has soap fault 10 (Operation returned Retry (error CLSGPNP_CALL_AGAIN)) [uri "http://www.grid-pnp.org/2005/12/gpnp-errors#"]
2012-09-22 03:00:02.558: [ GPNP][2692574496]clsgpnpm_newWiredMsg: [at clsgpnpm.c:741] Msg-reply has soap fault 10 (Operation returned Retry (error CLSGPNP_CALL_AGAIN)) [uri "http://www.grid-pnp.org/2005/12/gpnp-errors#"]
2012-09-22 03:00:04.568: [ GPNP][2692574496]clsgpnpm_newWiredMsg: [at clsgpnpm.c:741] Msg-reply has soap fault 10 (Operation returned Retry (error CLSGPNP_CALL_AGAIN)) [uri "http://www.grid-pnp.org/2005/12/gpnp-errors#"]
The messages above show that cssd failed to get the GPnP profile from gpnpd.
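To confirm that the same fault keeps repeating rather than a mix of different errors, the fault lines can be tallied by error name. This is only a minimal Python sketch: the regular expression is written against the three lines shown above (with the trailing uri part left out), and `count_gpnp_faults` is a made-up helper, not an Oracle tool.

```python
import re
from collections import Counter

# Entries copied from the ocssd.log excerpt above; the trailing [uri ...] part is omitted.
OCSSD_LINES = [
    "2012-09-22 03:00:00.546: [ GPNP][2692574496]clsgpnpm_newWiredMsg: [at clsgpnpm.c:741] "
    "Msg-reply has soap fault 10 (Operation returned Retry (error CLSGPNP_CALL_AGAIN))",
    "2012-09-22 03:00:02.558: [ GPNP][2692574496]clsgpnpm_newWiredMsg: [at clsgpnpm.c:741] "
    "Msg-reply has soap fault 10 (Operation returned Retry (error CLSGPNP_CALL_AGAIN))",
    "2012-09-22 03:00:04.568: [ GPNP][2692574496]clsgpnpm_newWiredMsg: [at clsgpnpm.c:741] "
    "Msg-reply has soap fault 10 (Operation returned Retry (error CLSGPNP_CALL_AGAIN))",
]

# Captures the fault number and the error name inside "(error NAME)".
FAULT_RE = re.compile(r"soap fault (\d+) .*?\(error (\w+)\)")


def count_gpnp_faults(lines):
    """Count GPnP SOAP fault lines per error name."""
    counts = Counter()
    for line in lines:
        match = FAULT_RE.search(line)
        if match:
            counts[match.group(2)] += 1
    return counts


fault_counts = count_gpnp_faults(OCSSD_LINES)
for name, count in fault_counts.items():
    print(f"{name}: {count}")
```

On a real ocssd.log the list would be replaced by the lines read from the file. A single error name with a steadily growing count matches the picture of cssd retrying the same profile request every two seconds.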
Let's check the gpnpd log to see what's wrong:
2012-09-22 02:46:51.594: [ GPNP][1088584000]clsgpnp_profileCallUrlInt: [at clsgpnp.c:2104] put-profile call to url "tcp://cintrnpdb003:33612" disco "mdns:service:gpnp._tcp.local.://cintrnpdb003:33612/agent=gpnpd,cname=crs_transp,host=cintrnpdb003,pid=20890/gpnpd h:cintrnpdb003 c:crs_transp" [f=0 claimed- host:cintrnpdb001 cname:crs_transp seq:6 auth:CN=GPnP_peer]
2012-09-22 02:47:05.424: [ OCRMSG][1085512000]GIPC error [29] msg [gipcretConnectionRefused]
2012-09-22 02:47:26.446: [ OCRMSG][1085512000]GIPC error [29] msg [gipcretConnectionRefused]
From the above we can see that when gpnpd on the first node starts up, it sends a put-profile call to gpnpd on the third node but gets no response.
So at this point we can suspect that the problem lies with gpnpd on the third node.
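The same reading of the gpnpd log can be sketched in Python: pull out the peer that the put-profile call was sent to, and count the connection-refused messages logged afterwards. The helper `summarize_gpnpd` is hypothetical and the pattern is based only on the three lines quoted above (the first one cut off after the URL). The script only counts the GIPC errors; it does not prove they belong to that particular call.

```python
import re

# Entries copied from the gpnpd log excerpt above; the first line is cut after the URL.
GPNPD_LINES = [
    '2012-09-22 02:46:51.594: [ GPNP][1088584000]clsgpnp_profileCallUrlInt: '
    '[at clsgpnp.c:2104] put-profile call to url "tcp://cintrnpdb003:33612"',
    "2012-09-22 02:47:05.424: [ OCRMSG][1085512000]GIPC error [29] msg [gipcretConnectionRefused]",
    "2012-09-22 02:47:26.446: [ OCRMSG][1085512000]GIPC error [29] msg [gipcretConnectionRefused]",
]

# Captures host and port from the put-profile target URL.
CALL_RE = re.compile(r'put-profile call to url "tcp://([^":/]+):(\d+)"')


def summarize_gpnpd(lines):
    """Return the put-profile targets and the number of connection-refused errors."""
    peers = []
    refused = 0
    for line in lines:
        match = CALL_RE.search(line)
        if match:
            peers.append((match.group(1), int(match.group(2))))
        elif "gipcretConnectionRefused" in line:
            refused += 1
    return peers, refused


peers, refused = summarize_gpnpd(GPNPD_LINES)
for host, port in peers:
    print(f"put-profile sent to {host}:{port}")
print(f"connection refused errors: {refused}")
```

Here the only peer is cintrnpdb003, which is why the third node becomes the suspect.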
After I killed gpnpd.bin on the third node and let it restart, the cluster on the first node started up successfully right away.
GE, AGFA. Experienced in 9i/10g/11g/12c, Advanced Replication, Streams, GoldenGate, RAC, Data Guard, ODA, Exadata, Grid Control. In my career I have always been assigned to handle the most complicated and critical Oracle issues that have no solution online, and I hope to record some of them via this blog.
September 22, 2012
cluster on first node failed to startup after reboot due to GPNP hang on third node