September 22, 2012

Cluster on first node failed to start after reboot due to GPnP hang on third node

On a 3-node RAC, during a PSU patching activity, the cluster on the first node failed to start up after patching.

Below are the errors in alertnode1.log:

2012-09-22 03:01:39.894
[ohasd(28563)]CRS-2112:The OLR service started on node cintrnpdb001.
2012-09-22 03:01:39.917
[ohasd(28563)]CRS-1301:Oracle High Availability Service started on node cintrnpdb001.
2012-09-22 03:01:39.918
[ohasd(28563)]CRS-8017:location: /etc/oracle/lastgasp has 2 reboot advisory log files, 0 were announced and 0 errors occurred
2012-09-22 03:01:53.765
[gpnpd(28771)]CRS-2328:GPNPD started on node cintrnpdb001.
2012-09-22 03:01:56.228
[cssd(28831)]CRS-1713:CSSD daemon is started in clustered mode
2012-09-22 03:11:55.298
[/prod/product/oracle/11.2.0.2/grid/bin/cssdagent(28819)]CRS-5818:Aborted command 'start for resource: ora.cssd 1 1' for resource 'ora.cssd'. Details at (:CRSAGF00113:) {0:0:2} in /prod/product/oracle/11.2.0.2/grid/log/cintrnpdb001/agent/ohasd/oracssdagent_root/oracssdagent_root.log.
2012-09-22 03:11:55.299
[cssd(28831)]CRS-1656:The CSS daemon is terminating due to a fatal error; Details at (:CSSSC00012:) in /prod/product/oracle/11.2.0.2/grid/log/cintrnpdb001/cssd/ocssd.log
2012-09-22 03:11:55.299
[cssd(28831)]CRS-1603:CSSD on node cintrnpdb001 shutdown by user.
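
To see how long the CSSD start waited before cssdagent gave up, the two timestamps in the alert log above can be subtracted. Below is a minimal Python sketch, assuming the alert log keeps the layout shown here (a timestamp line followed by a message line). The helper names parse_events and first_time are made up for illustration, and the CRS-5818 message is shortened.

```python
from datetime import datetime

# Excerpt of the alert log above; the CRS-5818 message is shortened.
ALERT_LOG = """\
2012-09-22 03:01:56.228
[cssd(28831)]CRS-1713:CSSD daemon is started in clustered mode
2012-09-22 03:11:55.298
[/prod/product/oracle/11.2.0.2/grid/bin/cssdagent(28819)]CRS-5818:Aborted command 'start for resource: ora.cssd 1 1' for resource 'ora.cssd'.
"""

TS_FORMAT = "%Y-%m-%d %H:%M:%S.%f"


def parse_events(text):
    """Pair each timestamp line with the message line that follows it."""
    events = []
    current_ts = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        try:
            current_ts = datetime.strptime(line, TS_FORMAT)
        except ValueError:
            # Not a timestamp, so treat it as the message for the last timestamp.
            if current_ts is not None:
                events.append((current_ts, line))
                current_ts = None
    return events


def first_time(events, code):
    """Return the timestamp of the first event whose message contains code."""
    for ts, msg in events:
        if code in msg:
            return ts
    return None


events = parse_events(ALERT_LOG)
started = first_time(events, "CRS-1713")   # CSSD daemon started
aborted = first_time(events, "CRS-5818")   # cssdagent aborted the start command
wait_seconds = (aborted - started).total_seconds()
print(f"CSSD start was aborted after {wait_seconds:.1f} seconds")
```

With the excerpt above this reports roughly 599 seconds, so the start was aborted just under ten minutes after CSSD came up.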


During this whole time, ocssd.log kept throwing the error below:
2012-09-22 03:00:00.546: [    GPNP][2692574496]clsgpnpm_newWiredMsg: [at clsgpnpm.c:741] Msg-reply has soap fault 10 (Operation returned Retry (error CLSGPNP_CALL_AGAIN)) [uri "http://www.grid-pnp.org/2005/12/gpnp-errors#"]
2012-09-22 03:00:02.558: [    GPNP][2692574496]clsgpnpm_newWiredMsg: [at clsgpnpm.c:741] Msg-reply has soap fault 10 (Operation returned Retry (error CLSGPNP_CALL_AGAIN)) [uri "http://www.grid-pnp.org/2005/12/gpnp-errors#"]
2012-09-22 03:00:04.568: [    GPNP][2692574496]clsgpnpm_newWiredMsg: [at clsgpnpm.c:741] Msg-reply has soap fault 10 (Operation returned Retry (error CLSGPNP_CALL_AGAIN)) [uri "http://www.grid-pnp.org/2005/12/gpnp-errors#"]

The messages above show that cssd failed to get the GPnP profile from gpnpd.
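
The retry pattern can be quantified with a short script. This is a rough Python sketch, assuming the log lines have exactly the format pasted above; the function name retry_times is hypothetical. It counts the CLSGPNP_CALL_AGAIN replies and measures the gap between them.

```python
import re
from datetime import datetime

# The three ocssd.log lines quoted above.
OCSSD_LOG = """\
2012-09-22 03:00:00.546: [    GPNP][2692574496]clsgpnpm_newWiredMsg: [at clsgpnpm.c:741] Msg-reply has soap fault 10 (Operation returned Retry (error CLSGPNP_CALL_AGAIN)) [uri "http://www.grid-pnp.org/2005/12/gpnp-errors#"]
2012-09-22 03:00:02.558: [    GPNP][2692574496]clsgpnpm_newWiredMsg: [at clsgpnpm.c:741] Msg-reply has soap fault 10 (Operation returned Retry (error CLSGPNP_CALL_AGAIN)) [uri "http://www.grid-pnp.org/2005/12/gpnp-errors#"]
2012-09-22 03:00:04.568: [    GPNP][2692574496]clsgpnpm_newWiredMsg: [at clsgpnpm.c:741] Msg-reply has soap fault 10 (Operation returned Retry (error CLSGPNP_CALL_AGAIN)) [uri "http://www.grid-pnp.org/2005/12/gpnp-errors#"]
"""

# Leading timestamp, then any text, then the retry marker.
LINE_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}): .*CLSGPNP_CALL_AGAIN"
)


def retry_times(text):
    """Return the timestamps of all lines reporting CLSGPNP_CALL_AGAIN."""
    times = []
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if match:
            times.append(datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S.%f"))
    return times


times = retry_times(OCSSD_LOG)
# Seconds between consecutive retries.
gaps = [round((b - a).total_seconds(), 3) for a, b in zip(times, times[1:])]
print(f"{len(times)} retries, gaps in seconds: {gaps}")
```

For the three lines above the gaps come out at about two seconds each, which matches a client that keeps asking gpnpd for the profile and is told to call again every time.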

Let's check the gpnpd log to see what's wrong:
2012-09-22 02:46:51.594: [    GPNP][1088584000]clsgpnp_profileCallUrlInt: [at clsgpnp.c:2104] put-profile call to url "tcp://cintrnpdb003:33612" disco "mdns:service:gpnp._tcp.local.://cintrnpdb003:33612/agent=gpnpd,cname=crs_transp,host=cintrnpdb003,pid=20890/gpnpd h:cintrnpdb003 c:crs_transp" [f=0 claimed- host:cintrnpdb001 cname:crs_transp seq:6 auth:CN=GPnP_peer]
2012-09-22 02:47:05.424: [  OCRMSG][1085512000]GIPC error [29] msg [gipcretConnectionRefused]
2012-09-22 02:47:26.446: [  OCRMSG][1085512000]GIPC error [29] msg [gipcretConnectionRefused]

From the above we can see that when the first node's gpnpd started up, it sent a put-profile call to the third node's gpnpd but got no response.
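
When the log is long, the peer that gpnpd is calling can be pulled out of the put-profile line automatically. Here is a small Python sketch, assuming the url field always looks like the one in the line above; the function name put_profile_peer is made up for illustration.

```python
import re

# The put-profile line from the gpnpd log above, unchanged.
GPNPD_LINE = '''2012-09-22 02:46:51.594: [    GPNP][1088584000]clsgpnp_profileCallUrlInt: [at clsgpnp.c:2104] put-profile call to url "tcp://cintrnpdb003:33612" disco "mdns:service:gpnp._tcp.local.://cintrnpdb003:33612/agent=gpnpd,cname=crs_transp,host=cintrnpdb003,pid=20890/gpnpd h:cintrnpdb003 c:crs_transp" [f=0 claimed- host:cintrnpdb001 cname:crs_transp seq:6 auth:CN=GPnP_peer]'''

# Capture host and port from: put-profile call to url "tcp://host:port"
URL_RE = re.compile(r'put-profile call to url "tcp://([^:"]+):(\d+)"')


def put_profile_peer(line):
    """Return (host, port) of the put-profile target, or None if absent."""
    match = URL_RE.search(line)
    if match is None:
        return None
    return match.group(1), int(match.group(2))


peer = put_profile_peer(GPNPD_LINE)
print(peer)
```

Here the target is the gpnpd on cintrnpdb003, the third node, which is where to look next.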

So at this point we can suspect that the problem is caused by gpnpd on the third node.

After I killed gpnpd.bin on the third node and let it restart, the cluster on the first node started up successfully right away.
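
Finding the process to kill can be scripted. The Python sketch below only locates the PID; it does not kill anything. The ps listing is an illustrative sample, not real output from this cluster: the PID 20890 is taken from the gpnpd log line above, while the other rows and the assumption that the third node uses the same grid home path are made up. The function name find_pids is hypothetical.

```python
def find_pids(ps_output, process_name="gpnpd.bin"):
    """Return PIDs whose executable basename equals process_name."""
    pids = []
    for line in ps_output.splitlines():
        parts = line.split(None, 1)
        if len(parts) != 2 or not parts[0].isdigit():
            continue  # skip the header and malformed lines
        pid, args = parts
        executable = args.split()[0]
        # Compare only the file name so that "grep gpnpd.bin" does not match.
        if executable.rsplit("/", 1)[-1] == process_name:
            pids.append(int(pid))
    return pids


# Illustrative sample in the layout of "ps -eo pid,args" (assumed, not real output).
SAMPLE_PS = """\
  PID COMMAND
20890 /prod/product/oracle/11.2.0.2/grid/bin/gpnpd.bin
20901 /prod/product/oracle/11.2.0.2/grid/bin/ohasd.bin reboot
21544 grep gpnpd.bin
"""

pids = find_pids(SAMPLE_PS)
print(pids)
```

On a real node the listing would come from running ps -eo pid,args on the third node, and the resulting PID would then be passed to kill by hand, after which gpnpd restarted on its own in this case.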
