One node of a RAC cluster ran into trouble: its CRSD kept terminating and restarting. Any attempt to query CRSD returned the error “Cannot communicate with the CRS daemon”.
The errors below repeat in alertnode1.log:
2012-08-28 19:32:32.995
[ohasd(26038)]CRS-2765:Resource 'ora.crsd' has failed on server 'racnode001'.
2012-08-28 19:32:38.319
[crsd(13117)]CRS-1012:The OCR service started on node racnode001.
2012-08-28 19:42:54.134
[crsd(13117)]CRS-0810:Cluster Ready Service aborted due to failure to communicate with Event Management Service with error [1]. Details at (:CRSD00120:) in /prod/grid/11.2.0.2/grid/log/racnode001/crsd/crsd.log.
2012-08-28 19:42:54.804
[ohasd(26038)]CRS-2765:Resource 'ora.crsd' has failed on server 'racnode001'.
2012-08-28 19:42:59.175
[crsd(10695)]CRS-1012:The OCR service started on node cihcispdb001.
2012-08-28 19:53:15.226
[crsd(10695)]CRS-0810:Cluster Ready Service aborted due to failure to communicate with Event Management Service with error [1]. Details at (:CRSD00120:) in /prod/grid/11.2.0.2/grid/log/racnode001/crsd/crsd.log.
2012-08-28 19:53:15.575
[ohasd(26038)]CRS-2765:Resource 'ora.crsd' has failed on server 'racnode001'.
The log above shows CRSD repeatedly failing and restarting.
Let's check crsd.log for details:
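To see how often CRSD has died, the CRS-2765 lines for 'ora.crsd' can be counted in the alert log. This is only a minimal sketch: the function name count_crsd_failures is made up for illustration, and the log path is whatever your alert log happens to be.

```shell
# Count how many times ohasd reported that the ora.crsd resource failed.
# Usage: count_crsd_failures /path/to/alert.log
# (count_crsd_failures is a hypothetical helper name, not an Oracle tool.)
count_crsd_failures() {
    log_file=$1
    # grep -c prints the number of matching lines (0 if there are none).
    grep -c "CRS-2765:Resource 'ora.crsd' has failed" "$log_file"
}
```

A count that keeps growing between runs suggests CRSD is still stuck in the restart loop.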
2012-08-29 18:07:25.462: [ default][3503137264] CRS Daemon Starting
2012-08-29 18:07:25.464: [ default][1106962752] Policy Engine is not initialized yet!
2012-08-29 18:07:25.465: [ default][3503137264] ENV Logging level for Module: AGENT 1
2012-08-29 18:07:25.465: [ default][3503137264] ENV Logging level for Module: AGFW 0
.............
.............
2012-08-29 18:15:40.648: [ COMMCRS][3503137264]Authorization failed, network error
2012-08-29 18:15:55.584: [ CRSMAIN][1106962752] Policy Engine is not initialized yet!
2012-08-29 18:16:11.408: [ COMMCRS][3503137264]Authorization failed, network error
2012-08-29 18:16:25.586: [ CRSMAIN][1106962752] Policy Engine is not initialized yet!
2012-08-29 18:16:41.668: [ COMMCRS][3503137264]Authorization failed, network error
2012-08-29 18:16:55.597: [ CRSMAIN][1106962752] Policy Engine is not initialized yet!
2012-08-29 18:17:12.929: [ COMMCRS][3503137264]Authorization failed, network error
2012-08-29 18:17:25.609: [ CRSMAIN][1106962752] Policy Engine is not initialized yet!
2012-08-29 18:17:43.190: [ COMMCRS][3503137264]Authorization failed, network error
2012-08-29 18:17:43.191: [ CRSMAIN][3503137264] Created alert : (:CRSD00120:) : Could not init EVM, error: 1
2012-08-29 18:17:43.191: [ CRSD][3503137264][PANIC] CRSD exiting: Could not init EVMMgr.1
2012-08-29 18:17:43.191: [ CRSD][3503137264] Done.
2012-08-29 18:17:44.287: [ default][3316335088] First attempt: init CSS context succeeded.
2012-08-29 18:17:44.290: [ clsdmt][1078745408]PID for the Process [29342], connkey 1
2012-08-29 18:17:44.291: [ clsdmt][1078745408]Creating PID [29342] file for home /prod/grid/11.2.0.2/grid host racnode001 bin crs to /prod/grid/11.2.0.2/grid/crs/init/
2012-08-29 18:17:44.291: [ clsdmt][1078745408]Writing PID [29342] to the file [/prod/grid/11.2.0.2/grid/crs/init/racnode001.pid]
2012-08-29 18:17:45.182: [ default][1078745408] Policy Engine is not initialized yet!
We can see that CRSD fails during startup at the step where it initializes the Event Manager (EVM). Let's check what is wrong with the Event Manager.
Below is from evmd.log:
2012-08-29 16:12:18.636: [ COMMCRS][54275088]clsc_send: waiting for space to write on icon (0x9d6fdc0). 0/88 sent
2012-08-29 16:12:26.660: [ COMMCRS][54275088]clsc_send: waiting for space to write on icon (0x9d6fdc0). 0/88 sent
2012-08-29 16:12:34.676: [ COMMCRS][54275088]clsc_send: waiting for space to write on icon (0x9d6fdc0). 0/88 sent
2012-08-29 16:12:42.711: [ COMMCRS][54275088]clsc_send: waiting for space to write on icon (0x9d6fdc0). 0/88 sent
EVMD keeps reporting the message above. It means that a write on the icon under “/prod/grid/11.2.0.2/grid/auth/evm” failed due to lack of space.
But the mount point has enough space, so EVMD could still be throwing this error for one of two possible reasons:
1. A bug.
2. At some point in the past the mount point ran out of space and EVMD got stuck in that state; even though the space was freed later, EVMD did not recover automatically.
The solution is simple: I reloaded EVMD, and CRSD then recovered automatically:
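For reference, the free space on the mount point can be confirmed with df. The helper below is a small sketch rather than the exact check used here; avail_kb is a made-up name, and the path you pass in is up to you.

```shell
# Print the available space, in KB, on the filesystem that holds a path.
# Usage: avail_kb /prod/grid/11.2.0.2/grid
# (avail_kb is a hypothetical helper name used only for this example.)
avail_kb() {
    # -P forces the POSIX one-line-per-filesystem layout; -k reports 1024-byte blocks.
    # Line 1 is the header, so the figure we want is column 4 of line 2.
    df -Pk "$1" | awk 'NR == 2 { print $4 }'
}
```

Running it against the Grid home shows whether the filesystem is actually short of space at the moment; it says nothing about whether it was full earlier.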
2012-08-29 20:42:44.276
[crsd(6075)]CRS-1012:The OCR service started on node racnode001.
2012-08-29 20:52:58.684
[crsd(6075)]CRS-0810:Cluster Ready Service aborted due to failure to communicate with Event Management Service with error [1]. Details at (:CRSD00120:) in /prod/grid/11.2.0.2/grid/log/racnode001/crsd/crsd.log.
2012-08-29 20:52:58.850
[ohasd(26038)]CRS-2765:Resource 'ora.crsd' has failed on server 'racnode001'.
2012-08-29 20:53:03.131
[crsd(4699)]CRS-1012:The OCR service started on node racnode001.
2012-08-29 21:03:17.894
[crsd(4699)]CRS-0810:Cluster Ready Service aborted due to failure to communicate with Event Management Service with error [1]. Details at (:CRSD00120:) in /prod/grid/11.2.0.2/grid/log/racnode001/crsd/crsd.log.
2012-08-29 21:03:18.609
[ohasd(26038)]CRS-2765:Resource 'ora.crsd' has failed on server 'racnode001'.
2012-08-29 21:03:22.962
[crsd(28965)]CRS-1012:The OCR service started on node racnode001.
2012-08-29 21:11:48.965
[evmd(21764)]CRS-1401:EVMD started on node racnode001.
2012-08-29 21:11:52.774
[crsd(28965)]CRS-1201:CRSD started on node racnode001.
CRSD then returned to normal:
racnode001 [+ASM1] /prod/grid/11.2.0.2/grid/log/racnode001/evmd > crsctl check crs
CRS-4537: Cluster Ready Services is online
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online
4 Comments:
The solution is simple, i reload the EVMD, and CRSD also become fine automaticlly:
>>Question:how to reload the EVMD ?
worst howto EVER
Don't really know what you did there. How did you start evmd? What's the point of this post?
Faced the same issue; I was able to resolve it using the following commands:
crsctl stop res ora.evmd -init
crsctl start res ora.evmd -init
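The two commands above can be wrapped together with the status check shown in the post. This is a rough sketch only: restart_evmd and the CRSCTL override variable are made up for illustration, and it assumes you run it as a user allowed to manage the -init resources.

```shell
# Stop and start the ora.evmd resource, then check overall clusterware status.
# CRSCTL is a hypothetical override variable; it defaults to crsctl on the PATH.
restart_evmd() {
    crsctl_bin=${CRSCTL:-crsctl}
    # Each step runs only if the previous one succeeded.
    "$crsctl_bin" stop res ora.evmd -init &&
    "$crsctl_bin" start res ora.evmd -init &&
    "$crsctl_bin" check crs
}
```

If the restart helps, the alert log should show CRS-1401 (EVMD started) followed by CRS-1201 (CRSD started), as in the post.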