September 5, 2012

RAC: "CRS-0184: Cannot communicate with the CRS daemon" due to EVMD

One node of a RAC cluster ran into trouble: its CRSD kept terminating and restarting. Any attempt to query CRSD failed with the error "CRS-0184: Cannot communicate with the CRS daemon".

The following errors repeated in alertnode1.log:

2012-08-28 19:32:32.995
[ohasd(26038)]CRS-2765:Resource 'ora.crsd' has failed on server 'racnode001'.
2012-08-28 19:32:38.319
[crsd(13117)]CRS-1012:The OCR service started on node racnode001.
2012-08-28 19:42:54.134
[crsd(13117)]CRS-0810:Cluster Ready Service aborted due to failure to communicate with Event Management Service with error [1]. Details at (:CRSD00120:) in /prod/grid/11.2.0.2/grid/log/racnode001/crsd/crsd.log.
2012-08-28 19:42:54.804
[ohasd(26038)]CRS-2765:Resource 'ora.crsd' has failed on server 'racnode001'.
2012-08-28 19:42:59.175
[crsd(10695)]CRS-1012:The OCR service started on node cihcispdb001.
2012-08-28 19:53:15.226
[crsd(10695)]CRS-0810:Cluster Ready Service aborted due to failure to communicate with Event Management Service with error [1]. Details at (:CRSD00120:) in /prod/grid/11.2.0.2/grid/log/racnode001/crsd/crsd.log.
2012-08-28 19:53:15.575
[ohasd(26038)]CRS-2765:Resource 'ora.crsd' has failed on server 'racnode001'.

From the log above we can see that CRSD keeps failing and being restarted by ohasd, roughly every ten minutes.
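If the alert log is long, a small awk filter makes that restart cycle easier to see. This is my own sketch, not part of the original troubleshooting: it prints the timestamp of every CRS-2765 message for ora.crsd, relying on the layout shown above, where each timestamp sits on its own line just before the message. The script and function names are hypothetical.

```shell
#!/bin/sh
# Sketch: list when ohasd reported ora.crsd as failed (CRS-2765).
# Assumes the alert log layout shown in this post: a timestamp line
# immediately followed by the message line.

crsd_failure_times() {
  # Remember every line in "prev"; when a CRS-2765 line for ora.crsd
  # matches, print the line before it (its timestamp).
  awk '/CRS-2765:Resource .ora\.crsd. has failed/ { print prev } { prev = $0 }' "$1"
}

# Only run when a log file is given, so the file can also be sourced.
if [ -n "${1:-}" ]; then
  crsd_failure_times "$1"
fi
```

For example, `sh crsd_failures.sh alertnode1.log` run against the excerpt above would print 2012-08-28 19:32:32.995, 19:42:54.804 and 19:53:15.575, one per line.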

Let's check crsd.log for details:

2012-08-29 18:07:25.462: [ default][3503137264] CRS Daemon Starting
2012-08-29 18:07:25.464: [ default][1106962752] Policy Engine is not initialized yet!
2012-08-29 18:07:25.465: [ default][3503137264] ENV Logging level for Module: AGENT  1
2012-08-29 18:07:25.465: [ default][3503137264] ENV Logging level for Module: AGFW  0
.............
.............
2012-08-29 18:15:40.648: [ COMMCRS][3503137264]Authorization failed, network error
2012-08-29 18:15:55.584: [ CRSMAIN][1106962752] Policy Engine is not initialized yet!
2012-08-29 18:16:11.408: [ COMMCRS][3503137264]Authorization failed, network error
2012-08-29 18:16:25.586: [ CRSMAIN][1106962752] Policy Engine is not initialized yet!
2012-08-29 18:16:41.668: [ COMMCRS][3503137264]Authorization failed, network error
2012-08-29 18:16:55.597: [ CRSMAIN][1106962752] Policy Engine is not initialized yet!
2012-08-29 18:17:12.929: [ COMMCRS][3503137264]Authorization failed, network error
2012-08-29 18:17:25.609: [ CRSMAIN][1106962752] Policy Engine is not initialized yet!
2012-08-29 18:17:43.190: [ COMMCRS][3503137264]Authorization failed, network error
2012-08-29 18:17:43.191: [ CRSMAIN][3503137264] Created alert : (:CRSD00120:) :  Could not init EVM, error: 1
2012-08-29 18:17:43.191: [    CRSD][3503137264][PANIC] CRSD exiting: Could not init EVMMgr.1
2012-08-29 18:17:43.191: [    CRSD][3503137264] Done.
2012-08-29 18:17:44.287: [ default][3316335088] First attempt: init CSS context succeeded.
2012-08-29 18:17:44.290: [  clsdmt][1078745408]PID for the Process [29342], connkey 1
2012-08-29 18:17:44.291: [  clsdmt][1078745408]Creating PID [29342] file for home /prod/grid/11.2.0.2/grid host racnode001 bin crs to /prod/grid/11.2.0.2/grid/crs/init/
2012-08-29 18:17:44.291: [  clsdmt][1078745408]Writing PID [29342] to the file [/prod/grid/11.2.0.2/grid/crs/init/racnode001.pid]
2012-08-29 18:17:45.182: [ default][1078745408] Policy Engine is not initialized yet!
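crsd.log is verbose, so it helps to filter for the lines that matter here. A minimal sketch, assuming the message texts match the ones in the excerpt above (other Grid Infrastructure versions may word them differently); the function name is hypothetical:

```shell
#!/bin/sh
# Sketch: pull the EVM-related failure lines out of crsd.log.
# Patterns are taken from the crsd.log excerpt in this post.

crsd_evm_errors() {
  # "Could not init EVM" also matches "Could not init EVMMgr"
  grep -E 'Could not init EVM|Authorization failed, network error' "$1"
}

if [ -n "${1:-}" ]; then
  crsd_evm_errors "$1"
fi
```

For example: `sh crsd_evm.sh /prod/grid/11.2.0.2/grid/log/racnode001/crsd/crsd.log`.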

We can see that when CRSD starts up, it fails while initializing the Event Manager ("Could not init EVMMgr") and then exits with a PANIC. Let's check what's wrong with the Event Manager.
Below is an excerpt from evmd.log:

2012-08-29 16:12:18.636: [ COMMCRS][54275088]clsc_send: waiting for space to write on icon (0x9d6fdc0). 0/88 sent
2012-08-29 16:12:26.660: [ COMMCRS][54275088]clsc_send: waiting for space to write on icon (0x9d6fdc0). 0/88 sent
2012-08-29 16:12:34.676: [ COMMCRS][54275088]clsc_send: waiting for space to write on icon (0x9d6fdc0). 0/88 sent
2012-08-29 16:12:42.711: [ COMMCRS][54275088]clsc_send: waiting for space to write on icon (0x9d6fdc0). 0/88 sent

EVMD keeps reporting the message above. My interpretation was that EVMD was failing to write to the icon under "/prod/grid/11.2.0.2/grid/auth/evm" because of a lack of space.
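To see how persistent the message is, you can count it and look at the latest occurrence. A sketch, assuming each message is on a single line in evmd.log (on this node that would be something like /prod/grid/11.2.0.2/grid/log/racnode001/evmd/evmd.log); the function names are hypothetical:

```shell
#!/bin/sh
# Sketch: measure how often EVMD logged the clsc_send
# "waiting for space to write" message. Assumes one message per line.

PATTERN='clsc_send: waiting for space to write'

evmd_write_waits() {
  # grep -c prints 0 and exits 1 when nothing matches; treat that as success
  grep -c "$PATTERN" "$1" || true
}

last_evmd_write_wait() {
  # Most recent occurrence, to judge whether EVMD is still stuck right now
  grep "$PATTERN" "$1" | tail -n 1
}

if [ -n "${1:-}" ]; then
  echo "occurrences: $(evmd_write_waits "$1")"
  echo "latest: $(last_evmd_write_wait "$1")"
fi
```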

However, the mount point had enough free space, so there seemed to be two possible reasons EVMD was still throwing this error:
1. A bug.
2. At some point in the past the mount point did run short of space, EVMD got stuck in that state, and it did not recover automatically even after the space was freed.
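Checking the free-space side of this is a one-liner with df. A minimal sketch using POSIX `df -Pk`, where column 4 is the available space in 1K blocks; the directory is the one from this post and the function name is hypothetical:

```shell
#!/bin/sh
# Sketch: report free space (KB) on the filesystem holding the EVM auth
# directory. Path taken from this post; adjust for your Grid home.

EVM_AUTH_DIR="/prod/grid/11.2.0.2/grid/auth/evm"

free_kb_for() {
  # -P keeps each filesystem on one line; -k reports 1K blocks.
  # Line 2 is the data row; column 4 is "Available".
  df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

if [ -d "$EVM_AUTH_DIR" ]; then
  echo "Free KB on $EVM_AUTH_DIR: $(free_kb_for "$EVM_AUTH_DIR")"
fi
```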

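The solution was simple: I reloaded EVMD, and CRSD recovered automatically. In the alert log below, CRSD keeps aborting with CRS-0810 until EVMD comes up (CRS-1401 at 21:11:48), and about four seconds later CRSD starts successfully (CRS-1201):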

2012-08-29 20:42:44.276
[crsd(6075)]CRS-1012:The OCR service started on node racnode001.
2012-08-29 20:52:58.684
[crsd(6075)]CRS-0810:Cluster Ready Service aborted due to failure to communicate with Event Management Service with error [1]. Details at (:CRSD00120:) in /prod/grid/11.2.0.2/grid/log/racnode001/crsd/crsd.log.
2012-08-29 20:52:58.850
[ohasd(26038)]CRS-2765:Resource 'ora.crsd' has failed on server 'racnode001'.
2012-08-29 20:53:03.131
[crsd(4699)]CRS-1012:The OCR service started on node racnode001.
2012-08-29 21:03:17.894
[crsd(4699)]CRS-0810:Cluster Ready Service aborted due to failure to communicate with Event Management Service with error [1]. Details at (:CRSD00120:) in /prod/grid/11.2.0.2/grid/log/racnode001/crsd/crsd.log.
2012-08-29 21:03:18.609
[ohasd(26038)]CRS-2765:Resource 'ora.crsd' has failed on server 'racnode001'.
2012-08-29 21:03:22.962
[crsd(28965)]CRS-1012:The OCR service started on node racnode001.
2012-08-29 21:11:48.965
[evmd(21764)]CRS-1401:EVMD started on node racnode001. 
2012-08-29 21:11:52.774
[crsd(28965)]CRS-1201:CRSD started on node racnode001.

After that, CRSD was back to normal:
racnode001 [+ASM1] /prod/grid/11.2.0.2/grid/log/racnode001/evmd > crsctl check crs
CRS-4537: Cluster Ready Services is online
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online
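Several readers asked how EVMD was reloaded. One way, using the commands a reader posted in the comments, is to stop and start the ora.evmd init resource and then confirm the stack with `crsctl check crs`. The sketch below wraps that; it assumes crsctl from the Grid home is on PATH and that it runs as root (the -init resources normally need root), and the function names and `--restart` flag are hypothetical.

```shell
#!/bin/sh
# Sketch: restart the EVMD init resource, then verify the CRS stack.
# Assumes crsctl is on PATH and the script runs as root.

restart_evmd() {
  # A failed stop (e.g. EVMD already down) should not block the start.
  crsctl stop res ora.evmd -init
  crsctl start res ora.evmd -init
}

crs_stack_online() {
  # Reads `crsctl check crs` output on stdin; succeeds only if the CRS,
  # CSS and EVM "online" messages shown in this post are all present.
  _crs_out=$(cat)
  for code in CRS-4537 CRS-4529 CRS-4533; do
    printf '%s\n' "$_crs_out" | grep -q "$code" || return 1
  done
  return 0
}

# Act only when explicitly asked, so sourcing this file has no side effects.
if [ "${1:-}" = "--restart" ]; then
  restart_evmd
  crsctl check crs | crs_stack_online && echo "CRS stack is online"
fi
```

Run it as `sh restart_evmd.sh --restart`; without the flag it only defines the functions.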

4 Comments:

Anonymous said...

The solution is simple, i reload the EVMD, and CRSD also become fine automaticlly:
>> Question: how do you reload EVMD?

Anonymous said...

worst howto EVER

Anonymous said...

Don't really know what you did there. How did you start evmd? What's the point of this post?

Sunil Bv said...

Faced the same issue; I was able to resolve it with the following commands:

crsctl stop res ora.evmd -init
crsctl start res ora.evmd -init

