April 30, 2012

Troubleshooting: CRS-4534: Cannot communicate with Event Manager

There is a three-node RAC cluster.
All DBs are up and running fine on all three nodes.
oracle $ ps -ef|grep smon|wc -l
43
oracle $ ps -ef|grep tns|wc -l
50
oracle $ ps -ef|grep d.bin
root      8177     1  0 Apr28 ?        00:18:29 /stage/grid/11.2.0/grid/bin/orarootagent.bin
oracle    8550     1 22 Apr28 ?        08:52:59 /stage/grid/11.2.0/grid/bin/oraagent.bin
root     26480     1  0 Apr28 ?        00:03:26 /stage/grid/11.2.0/grid/bin/ohasd.bin reboot
oracle   29210     1  0 Apr28 ?        00:00:00 /stage/grid/11.2.0/grid/bin/mdnsd.bin
oracle   29232     1  0 Apr28 ?        00:01:09 /stage/grid/11.2.0/grid/bin/gpnpd.bin
oracle   29257     1  0 Apr28 ?        00:05:32 /stage/grid/11.2.0/grid/bin/gipcd.bin
root     29307     1  0 Apr28 ?        00:00:13 /stage/grid/11.2.0/grid/bin/cssdmonitor
root     29334     1  0 Apr28 ?        00:00:10 /stage/grid/11.2.0/grid/bin/cssdagent
oracle   29351     1  8 Apr28 ?        03:25:51 /stage/grid/11.2.0/grid/bin/ocssd.bin
root     29540     1  0 Apr28 ?        00:20:57 /stage/grid/11.2.0/grid/bin/orarootagent.bin
oracle   29556     1  0 Apr28 ?        00:00:18 /stage/grid/11.2.0/grid/bin/diskmon.bin -d -f
root     30160     1  0 Apr28 ?        00:00:33 /stage/grid/11.2.0/grid/bin/octssd.bin reboot
oracle   30350     1  0 Apr28 ?        00:00:04 /stage/grid/11.2.0/grid/bin/evmd.bin
root     31947     1  0 Apr28 ?        00:03:50 /stage/grid/11.2.0/grid/bin/crsd.bin reboot

From the output above we can see the evmd.bin process is up.
But the crsctl command shows that the Event Manager is not working on node 2 and node 3.
oracle $ crsctl check cluster -all
**************************************************************
node1:
CRS-4537: Cluster Ready Services is online
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online
**************************************************************
node2:
CRS-4537: Cluster Ready Services is online
CRS-4529: Cluster Synchronization Services is online
CRS-4534: Cannot communicate with Event Manager
**************************************************************
node3:
CRS-4537: Cluster Ready Services is online
CRS-4529: Cluster Synchronization Services is online
CRS-4534: Cannot communicate with Event Manager
**************************************************************

Checking the cluster alert.log, it also indicates EVMD started normally:
2012-04-28 18:29:48.154
[evmd(30579)]CRS-1401:EVMD started on node node2.

Below is from evmd.log:
2012-04-28 18:29:17.518: [ EVMD][1928129552] EVMD Starting
2012-04-28 18:29:17.518: [ EVMD][1928129552] Initializing OCR
2012-04-28 18:29:17.518: [ EVMD][1928129552] EVMD running in cluster mode
2012-04-28 18:29:18.524: [ EVMD][1928129552] Waiting for OCR. Seconds elapsed: 0
2012-04-28 18:29:48.136: [ EVMD][1928129552] Get OCR context succeeded
2012-04-28 18:29:48.153: [ EVMD][1928129552] Initializing Diagnostics Settings
2012-04-28 18:29:48.224: [ EVMD][1928129552] Authorization database built successfully.
2012-04-28 18:29:49.105: [ EVMAPP][1928129552] EVMD Started
2012-04-28 18:29:49.123: [ EVMD][1928129552] Authorization database built successfully.

All seems fine. Then why "CRS-4534: Cannot communicate with Event Manager"?

Let's check the current status for evmd:
oracle $ crsctl stat res -t -init
-------------------------------------------------------
NAME             TARGET          STATE       SERVER   
-------------------------------------------------------
ora.evmd      1  ONLINE   INTERMEDIATE node3

The INTERMEDIATE status means evmd started but never fully completed its startup.

Below is what was recorded in the oraagent log file:
2012-04-28 18:45:50.936: [    AGFW][1113385280] {0:0:2} ora.evmd 1 1 state changed from: UNKNOWN to: OFFLINE
.................

2012-04-28 18:46:23.441: [    AGFW][1114609984] {0:0:2} ora.evmd 1 1 state changed from: UNKNOWN to: STARTING
.................

2012-04-28 20:41:27.768: [    AGFW][1116698944] {0:0:2} ora.evmd 1 1 state changed from: STARTING to: PARTIAL
.................

We can see evmd's startup in progress: the status changed from UNKNOWN --> OFFLINE --> STARTING --> PARTIAL.
The next transition should be from PARTIAL to ONLINE.
But the log never records that final step; it never happened.

It failed at the final step because of the error below:
2012-04-28 20:43:27.787: [ora.evmd][1112496448] {0:0:2} [check] clsn_agent::abort {
2012-04-28 20:43:27.787: [ora.evmd][1112496448] {0:0:2} [check] abort {
2012-04-28 20:43:27.787: [ora.evmd][1112496448] {0:0:2} [check] abort command: check
2012-04-28 20:43:27.787: [ora.evmd][1112496448] {0:0:2} [check] tryActionLock {
2012-04-28 20:44:27.816: [ora.evmd][1112496448] {0:0:2} [check] did not get lock
2012-04-28 20:44:27.816: [ora.evmd][1112496448] {0:0:2} [check] tryActionLock }
2012-04-28 20:44:27.816: [ora.evmd][1112496448] {0:0:2} [check] clsn_agent::abort: Exception LockAcquireTimeoutException
2012-04-28 20:44:27.816: [ora.evmd][1112496448] {0:0:2} [check] clsn_agent::abort, agent exiting }

This is very similar to bug 11807012: oraagent.bin Exits After Check Timed Out

But this bug should have been fixed in 11.2.0.2.3, and our server is already patched with 11gR2 Patch Set Update 4.

It looks like the PSU didn't fully resolve that bug.


Raised an SR and will check with Oracle Support further.



April 24, 2012

rowid paging benefits from an 11g new feature

Consider the following request:
We need to fetch records 900 to 1000 after ordering by time.

We call this kind of request "paging", because in most cases we encounter it behind a web page's navigation buttons: page 1, page 2, page 3...
We can jump to any page by clicking the corresponding button.

There are two kinds of paging method in Oracle: rownum paging and rowid paging.
Almost all the documents I have read claim that rowid is a better paging method than rownum.
They say:
For the first few pages, the two methods have similar performance. But as the page number grows bigger and bigger, rowid paging performs better and better than rownum paging.



Pay attention to that last claim. Is it really true? Let's run an experiment.

First test in 11g. Create test table as below:
11gR2 >create table test(id number,status VARCHAR2(7),type VARCHAR2(19),created date);
Table created.

11gR2 >insert into test select OBJECT_ID,STATUS,OBJECT_TYPE,CREATED from dba_objects;
12926 rows created.

11gR2 >alter table test modify created not null;
Table altered.

11gR2 >create index test_ind1 on test(CREATED);
Index created.

11gR2 >ANALYZE TABLE TEST compute statistics;
Table analyzed.


We are going to read rows 900-1000 from table test, ordered by created.

Test 11g paging with rownum method:
11gR2 >select *
       from (
             select rownum rn,t.*
             from
             (select  id,status,type,created from test order by created) t
             where rownum<1000)
       where rn >900; 
99 rows selected.
--------------------------------------------------------------------------------------------
| Id  | Operation                      | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT               |           |   999 | 50949 |     9   (0)| 00:00:01 |
|*  1 |  VIEW                          |           |   999 | 50949 |     9   (0)| 00:00:01 |
|*  2 |   COUNT STOPKEY                |           |       |       |            |          |
|   3 |    VIEW                        |           |   999 | 37962 |     9   (0)| 00:00:01 |
|   4 |     TABLE ACCESS BY INDEX ROWID| TEST      | 12926 |   277K|     9   (0)| 00:00:01 |
|   5 |      INDEX FULL SCAN           | TEST_IND1 |   999 |       |     4   (0)| 00:00:01 |
--------------------------------------------------------------------------------------------

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
         22  consistent gets
          0  physical reads
          0  redo size
       3703  bytes sent via SQL*Net to client
        590  bytes received via SQL*Net from client
          8  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
         99  rows processed


Test 11g paging with rowid method:
11gR2 >select /*+ ordered use_nl(p s) */ *
      from (
            select rownum rn,rd
            from (select  rowid rd from test order by created)
            t where rownum<1000) p,
            test s
      where rn>900 and p.rd=s.rowid;
99 rows selected.
-----------------------------------------------------------------------------------------
| Id  | Operation                   | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |           |   999 | 52947 |  1003   (0)| 00:00:13 |
|   1 |  NESTED LOOPS               |           |   999 | 52947 |  1003   (0)| 00:00:13 |
|*  2 |   VIEW                      |           |   999 | 24975 |     4   (0)| 00:00:01 |
|*  3 |    COUNT STOPKEY            |           |       |       |            |          |
|   4 |     VIEW                    |           |   999 | 11988 |     4   (0)| 00:00:01 |
|   5 |      INDEX FULL SCAN        | TEST_IND1 | 12926 |   239K|     4   (0)| 00:00:01 |
|   6 |   TABLE ACCESS BY USER ROWID| TEST      |     1 |    28 |     1   (0)| 00:00:01 |
-----------------------------------------------------------------------------------------

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
         19  consistent gets
          0  physical reads
          0  redo size
       5450  bytes sent via SQL*Net to client
        590  bytes received via SQL*Net from client
          8  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
         99  rows processed


From the consistent gets we can see rowid paging (19 logical reads) does better than rownum paging (22 logical reads).
The reason is that the rowid method avoids the "TABLE ACCESS BY INDEX ROWID" step for the first 900 rows.
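That reason can be turned into a toy cost model. This is an illustrative Python sketch only: it counts table accesses rather than consistent gets, mirrors the rownum<1000 / rn>900 style predicates, and ignores the buffer-pinning effects examined later in this post.

```python
# Toy model: how many table-row fetches each paging method does
# for the window (start, stop), exclusive bounds as in rn>start / rownum<stop.
def table_accesses(method, start, stop):
    if method == "rownum":
        return stop - 1            # table row fetched for every row kept by STOPKEY
    elif method == "rowid":
        return stop - 1 - start    # table visited only for the requested window
    raise ValueError(method)

# page 10 (rows 901-999) vs page 1000 (rows 99901-99999)
for start, stop in [(900, 1000), (99900, 100000)]:
    print(start, stop,
          table_accesses("rownum", start, stop),
          table_accesses("rowid", start, stop))
```

In this model rownum paging's table work grows with the page number (999, then 99999 accesses), while rowid paging stays at 99, which is exactly the "gap widens for later pages" claim.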


Let's check in 10g:
The test table was built exactly the same as in 11g, with the index created and stats gathered.

Test 10g paging with rownum method:
10gR2 >select *
       from (
             select rownum rn,t.*
             from
             (select  id,status,type,created from test order by created) t
             where rownum<1000)
       where rn >900; 
99 rows selected.
--------------------------------------------------------------------------------------------
| Id  | Operation                      | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT               |           |   999 | 50949 |    17   (0)| 00:00:01 |
|*  1 |  VIEW                          |           |   999 | 50949 |    17   (0)| 00:00:01 |
|*  2 |   COUNT STOPKEY                |           |       |       |            |          |
|   3 |    VIEW                        |           |   999 | 37962 |    17   (0)| 00:00:01 |
|   4 |     TABLE ACCESS BY INDEX ROWID| TEST      | 45620 |  1113K|    17   (0)| 00:00:01 |
|   5 |      INDEX FULL SCAN           | TEST_IND1 |   999 |       |     3   (0)| 00:00:01 |
--------------------------------------------------------------------------------------------

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
         19  consistent gets
          0  physical reads
          0  redo size
       3842  bytes sent via SQL*Net to client
        558  bytes received via SQL*Net from client
          8  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
         99  rows processed


19 logical reads, very close to 11g. It is a normal value and close to what we expected.

Test 10g paging with rowid method:
10gR2 >select /*+ ordered use_nl(p s) */ *
      from (
            select rownum rn,rd
            from (select  rowid rd from test order by created)
            t where rownum<200) p,
            test s
      where rn>100 and p.rd=s.rowid;
99 rows selected.
-----------------------------------------------------------------------------------------
| Id  | Operation                   | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |           |   199 | 11144 |   261   (1)| 00:00:04 |
|   1 |  NESTED LOOPS               |           |   199 | 11144 |   261   (1)| 00:00:04 |
|*  2 |   VIEW                      |           |   199 |  4975 |    62   (2)| 00:00:01 |
|*  3 |    COUNT STOPKEY            |           |       |       |            |          |
|   4 |     VIEW                    |           | 45620 |   534K|    62   (2)| 00:00:01 |
|   5 |      INDEX FULL SCAN        | TEST_IND1 | 45620 |   846K|    62   (2)| 00:00:01 |
|   6 |   TABLE ACCESS BY USER ROWID| TEST      |     1 |    31 |     1   (0)| 00:00:01 |
-----------------------------------------------------------------------------------------

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
        108  consistent gets
          0  physical reads
          0  redo size
       5252  bytes sent via SQL*Net to client
        558  bytes received via SQL*Net from client
          8  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
         99  rows processed


Wow, why did the logical reads jump to 108? The execution plan is the same as in 11g, so why such a big difference in logical reads?
      rownum    rowid
11g:      22       19
10g:      19      108


Why?
Here is the reason:
11gR2 >select rowid,id from test where rowid in ('AAADSHAABAAAH3hAAA','AAADSHAABAAAH3hAAB','AAADSHAABAAAH3hAAC','AAADSHAABAAAH3hAAD','AAADSHAABAAAH3hAAE','AAADSHAABAAAH3hAAF','AAADSHAABAAAH3hAAG','AAADSHAABAAAH3hAAH','AAADSHAABAAAH3hAAI','AAADSHAABAAAH3hAAJ');
ROWID                      ID
------------------ ----------
AAADSHAABAAAH3hAAA         20
AAADSHAABAAAH3hAAB         46
AAADSHAABAAAH3hAAC         28
AAADSHAABAAAH3hAAD         15
AAADSHAABAAAH3hAAE         29
AAADSHAABAAAH3hAAF          3
AAADSHAABAAAH3hAAG         25
AAADSHAABAAAH3hAAH         41
AAADSHAABAAAH3hAAI         54
AAADSHAABAAAH3hAAJ         40
10 rows selected.
Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
          2  consistent gets
          0  physical reads
          0  redo size
        875  bytes sent via SQL*Net to client
        524  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
         10  rows processed


10gR2 >select rowid,id from test where rowid in ('AAAzo5AABAAAKnWAAA','AAAzo5AABAAAKnWAAB','AAAzo5AABAAAKnWAAC','AAAzo5AABAAAKnWAAD','AAAzo5AABAAAKnWAAE','AAAzo5AABAAAKnWAAF','AAAzo5AABAAAKnWAAG','AAAzo5AABAAAKnWAAH','AAAzo5AABAAAKnWAAI','AAAzo5AABAAAKnWAAJ');
ROWID                      ID
------------------ ----------
AAAzo5AABAAAKnWAAA         30
AAAzo5AABAAAKnWAAB          8
AAAzo5AABAAAKnWAAC         14
AAAzo5AABAAAKnWAAD         34
AAAzo5AABAAAKnWAAE         45
AAAzo5AABAAAKnWAAF         39
AAAzo5AABAAAKnWAAG         47
AAAzo5AABAAAKnWAAH         51
AAAzo5AABAAAKnWAAI         11
AAAzo5AABAAAKnWAAJ         48
10 rows selected.

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
         10  consistent gets
          0  physical reads
          0  redo size
        861  bytes sent via SQL*Net to client
        492  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
         10  rows processed


A rowid is formatted as OOOOOOFFFBBBBBBRRR:
O is the data object id, F is the relative file number, B is the block number, and R is the row number within the block.
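As an illustration, here is a small Python sketch that decodes this layout. The base-64 alphabet (A-Z, a-z, 0-9, +, /) is the standard extended-rowid encoding; the decoded numbers are simply what the arithmetic yields for the first 11g rowid above.

```python
import string

# Extended rowid: 6 chars object id, 3 chars file number,
# 6 chars block number, 3 chars row number, base-64 encoded.
B64 = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"

def decode_rowid(rowid):
    def num(chars):
        n = 0
        for c in chars:
            n = n * 64 + B64.index(c)
        return n
    return (num(rowid[0:6]),    # data object id
            num(rowid[6:9]),    # relative file number
            num(rowid[9:15]),   # block number
            num(rowid[15:18]))  # row number within the block

print(decode_rowid("AAADSHAABAAAH3hAAA"))   # -> (13447, 1, 32225, 0)
```

Note that all ten rowids in the 11g IN list above differ only in the last three characters, i.e. they address rows 0-9 of one and the same block.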

From the above we can see that reading 10 rows from the same block costs 2 consistent gets in 11g but 10 consistent gets in 10g.

Further research also confirmed that in 10g, every rowid generates a logical read, even when all the rowids come from the same block.
In 11g a new mechanism was introduced: if the next rowid is in the same block as the current row, the current block pin can be reused.
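That mechanism can be mimicked in a few lines. This is a toy simulation, not Oracle's actual code path; note the real 11g test still showed 2 consistent gets where this model predicts 1, so treat it as a lower bound.

```python
# Count "consistent gets" for a list of (block, row) addresses
# fetched in key order.

def gets_10g(addresses):
    # pre-11g behaviour: every rowid fetch re-gets its block
    return len(addresses)

def gets_11g(addresses):
    # 11g-style pin reuse: a new get only when the block changes
    gets, last_block = 0, None
    for block, row in addresses:
        if block != last_block:
            gets += 1
            last_block = block
    return gets

rows = [(32225, r) for r in range(10)]   # 10 rows in one block
print(gets_10g(rows), gets_11g(rows))    # 10 vs 1 in this toy model
```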

April 9, 2012

MV refresh costs a long time due to an internal recursive call

We have a job defined to refresh multiple MVs. Most of them refresh normally, but two MVs take a very long time to complete.
No lock/latch contention was found during the refresh.

Below is the log:
START_TIME          OBJECT_NAME                    TIME_COST(s)
------------------- ------------------------------ -------------
2012-01-06 00:45:44 MV_APP_CURR_ST_1_HR_YLD_AVAIL  17.65
2012-01-06 00:46:02 MV_APP_CURR_STATE_YIELD_AVAIL  2.7
2012-01-06 00:46:05 MV_APP_CURRENT_STATE           220.34

2012-01-06 00:49:45 MV_APP_CURR_POLE_1_HR_Y_AVAIL  22.93
2012-01-06 00:50:08 MV_APP_CURR_STATE_POLE_Y_AVAIL 3.62
2012-01-06 00:50:12 MV_APP_CURRENT_STATE_POLE      125.59

From the above we can see MV_APP_CURRENT_STATE and MV_APP_CURRENT_STATE_POLE take a very long time.
When checking these MVs, I found their definitions are simple. I tried running the MVs' SELECT part; both return all records within 1 second.

But they take 100+ seconds when the MV is refreshed with the statement below:
dbms_mview.refresh('"GAPC"."MV_APP_CURRENT_STATE_POLE"', atomic_refresh => FALSE);

After taking a 10046 trace, I found that during the MV refresh, the internal recursive SQL below consumed most of the time:
********************************************************************************
SQL ID: 6v8ygt9zbxsnh Plan Hash: 2840274064
select count(*), max(scn)
from
( select d.scn scn from sys.sumdelta$ d, sys.sumdep$ o
where o.sumobj# = :1 and o.p_obj# = d.tableobj# and d.spare1 =
0 and d.scn > :2 union all select m.commit_scn scn
from sys.sumdelta$ d, sys.sumdep$ o, sys.snap_xcmt$ m where
o.sumobj# = :1 and o.p_obj# = d.tableobj# and d.spare1 = 1 and
d.xid = m.xid and m.commit_scn > :2 )
call    count  cpu      elapsed    disk       query      current    rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse   7      0.00     0.00       0          1          0          0
Execute 7      0.00     0.00       0          0          0          0
Fetch   7      31.25    367.18     99567      127778     0          7
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total   21     31.26    367.19     99567      127779     0          7


Notice the numbers: the internal recursive SQL above took 367 seconds, while the total MV refresh time was 370 seconds.

Checking the SQL above, its slowness is mainly caused by the table sys.sumdelta$, which has grown to 1.5 GB:
SQL> select TABLEOBJ#,count(*) from sys.sumdelta$ group by  TABLEOBJ#;
TABLEOBJ#    COUNT(*)
---------- ----------
  11784739    1536910
   2857695     449018
   7506186    8808703


This table contains over 10 million records, and most of them are from last year.

SQL> select owner,object_id,object_type,object_name,status from dba_objects where object_id in (11784739,2857695,7506186);
OWNER       OBJECT_ID OBJECT_TYPE         OBJECT_NAME                    STATUS
---------- ---------- ------------------- ------------------------------ -------
GAPC         11784739 TABLE               MV_EVENT_METER_ONE_DAY         VALID
GAPC          2857695 TABLE               MV_APP_CURRENT_ALERT_STATE     VALID


These records in sys.sumdelta$ are indeed related to our problem MVs.

Now the issue is clear. Due to some bug, expired data was not purged normally from sys.sumdelta$.
The table grew bigger and bigger, making the SQL against it rather slow.


After truncating sys.sumdelta$, the problem MVs finished in a few seconds.

April 6, 2012

buffer busy waits, resmgr:cpu quantum, and Suspending MMON action ..

While checking one DB, the second query below hung:
bdr1 >select s.sid,p.SPID,s.serial# from v$session s, v$process p where s.PADDR=p.ADDR and s.sid= (select sid from v$mystat where rownum<2);
SID SPID  SERIAL#
--- ----- ------
591 10422 25701

bdr1 >SELECT INST_ID,DECODE(request,0,'Holder: ','Waiter: ')||sid sess, id1, id2, lmode, request, type,ctime
FROM gV$LOCK WHERE (id1, id2, type) IN (SELECT id1, id2, type from gV$LOCK WHERE request>0)
ORDER BY id1, request;

---Hang there

After waiting 5 minutes with no luck, I decided to take a look.
First, check the wait event:
bdr1 >select event from v$session where sid=591;
EVENT
------------------------------------------
buffer busy waits

bdr1 >select P1,P2,P3 from v$session where sid=591;
P1   P2      P3
---- ------- ----------
81   5925    21


Wait for 10 seconds, then query again:
bdr1 >select P1,P2,P3 from v$session where sid=591;
P1   P2      P3
---- ------- ----------
81   5925    21


The output above means the session has been waiting on buffer busy waits for file 81, block 5925, for more than ten seconds.
That is weird; I had never encountered this before. As we all know, buffer busy waits occurs when another session is pinning a block during DML, and it is normally a very short wait. How can a session wait on one block for ten seconds?

Let's check what the block is:
bdr1 >alter system dump datafile 81 block 5925;
System altered.


The trace file shows this block is an undo segment header block:
index state cflags wrap# uel scn dba parent-xid nub stmt_num cmt
-------------------------------------------------------------

0x00 9 0x00 0x30998 0x0013 0x0b4f.57013a1b 0x198009af 0x0000.000.00000000 0x00000001 0x00000000 1332741051
0x01 9 0x00 0x30999 0x002f 0x0b4f.57013c9b 0x198009af 0x0000.000.00000000 0x00000001 0x00000000 1332741600
0x02 9 0x00 0x30999 0x001d 0x0b4f.5701397a 0x198009af 0x0000.000.00000000 0x00000001 0x00000000 1332740819
0x03 9 0x00 0x3099a 0x0021 0x0b4f.57013d32 0x198009b2 0x0000.000.00000000 0x00000001 0x00000000 1332741601
0x04 9 0x00 0x30999 0x0020 0x0b4f.57013d05 0x198009b2 0x0000.000.00000000 0x00000004 0x00000000 1332741600
0x05 10 0x00 0x3099a 0x001d 0x0b4f.57015ee9 0x198009ab 0x0000.000.00000000 0x00000001 0x00000000 0
......

bdr1 >SELECT SEGMENT_TYPE,OWNER||'.'||SEGMENT_NAME FROM DBA_EXTENTS where file_id= 81 AND block_id BETWEEN 5925 AND 5925+BLOCKS -1;
SEGMENT_TYPE       OWNER||'.'||SEGMENT_NAME
------------------ ---------------------------------------------
TYPE2 UNDO         SYS._SYSSMU3_424634979$
TYPE2 UNDO         SYS._SYSSMU9_2057384446$


I tried flushing the shared pool and the buffer cache, but it appeared useless:
bdr1 >alter system flush shared_pool;
System altered.
bdr1 >select P1,P2,P3 from v$session where sid=591;
P1   P2      P3
---- ------- ----------
81   5925    21


bdr1 >alter system flush buffer_cache;
System altered.
bdr1 >select P1,P2,P3 from v$session where sid=591;
P1   P2      P3
---- ------- ----------
81   5925    21


Let's check which session is holding that undo segment header block:
bdr1 >select BLOCKING_INSTANCE,BLOCKING_SESSION from v$session where sid=591;
BLOCKING_INSTANCE BLOCKING_SESSION
----------------- ----------------
1                 55

The block is being held by SID 55. Let's check session 55:
SID   SERIAL# LOGON_TIME      STATE    STATUS  EVENT               PROGRAM
----- ------- --------------- -------- ------- ------------------- ------------------------------------
55    18167   26-MAR-12 02:00 WAITING  ACTIVE  resmgr:cpu quantum  oracle@ora03.globaltax.ge.com (M000)

Pay attention to the wait event: resmgr:cpu quantum. It is related to Resource Manager.

This session is an MMON slave process.
I watched it for a while and confirmed the session is hung.

Let's check the alert.log and see whether we can find some clue:
Thu Mar 29 19:00:40 2012
Suspending MMON action 'AWR Auto CPU USAGE Task' for 82800 seconds
Thu Mar 29 19:10:42 2012
Suspending MMON slave action kewfmadsa_ for 82800 seconds
Thu Mar 29 23:05:58 2012
Suspending MMON action 'undo usage' for 82800 seconds

From the alert.log we can see that MMON was suspended, and unluckily, at that moment it was pinning the undo segment header block, therefore blocking other sessions.

Suspending MMON is a new feature in 11g. If the system is so overloaded that it takes over 15 minutes to gather statistics or run other MMON tasks, this message is expected. It is a functionality enhancement in 11g, as it prevents MMON from locking resources that other processes might be waiting for. In 10g, MMON slaves were allowed to run indefinitely.
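As a rough sketch, the policy described above behaves like this. This is hypothetical Python pseudocode, not Oracle's implementation: the 15-minute threshold comes from the description above, and the 82800-second suspension from the alert.log entries.

```python
# Hypothetical sketch of the 11g MMON self-throttling described above.
TIMEOUT_S = 15 * 60        # task considered stuck after 15 minutes
SUSPEND_S = 82800          # alert.log shows 82800 s = 23 hours

def mmon_decision(task_runtime_s):
    """What MMON does with an action under this policy."""
    if task_runtime_s > TIMEOUT_S:
        return ("suspend", SUSPEND_S)
    return ("run", 0)

print(mmon_decision(16 * 60))   # a 16-minute task gets suspended for 23 hours
```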


Let's check the OS load:
bdr1 >ho free -m
             total       used       free     shared    buffers     cached
Mem:        128874     128107        766          0        302     117527
-/+ buffers/cache:      10277     118596
Swap:        66463       4350      62113

Cpu(s): 12.8%us,  2.4%sy,  0.0%ni, 84.4%id,  0.3%wa,  0.0%hi,  0.1%si,  0.0%st



The load is quite low, so why did MMON suspend?
MMON is in charge of AWR, so let's check the AWR snapshots:
Enter value for num_days: 3
Listing the last 3 days of Completed Snapshots
                                                        Snap
Instance     DB Name        Snap Id    Snap Started    Level
------------ ------------ --------- ------------------ -----
bdr1         BDR1             31208 24 Mar 2012 00:00      1
................
................
                              31251 25 Mar 2012 19:00      1
                              31252 25 Mar 2012 20:00      1
                              31253 25 Mar 2012 21:00      1
                              31254 25 Mar 2012 22:00      1
                              31255 25 Mar 2012 23:00      1
                              31256 26 Mar 2012 00:00      1
                              31257 26 Mar 2012 01:00      1


Specify the Begin and End Snapshot Ids
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Enter value for begin_snap:
bdr1 >
bdr1 >ho date
Fri Mar 30 05:41:54 EDT 2012

We can see the current date is Mar 30th, but the latest AWR snapshot was generated on Mar 26th. MMON has been facing the issue since then.

Let's check the alert.log for what happened on Mar 26th:
Sun Mar 25 22:03:08 2012
DBMS_STATS: GATHER_STATS_JOB encountered errors.  Check the trace file.
ORA-20011: Approximate NDV failed: ORA-29913: error in executing ODCIEXTTABLEOPEN callout
ORA-29400: data cartridge error
KUP-04040: file outdata_to_show.txt in STADM_DIR not found
Mon Mar 26 02:00:00 2012
Closing scheduler window
Closing Resource Manager plan via scheduler window
Clearing Resource Manager plan via parameter
Setting Resource Manager plan SCHEDULER[0xD9D6EFAULT_MAINTENANCE_PLAN via scheduler window
Setting Resource Manager plan DEFAULT_MAINTENANCE_PLAN via parameter
Mon Mar 26 02:28:21 2012
Suspending MMON action 'undo usage' for 82800 seconds
Mon Mar 26 03:10:25 2012
Process 0x0xdcbda2a8 appears to be hung in Auto SQL Tuning task
Current time = 1332745824, process death time = 1332745800
Attempting to kill process 0x0xdcbda2a8 with OS pid = 11769
OSD kill succeeded for process 0xdcbda2a8


By now we can reasonably conclude this issue is related to Resource Manager.
After I raised an SR, Oracle Support confirmed this is a Resource Manager bug; the workaround is to disable Resource Manager:
1. alter system set resource_manager_plan='';
2. For each window_name:
execute dbms_scheduler.set_attribute('WEEKNIGHT_WINDOW','RESOURCE_PLAN',''); 


March 31, 2012

ORA-1628: max # extents 32765 reached for rollback segment _SYSSMU6_621766836$

One DB suddenly got many of the errors below within a short period of time:
Wed Mar 28 23:53:58 2012
ORA-1628: max # extents  32765 reached for rollback segment _SYSSMU6_621766836$
Wed Mar 28 23:53:58 2012
ORA-1628: max # extents  32765 reached for rollback segment _SYSSMU7_1070497209$


This error means an undo segment reached its max extents so that it couldn't extend any more:
SQL> select SEGMENT_NAME,MAX_EXTENTS from DBA_ROLLBACK_SEGS;
SEGMENT_NAME                   MAX_EXTENTS
------------------------------ -----------
_SYSSMU59_1810053014$                32765
_SYSSMU58_3572456133$                32765
_SYSSMU57_2783840325$                32765
_SYSSMU56_2339751936$                32765
_SYSSMU55_3550317550$                32765

..........


It has nothing to do with the size of the undo tablespace; in fact, there is plenty of free space:
Tablespace                    Free MB   Total MB        PCT
------------------------- ----------- ---------- ----------
UNDORBS                        21,483      24000    89.5125

The DB uses Automatic Undo Management (AUM):

SQL> show parameter undo
NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
undo_management                      string      AUTO
undo_retention                       integer     10800
undo_tablespace                      string      UNDORBS

Since it uses AUM, there is no supported way for us to tune the max_extents attribute:
SQL> select MAX_EXTENTS from DBA_ROLLBACK_SEGS where SEGMENT_NAME='_SYSSMU10$';
MAX_EXTENTS
-----------
32765

SQL> alter rollback segment "_SYSSMU10$" storage (MAXEXTENTs unlimited);
Rollback segment altered.

The command above executed successfully, but if we check again, we find the attribute still unchanged:
SQL> select MAX_EXTENTS from DBA_ROLLBACK_SEGS where SEGMENT_NAME='_SYSSMU10$';
MAX_EXTENTS
-----------
32765

Even setting "_smu_debug_mode" is useless:
SQL> alter system set "_smu_debug_mode" = 4;
System altered.

SQL> alter rollback segment "_SYSSMU10$" storage (MAXEXTENTs unlimited);
ERROR at line 1:
ORA-25150: ALTERING of extent parameters not permitted

Pay attention: when you hit ORA-1628 under AUM, in most cases the root cause is not simply that the max extents attribute is too small; it is usually related to some bug.

Let's check our case:
SQL> select to_char(begin_time,'MM/DD/YYYY HH24:MI') begin_time, UNXPSTEALCNT, EXPSTEALCNT , NOSPACEERRCNT, TUNED_UNDORETENTION from gv$undostat;
BEGIN_TIME       UNXPSTEALCNT EXPSTEALCNT NOSPACEERRCNT TUNED_UNDORETENTION
---------------- ------------ ----------- ------------- -------------------

03/30/2012 03:24            0           0             0            23941735
03/30/2012 03:14            0           0             0            23806293
03/30/2012 03:04            0           0             0            23670851
...........

...........
03/29/2012 00:14            0           0             0            42082827
03/29/2012 00:04            0           0             0            39077268
03/28/2012 23:54         1081        1081             5             9930112
03/28/2012 23:44            0           0             9          2367948800
03/28/2012 22:14            0           0             0                   0
03/28/2012 22:04            0           0             0          2346086400


From TUNED_UNDORETENTION we can clearly see the auto-tuned undo retention reached abnormal values.

Also note that the time points with those unusual retention values are exactly when the DB got the ORA-1628 errors.

It makes sense now. During that period, Oracle misjudged the auto-tuned undo retention and chose a huge value.
That kept many committed undo blocks unexpired, forced the rollback segments to grow bigger and bigger, and finally led to:
ORA-1628: max # extents 32765 reached for rollback segment _SYSSMU6_621766836$
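To see how far off those values are, convert them to days (simple arithmetic, 86400 seconds per day), using the TUNED_UNDORETENTION numbers from the gv$undostat output above:

```python
# Configured undo_retention versus two auto-tuned values from gv$undostat.
for label, seconds in [("undo_retention", 10800),
                       ("tuned (03:24)", 23941735),
                       ("tuned (23:44)", 2367948800)]:
    print(label, seconds, "s =", round(seconds / 86400, 1), "days")
```

10800 s is 3 hours; 23941735 s is about 277 days; 2367948800 s is roughly 27,400 days, i.e. some 75 years of undo retention, which no undo tablespace can keep up with.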

OK, we have found the root cause. The solution is simple: we can set the parameter below to cap the auto-tuned undo retention:
SQL> alter system set "_smu_debug_mode" = 33554432;
System altered.

Let's check the undo retention time again:
SQL> select to_char(begin_time,'MM/DD/YYYY HH24:MI') begin_time, UNXPSTEALCNT, EXPSTEALCNT , NOSPACEERRCNT, TUNED_UNDORETENTION from gv$undostat;
BEGIN_TIME       UNXPSTEALCNT EXPSTEALCNT NOSPACEERRCNT TUNED_UNDORETENTION
---------------- ------------ ----------- ------------- -------------------
03/30/2012 03:44            0           0             0               10800
03/30/2012 03:34            0           0             0               10800
03/30/2012 03:24            0           0             0            23941735
03/30/2012 03:14            0           0             0            23806293
03/30/2012 03:04            0           0             0            23670851
...........
...........


March 29, 2012

oracle and UTC

I was troubleshooting a repeatedly appearing ORA-1555 issue. The ORA-1555 was raised by a SQL with a small query duration, while the auto-tuned undo retention time was far larger than that.

By now we are already close to the root of that issue, but that's not the topic I want to cover here; maybe in a later post. Here I want to share a trick I found during the troubleshooting. It proves Oracle uses UTC as its time standard.

When checking the undo segment header dump, I found the following:
index state cflags wrap# uel  scn             dba      cmt
------------------------------------------------------------------------------
0x00 9 0x00 0x02de 0x0007 0x0b50.2b608af5 0x02401bb0 1333068666
0x01 9 0x00 0x02de 0x0002 0x0b50.2b66f219 0x02401bb3 1333070396
0x02 9 0x00 0x02de 0x0042 0x0b50.2b66f2a3 0x02401bb3 1333070396
0x03 9 0x00 0x02de 0x000d 0x0b50.2b5ccf81 0x02c05c70 1333067236
0x04 9 0x00 0x02dd 0x0037 0x0b50.2b55621f 0x02c05c4a 1333065389
0x05 9 0x00 0x02de 0x001b 0x0b50.2b5cf63e 0x02c05c71 1333067756
0x06 9 0x00 0x02de 0x0000 0x0b50.2b606d49 0x00000000 1333068599
0x07 9 0x00 0x02de 0x0017 0x0b50.2b61aa83 0x02401bb2 1333068893
0x08 9 0x00 0x02de 0x0027 0x0b50.2b5d14ba 0x02c05cbf 1333067788
0x09 9 0x03 0x02de 0x0015 0x0b50.2b6b3d38 0x02401c81 1333071273
0x0a 9 0x00 0x02de 0x0012 0x0b50.2b6b495d 0x02401c81 1333071316
0x0b 9 0x00 0x02de 0x0001 0x0b50.2b66ef7e 0x02401bb2 1333070395
0x0c 9 0x00 0x02dd 0x0025 0x0b50.2b5ce170 0x02c05c71 1333067657
........
........


The 9 in the state column indicates that none of the transactions here is active now, and cflags indicates all transactions were committed normally (no dead transactions).

All seemed good until I noticed the cmt column. cmt means commit time, and it is not recorded in the SCN format we usually see in Oracle; it is stored as a Unix timestamp.

Let's take 1333071316 as an example. I tried to convert it into an Oracle date:
SQL> select sysdate,TO_DATE('19700101000000','YYYYMMDDHH24MISS')+NUMTODSINTERVAL('1333071316', 'SECOND') cmt from dual;
SYSDATE CMT
------------------- -------------------
2012-03-29-22.23.32 2012-03-30-01.35.16



What? The transaction commit time was beyond the current time? Did I see the future?

It confused me for a while until I suddenly figured out that the commit time is recorded in UTC, while our local machine uses EDT.

Check below:
cihcispdb774.corporate.ge.com[oracle]_gldr12> date
Thu Mar 29 22:24:21 EDT 2012
cihcispdb774.corporate.ge.com[oracle]_gldr12> date -u
Fri Mar 30 02:24:27 UTC 2012



And:
cihcispdb774.corporate.ge.com[oracle]_gldr12> date +%s
1333074371


Now, compared against UTC, they all match.
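The same conversion can also be done without a database at hand; a small sketch in Python, using the cmt value from the dump above:

```python
from datetime import datetime, timezone

cmt = 1333071316  # cmt field from the undo segment header dump
utc = datetime.fromtimestamp(cmt, tz=timezone.utc)
print(utc.strftime("%Y-%m-%d %H:%M:%S"))  # 2012-03-30 01:35:16 -- matches the SQL result
```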


What is UTC?
According to the US Naval Observatory, UTC is "Coordinated Universal Time," a coordinated civil time scale "kept by time laboratories around the world ... determined using highly precise atomic clocks." This is approximately the same as Greenwich Mean Time (GMT), but more precise. A number of nations, such as Morocco, Iceland, and Mali, use UTC directly as their civil time. In the winter, UTC is the civil time for the UK. US civil time is determined by a fixed offset in hours from UTC. For instance, Arizona (which doesn't keep Daylight Saving Time) is always UTC-7. Beijing, China's timezone is UTC+8.


March 18, 2012

Understand the DBV output

Below is a DBV output of a totally corrupted block:
Page 104503 is marked corrupt
Corrupt block relative dba: 0x04019837 (file 16, block 104503)
Bad header found during dbv:
Data in bad block:
type: 201 format: 0 rdba: 0x6bf21791
last change scn: 0x2cd7.f5a66aae seq: 0x7 flg: 0x0e
spare1: 0x63 spare2: 0x81 spare3: 0x8c0e
consistency value in tail: 0x910ff227
check value in block header: 0x4fb6
computed block checksum: 0xe10f



1. type: 201
There is no such block type. Mostly we see 6 here, which means: table/cluster/index segment data block.


2. format: 0

There is no such block format. Below are some possible formats:
8i~9i  :  all : 0x02

10g    :   2k : 0x62
           4k : 0x82
           8k : 0xa2
          16k : 0xc2
logfile:  all : 0x22
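Incidentally, the 10g format bytes listed above follow a regular pattern: the value steps by 0x20 for each doubling of the block size. A small check using only the values from the list (the formula is my own observation, not documented behavior):

```python
import math

# 10g block format byte per block size, taken from the list above
formats = {2048: 0x62, 4096: 0x82, 8192: 0xA2, 16384: 0xC2}

for bs, fmt in formats.items():
    # observed pattern: 0x62 plus 0x20 per doubling over 2k
    assert fmt == 0x62 + 0x20 * int(math.log2(bs // 2048))
print("pattern holds")
```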

3. rdba: 0x6bf21791
The rdba (relative file# + block#) info in the block header is wrong; it should be the same as the part below:

relative dba: 0x04019837 (file 16, block 104503)
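A relative DBA packs the relative file number into the top 10 bits and the block number into the low 22 bits, which is how DBV derives "(file 16, block 104503)" from 0x04019837. A quick sketch:

```python
def decode_rdba(rdba):
    """Split a 32-bit relative DBA into (relative file#, block#)."""
    return rdba >> 22, rdba & 0x3FFFFF  # top 10 bits / low 22 bits

print(decode_rdba(0x04019837))  # (16, 104503) -- matches the DBV output
print(decode_rdba(0x6bf21791))  # the bogus rdba found in the bad header
```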

4. last change scn: 0x2cd7.f5a66aae seq: 0x7 flg: 0x0e
consistency value in tail: 0x910ff227

The tail check value is wrong; based on the SCN, flg, and seq values in the header it should be 0x6aae0e07.

5. flg: 0x0e

Wrong. Below are some possible block flags:
#define KCBHFNEW 0x01 /* new block - zeroed data area */
#define KCBHFDLC 0x02 /* Delayed Logging Change advance SCN/seq */
#define KCBHFCKV 0x04 /* ChecK Value saved-block xor's to zero */
#define KCBHFTMP 0x08 /* Temporary block */

We often see 0x06 in a formatted block, which means 0x02 + 0x04.
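The flag byte is just a bitmask of those #defines, so 0x06 decomposes exactly as stated; a one-liner to confirm:

```python
# kcbh flag bits, named after the #defines above
KCBHFNEW, KCBHFDLC, KCBHFCKV, KCBHFTMP = 0x01, 0x02, 0x04, 0x08

# the usual 0x06 is "delayed logging change" plus "check value present"
print(hex(KCBHFDLC | KCBHFCKV))  # 0x6
```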

6. spare1: 0x63 spare2: 0x81 spare3: 0x8c0e

Wrong, wrong, wrong: all spare fields should be 0x0 in any formatted block.

7. check value in block header: 0x4fb6
The checksum value recorded in the block header is wrong; it should be the same as the part below:
computed block checksum: 0xe10f

Generally speaking, this data block is totally corrupt. It looks like an unformatted block.
Even if there was once data in it, it is already gone.


March 14, 2012

a weird connection issue: reachable as root but not as oracle

A server's NetBackup job failed with the error below:
RMAN-03009: failure of backup command on CH1 channel at 03/12/2012 23:16:59
ORA-19506: failed to create sequential file, name="VIGP_TAPECOPY_03122012_1_11668", parms=""
ORA-27028: skgfqcre: sbtbackup returned error
ORA-19511: Error received from media manager layer, error text:
VxBSAValidateFeatureId: Failed with error:
Server Status: cannot connect on socket
ORA-19600: input file is backup piece (/prod/backup/oracle/vigp/oracle/vigp1/arch/VIGP_ARCH_03022012_1_11668)


After checking, the reason is that the NetBackup server is not resolvable:
[oracle@node1 ~]$ ping netback_b
ping: unknown host netback_b


The setting in /etc/hosts is correct:
[oracle@node1 ~]$ cat /etc/hosts|grep netback_b
2.1.0.10 netback_b netback_b.corporate.ge.com


And here comes the weird part:
[oracle@node1 ~]$ su - root
[root@node1 ~]# ping netback_b
PING netback_b (2.1.0.10) 56(84) bytes of data.
64 bytes from netback_b (2.1.0.10): icmp_seq=1 ttl=252 time=1.11 ms
64 bytes from netback_b (2.1.0.10): icmp_seq=2 ttl=252 time=0.401 ms


As oracle the host is not ping-able, but as root it is.
Why?

Let's trace the session:
[oracle@node1 ~]$ strace -f -o /tmp/strace.log ping netback_b

Below is from the trace file:
..........
4347 open("/etc/nsswitch.conf", O_RDONLY) = -1 EACCES (Permission denied)

4347 open("/opt/CA/SharedComponents/lib/libnss_dns.so.2", O_RDONLY) = -1 ENOENT (No such file or directory)
4347 open("/opt/CA/CAlib/libnss_dns.so.2", O_RDONLY) = -1 ENOENT (No such file or directory)



The EACCES line above shows why:
[oracle@node1 ~]$ ls -ltr /etc/nsswitch.conf
-rw------- 1 root root 1730 Feb 23 09:07 /etc/nsswitch.conf


The oracle user has no read access to this file.
After granting read permission, ping succeeded for oracle, and so did the NetBackup job.
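The mode -rw------- (0600) is exactly why the open() in the strace returned EACCES for a non-root user. A small illustration with Python's stat module (0644 is the assumed post-fix mode, making the file world-readable as name resolution requires):

```python
import stat

before = 0o600  # -rw-------: only root can read /etc/nsswitch.conf
after = 0o644   # -rw-r--r--: assumed mode after the fix

# S_IROTH is the "others may read" bit, which the oracle user needs
print(bool(before & stat.S_IROTH), bool(after & stat.S_IROTH))  # False True
```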

March 12, 2012

DB got an outage caused by inode exhaustion

On a three-node RAC, users connecting to the third node got the error below:
ORA-02002: error while writing to audit trail
ORA-09925: Unable to create audit trail file
Linux-x86_64 Error: 28: No space left on device

The error suggests the audit mount point is full.

Logging on to the server to check, the audit directory is:
/prod/product/corp_bac1/audit/trail

Let's check the audit directory's mount point:
node3[oracle]_corp_bac3> df -h|grep /prod/product
                       34G   27G  5.1G  84% /prod/product

Weird, the mount point is not full.

Then why did we get the error "Linux-x86_64 Error: 28: No space left on device"?

Switching to the audit directory and running ls -ltr, the screen flashed with:
06:54 corp_bac3_ora_20624_2e.aud
06:54 corp_bac3_ora_20621_38.aud
06:54 corp_bac3_ora_20618_32.aud
06:54 corp_bac3_ora_20615_38.aud
06:54 corp_bac3_ora_20612_33.aud
06:54 corp_bac3_ora_20688_3c.aud
06:54 corp_bac3_ora_20685_34.aud
06:54 corp_bac3_ora_20682_3e.aud
06:54 corp_bac3_ora_20679_3e.aud
06:54 corp_bac3_ora_20676_3e.aud
06:54 corp_bac3_ora_20673_34.aud
06:54 corp_bac3_ora_20669_37.aud
06:54 corp_bac3_ora_20666_33.aud
06:54 corp_bac3_ora_20663_3c.aud
06:54 corp_bac3_ora_20660_32.aud
.........
.........
It showed hundreds of audit files generated in one minute.

I canceled the ls -ltr and tried to count the audit files under the directory:
[oracle@node3 trail]$ ls | wc -l

The command was still running even after ten minutes, so I had to cancel it.

By now it was already clear to me that the issue was most likely caused by inode exhaustion:
$ df -h -i
Filesystem            Inodes   IUsed   IFree IUse% Mounted on
/dev/mapper/rootvg-rootvol
                         15M    134K     15M    1% /
/dev/cciss/c0d0p1        65K      46     65K    1% /boot
tmpfs                    16M      74     16M    1% /dev/shm
/dev/mapper/rootvg-lv_data
                        5.6M    4.4K    5.6M    1% /data
/dev/mapper/cihcispdb716_grid_vg-gridvol
                        2.2M    473K    1.7M   22% /prod/grid
/dev/mapper/cihcispdb716_product_vg-productvol
                        4.3M    4.3M      0K  100% /prod/product
3.24.148.15:/vol/database_backup
                         31M     37K     31M    1% /prod/backup
3.24.148.13:/vol/cis_orasoft
                         31M    5.5M     25M   19% /cis_orasoft

The 100% IUse% on /prod/product above shows the inodes are used up. That's why we can't create any file even though there is enough free block space left:
node3[oracle]_corp_bac3> df -h|grep /prod/product
                       34G   27G  5.1G  84% /prod/product
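The same inode counters that df -i prints can be read programmatically via statvfs; a minimal sketch (the "/" path is just an example, not the mount point from this incident):

```python
import os

# f_files = total inodes, f_ffree = free inodes on the filesystem
st = os.statvfs("/")
used_pct = 100 * (st.f_files - st.f_ffree) / st.f_files
print(f"inode usage: {used_pct:.0f}%")
```

When f_ffree hits 0, every file creation fails with ENOSPC even though df -h still shows free blocks.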

After cleaning some of the audit files, check again:
[oracle@node3 11.2.0]$ df -h -i /prod/product
Filesystem            Inodes   IUsed   IFree IUse% Mounted on
/dev/mapper/cihcispdb716_product_vg-productvol
                        4.3M    3.8M    444K   90% /prod/product

Connecting to the third node again succeeded.

