December 7, 2012

MMON Slave Process Reports ORA-7445 [kglic0] Error, Plus Database Hangs

A database suddenly hung during a load, and the users came to me.
When I logged in to the server, I could see the affected database consuming 100% of the CPU.

I tried "sqlplus / as sysdba", but it also hung.
I generated a systemstate dump and noticed that PMON was blocked:

PROCESS 2: PMON
    O/S info: user: oracle, term: UNKNOWN, ospid: 1193 
    OSD pid info: Unix process pid: 1193, image: oracle@alpcisddb484.corporate.ge.com (PMON)
      waiting for 60108b48 Child shared pool level=7 child#=3 
        Location from where latch is held: kgh.h LINE:6387 ID:kghalo: 
        Context saved from call: 0
        state=busy [holder orapid=89] wlstate=free [value=0]
          waiters [orapid (seconds since: put on list, posted, alive check)]:
possible holder pid = 89 ospid=10478
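
Finding this waiter/holder pair by eye is tedious in a large dump, so here is a minimal Python sketch that pulls it out of the text. The function name find_latch_waiters and the regular expressions are my own illustration, written only against the line shapes shown in the excerpt above; other Oracle versions may format these lines differently.

```python
import re

# Excerpt copied from the PMON section of the dump above.
DUMP = """\
PROCESS 2: PMON
      waiting for 60108b48 Child shared pool level=7 child#=3 
        state=busy [holder orapid=89] wlstate=free [value=0]
possible holder pid = 89 ospid=10478
"""

def find_latch_waiters(text):
    """Return a list of (process_no, latch_addr, holder_orapid) tuples."""
    results = []
    current = None  # PROCESS number of the section being read
    latch = None    # latch address from the latest "waiting for" line
    for line in text.splitlines():
        m = re.match(r"\s*PROCESS (\d+):", line)
        if m:
            current, latch = int(m.group(1)), None
            continue
        # Latch waits show a hex address; event waits show a quoted name
        # and are deliberately not matched here.
        m = re.match(r"\s*waiting for ([0-9a-fA-F]+) ", line)
        if m:
            latch = m.group(1)
            continue
        m = re.search(r"\[holder orapid=(\d+)\]", line)
        if m and current is not None and latch is not None:
            results.append((current, latch, int(m.group(1))))
            latch = None
    return results

print(find_latch_waiters(DUMP))
```

For the excerpt above this reports that process 2 waits on latch 60108b48, which is held by Oracle pid 89.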

Let's check which session is holding latch 60108b48, and what that session is waiting for:
PROCESS 89: 
    O/S info: user: oracle, term: UNKNOWN, ospid: 10478 
    OSD pid info: Unix process pid: 10478, image: oracle@alpcisddb484.corporate.ge.com
      holding    (efd=8) 60108b48 Child shared pool level=7 child#=3 
    Current Wait Stack:
     0: waiting for 'library cache: mutex X'
        idn=0xfc57b2b7, value=0xd100000000, where=0x4f
        wait_id=57 seq_num=60 snap_id=1
        wait times: snap=45 min 11 sec, exc=45 min 11 sec, total=45 min 11 sec
        wait times: max=infinite, heur=45 min 11 sec
        wait counts: calls=0 os=0
        in_wait=1 iflags=0x5a2

Let’s see which session is holding mutex 0xfc57b2b7 and what that session is waiting for:
PROCESS 42: M000
    SO: 0x19f40dda8, type: 4, owner: 0x19f2de3f8, flag: INIT/-/-/0x00 if: 0x3 c: 0x3
     proc=0x19f2de3f8, name=session, file=ksu.h LINE:12459, pg=0
    (session) sid: 209 ser: 521 trans: (nil), creator: 0x19f2de3f8
    Current Wait Stack:
      Not in wait; last wait ended 45 min 10 sec ago 
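
How do we get from the mutex wait to sid 209? On 64-bit platforms the "value" field of a 'library cache: mutex X' wait is commonly described as carrying the holder's session id in the upper 4 bytes and a reference count in the lower 4 bytes. Assuming that layout holds here, a small Python sketch (decode_mutex_value is a hypothetical helper name) decodes value=0xd100000000 from process 89's wait stack:

```python
def decode_mutex_value(value):
    """Split a 64-bit mutex value into (holder_sid, ref_count).

    Assumes the layout described above: upper 32 bits are the holding
    session id, lower 32 bits are the reference count.
    """
    holder_sid = value >> 32
    ref_count = value & 0xFFFFFFFF
    return holder_sid, ref_count

# value=0xd100000000 comes from the wait stack of process 89.
sid, refs = decode_mutex_value(0xd100000000)
print(sid, refs)
```

Under that assumption the holder is sid 209, which matches the "(session) sid: 209" line of process 42 (M000).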

OK, so far we have found that the root blocker is M000.
It is holding the mutex:
PROCESS 42: M000
    SO: 0x19f40dda8, type: 4, owner: 0x19f2de3f8, flag: INIT/-/-/0x00 if: 0x3 c: 0x3
     proc=0x19f2de3f8, name=session, file=ksu.h LINE:12459, pg=0
    (session) sid: 209 ser: 521 trans: (nil), creator: 0x19f2de3f8
    Current Wait Stack:
      Not in wait; last wait ended 45 min 10 sec ago 
      KGL-UOL SO Cache(total=240, free=0)
      KGX Atomic Operation Log 0x17d569fc0
       Mutex 0x19eb2b4c8(209, 0) idn 1fc57b2b7 oper EXCL
       Library Cache uid 209 efd 7 whr 49 slp 0
       oper=0 pt1=(nil) pt2=(nil) pt3=(nil)

But the above also shows that M000 itself has not been in a wait for the last 45 minutes. What is M000 doing, and why has it not released the mutex in all that time?
That makes no sense.

After reviewing the process state of M000, I noticed that the process was already dead:
SO: 0x19f2de3f8, type: 2, owner: (nil), flag: INIT/-/-/0x00 if: 0x3 c: 0x3
   proc=0x19f2de3f8, name=process, file=ksu.h LINE:12451, pg=0
  (process) Oracle pid:42, ser:16, calls cur/top: 0x1908c1eb8/0x1906a28f8
            flags : (0x3) DEAD
            flags2: (0x8030),  flags3: (0x0) 
            intr error: 0, call error: 0, sess error: 0, txn error 0
            intr queue: empty
    ksudlp FALSE at location: 0
    Cleanup details:
      Found dead = 44 min 27 sec ago
      Total Cleanup attempts = 10, Cleanup time = 15 min 49 sec, Cleanup timer = 15 min 45 sec
      Last attempt (full) ended 54 sec ago, Length = 2 min 30 sec, Cleanup timer = 2 min 30 sec
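
A dead process that still owns resources is easy to miss, so it can help to scan the whole dump for the DEAD flag. The following is only a rough Python sketch: dead_oracle_pids is a hypothetical helper, and the patterns assume the "(process) Oracle pid:" and "flags :" line shapes shown in the excerpt above.

```python
import re

# Excerpt copied from the process state object of M000 above.
STATE = """\
  (process) Oracle pid:42, ser:16, calls cur/top: 0x1908c1eb8/0x1906a28f8
            flags : (0x3) DEAD
            flags2: (0x8030),  flags3: (0x0) 
"""

def dead_oracle_pids(text):
    """Return the Oracle pids whose process 'flags' line contains DEAD."""
    dead = []
    pid = None  # Oracle pid of the process state object being read
    for line in text.splitlines():
        m = re.search(r"\(process\) Oracle pid:(\d+)", line)
        if m:
            pid = int(m.group(1))
            continue
        # Match only the "flags :" line, not "flags2:" or "flags3:".
        if pid is not None and re.match(
            r"\s*flags\s*:\s*\(0x[0-9a-fA-F]+\).*\bDEAD\b", line
        ):
            dead.append(pid)
            pid = None
    return dead

print(dead_oracle_pids(STATE))
```

For the excerpt above this returns Oracle pid 42, the M000 process.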

OK. Now it is clear that M000's death led to the database hang.
But why did M000 suddenly die?
After checking alert.log, I found the following:
Thu Dec 06 22:00:48 2012
Exception [type: SIGSEGV, Address not mapped to object] [ADDR:0x1060] [PC:0x90ED147, kglic0()+1075] [flags: 0x0, count: 1]
Errors in file /x484/dump01/oracle/hrxt/diag/rdbms/hrxdt/hrxdt_m000_10618.trc  (incident=6337):
ORA-07445: exception encountered: core dump [kglic0()+1075] [SIGSEGV] [ADDR:0x1060] [PC:0x90ED147] [Address not mapped to object] []
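
The failing function name in the first bracket is the useful search term. As a small illustration, this Python sketch extracts it from the ORA-07445 line; parse_ora7445 is a hypothetical helper, and it assumes the bracketed argument layout seen in the alert.log line above.

```python
import re

# ORA-07445 line copied from the alert.log excerpt above.
ALERT_LINE = (
    "ORA-07445: exception encountered: core dump [kglic0()+1075] [SIGSEGV] "
    "[ADDR:0x1060] [PC:0x90ED147] [Address not mapped to object] []"
)

def parse_ora7445(line):
    """Return the failing function, offset and signal, or None if no match."""
    if "ORA-07445" not in line:
        return None
    args = re.findall(r"\[([^\]]*)\]", line)  # contents of each [...] group
    if len(args) < 2:
        return None
    m = re.match(r"(\w+)\(\)\+(\d+)$", args[0])
    if not m:
        return None
    return {"function": m.group(1), "offset": int(m.group(2)), "signal": args[1]}

print(parse_ora7445(ALERT_LINE))
```

Here the function is kglic0, which is the symbol to search for.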

Now everything is clear.
After searching Metalink for the symbol above, I found that our scenario exactly matches this bug:
MMON Slave Process Reports ORA-7445 [kglic0] Error, Plus Database Hangs [ID 1487720.1]
