A database suddenly hung during a load, and the users came to me.
When I logged in to the server, I could see the problem database consuming 100% CPU.
I tried "sqlplus / as sysdba", but that hung as well.
I generated a systemstate dump and noticed that PMON was blocked:
PROCESS 2: PMON
O/S info: user: oracle, term: UNKNOWN, ospid: 1193
OSD pid info: Unix process pid: 1193, image: oracle@alpcisddb484.corporate.ge.com (PMON)
waiting for 60108b48 Child shared pool level=7 child#=3
Location from where latch is held: kgh.h LINE:6387 ID:kghalo:
Context saved from call: 0
state=busy [holder orapid=89] wlstate=free [value=0]
waiters [orapid (seconds since: put on list, posted, alive check)]:
possible holder pid = 89 ospid=10478
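The two fields that matter in this process state are the latch address PMON is waiting for and the "possible holder" line. As a rough sketch (the function name latch_wait and the regular expressions are my own and only cover the line formats shown in this dump), they can be pulled out like this:

```python
import re

# Excerpt of the PMON process state from the systemstate dump above.
PMON_STATE = """\
PROCESS 2: PMON
waiting for 60108b48 Child shared pool level=7 child#=3
state=busy [holder orapid=89] wlstate=free [value=0]
possible holder pid = 89 ospid=10478
"""

def latch_wait(process_state):
    """Return (latch_address, holder_orapid, holder_ospid), or None if no latch wait is found.

    Hypothetical helper: it assumes the 'waiting for <addr> ...' and
    'possible holder pid = N ospid=M' formats seen in this particular dump.
    """
    waiting = re.search(r"waiting for ([0-9a-f]+) ", process_state)
    holder = re.search(r"possible holder pid = (\d+) ospid=(\d+)", process_state)
    if not (waiting and holder):
        return None
    return waiting.group(1), int(holder.group(1)), int(holder.group(2))

print(latch_wait(PMON_STATE))  # ('60108b48', 89, 10478)
```

The result points at Oracle pid 89, so that is the next process state to read in the dump.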
Let's check which session is holding latch 60108b48, and what that session is waiting for:
PROCESS 89:
O/S info: user: oracle, term: UNKNOWN, ospid: 10478
OSD pid info: Unix process pid: 10478, image: oracle@alpcisddb484.corporate.ge.com
holding (efd=8) 60108b48 Child shared pool level=7 child#=3
Current Wait Stack:
0: waiting for 'library cache: mutex X'
idn=0xfc57b2b7, value=0xd100000000, where=0x4f
wait_id=57 seq_num=60 snap_id=1
wait times: snap=45 min 11 sec, exc=45 min 11 sec, total=45 min 11 sec
wait times: max=infinite, heur=45 min 11 sec
wait counts: calls=0 os=0
in_wait=1 iflags=0x5a2
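The value field of this wait already hints at the holder. For 'library cache: mutex X', the high-order half of the mutex value carries the SID of the holding session and the low-order half a reference count. A minimal sketch, assuming a 64-bit platform with a 32/32-bit split (decode_mutex_value is a name I made up for illustration):

```python
def decode_mutex_value(value):
    """Split a 64-bit mutex value into (holder_sid, ref_count).

    Assumption: the top 32 bits hold the holder SID and the low 32 bits
    hold the reference count, as on 64-bit platforms.
    """
    holder_sid = value >> 32
    ref_count = value & 0xFFFFFFFF
    return holder_sid, ref_count

# value=0xd100000000 is taken from the wait stack of process 89 above.
print(decode_mutex_value(0xd100000000))  # (209, 0)
```

So the holder should be the session with sid 209, which is the one to look for in the dump.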
Let’s see which session is holding mutex 0xfc57b2b7 and what that session is waiting for:
PROCESS 42: M000
SO: 0x19f40dda8, type: 4, owner: 0x19f2de3f8, flag: INIT/-/-/0x00 if: 0x3 c: 0x3
proc=0x19f2de3f8, name=session, file=ksu.h LINE:12459, pg=0
(session) sid: 209 ser: 521 trans: (nil), creator: 0x19f2de3f8
Current Wait Stack:
Not in wait; last wait ended 45 min 10 sec ago
OK, so far we have found that the root blocker is M000.
It is holding the mutex:
PROCESS 42: M000
SO: 0x19f40dda8, type: 4, owner: 0x19f2de3f8, flag: INIT/-/-/0x00 if: 0x3 c: 0x3
proc=0x19f2de3f8, name=session, file=ksu.h LINE:12459, pg=0
(session) sid: 209 ser: 521 trans: (nil), creator: 0x19f2de3f8
Current Wait Stack:
Not in wait; last wait ended 45 min 10 sec ago
KGL-UOL SO Cache(total=240, free=0)
KGX Atomic Operation Log 0x17d569fc0
Mutex 0x19eb2b4c8(209, 0) idn 1fc57b2b7 oper EXCL
Library Cache uid 209 efd 7 whr 49 slp 0
oper=0 pt1=(nil) pt2=(nil) pt3=(nil)
But the above also shows that M000 itself has not been in a wait for the last 45 minutes. What is M000 doing, and why hasn't it released that mutex in all this time?
That makes no sense.
After reviewing the process state of M000, I noticed that the process is already dead:
SO: 0x19f2de3f8, type: 2, owner: (nil), flag: INIT/-/-/0x00 if: 0x3 c: 0x3
proc=0x19f2de3f8, name=process, file=ksu.h LINE:12451, pg=0
(process) Oracle pid:42, ser:16, calls cur/top: 0x1908c1eb8/0x1906a28f8
flags : (0x3) DEAD
flags2: (0x8030), flags3: (0x0)
intr error: 0, call error: 0, sess error: 0, txn error 0
intr queue: empty
ksudlp FALSE at location: 0
Cleanup details:
Found dead = 44 min 27 sec ago
Total Cleanup attempts = 10, Cleanup time = 15 min 49 sec, Cleanup timer = 15 min 45 sec
Last attempt (full) ended 54 sec ago, Length = 2 min 30 sec, Cleanup timer = 2 min 30 sec
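The relative times in the dump line up into a short timeline. The sketch below is only illustrative arithmetic on the figures quoted above; it assumes all three durations are measured back from the same moment (the time the dump was taken), and to_seconds is a hypothetical helper:

```python
import re

def to_seconds(text):
    """Convert a duration such as '45 min 11 sec' or '54 sec' to seconds."""
    match = re.fullmatch(r"(?:(\d+) min )?(\d+) sec", text.strip())
    if match is None:
        raise ValueError(f"unrecognised duration: {text!r}")
    minutes = int(match.group(1) or 0)
    return minutes * 60 + int(match.group(2))

blocked_for = to_seconds("45 min 11 sec")      # process 89 waiting on the mutex
last_wait_ended = to_seconds("45 min 10 sec")  # M000's last wait ended
found_dead = to_seconds("44 min 27 sec")       # M000 was found dead

# Seconds between M000's last wait ending and M000 being found dead.
print(last_wait_ended - found_dead)   # 43
# Seconds between process 89 starting to wait and M000's last wait ending.
print(blocked_for - last_wait_ended)  # 1
```

In other words, process 89 started waiting on the mutex at almost the same moment M000 stopped doing anything, and M000 was found dead less than a minute later.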
OK. Now it is clear that M000's death led to the DB hang.
But why did M000 suddenly die?
After checking alert.log I found the following:
Thu Dec 06 22:00:48 2012
Exception [type: SIGSEGV, Address not mapped to object] [ADDR:0x1060] [PC:0x90ED147, kglic0()+1075] [flags: 0x0, count: 1]
Errors in file /x484/dump01/oracle/hrxt/diag/rdbms/hrxdt/hrxdt_m000_10618.trc (incident=6337):
ORA-07445: exception encountered: core dump [kglic0()+1075] [SIGSEGV] [ADDR:0x1060] [PC:0x90ED147] [Address not mapped to object] []
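The most useful search key in an error like this is the function name in the first bracketed argument of the ORA-07445 line. A small sketch of extracting it (failing_function and its regular expression are my own, written only against the line format shown here):

```python
import re

# The ORA-07445 line from alert.log above, split only for readability.
ALERT_LINE = (
    "ORA-07445: exception encountered: core dump [kglic0()+1075] [SIGSEGV] "
    "[ADDR:0x1060] [PC:0x90ED147] [Address not mapped to object] []"
)

def failing_function(line):
    """Return (function, offset) from the first argument of an ORA-07445 line, or None."""
    match = re.search(r"ORA-0?7445:.*?core dump \[(\w+)\(\)\+(\d+)\]", line)
    if match is None:
        return None
    return match.group(1), int(match.group(2))

print(failing_function(ALERT_LINE))  # ('kglic0', 1075)
```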
Now everything is clear.
After searching Metalink for the symbol above, I found that our scenario exactly matches this bug:
MMON Slave Process Reports ORA-7445 [kglic0] Error, Plus Database Hangs [ID 1487720.1]
GE, AGFA. Experienced in 9i/10g/11g/12c, Advanced Replication, Streams, GoldenGate, RAC, Data Guard, ODA, Exadata, Grid Control. In my career I have always been assigned to handle the most complicated and critical Oracle issues that have no solution online, and I hope to record some of them via this blog.
December 7, 2012
MMON Slave Process Reports ORA-7445 [kglic0] Error, Plus Database Hangs
Tags: 7445, hang, kglic0, kksIterCursorStat, M000, MMON, ORA-7445, PMON failed to acquire latch. cpu, prelim, slave