June 8, 2012

identify root session among hundreds of locks using hanganalyze

Today, one of our most critical production DB reports lock issue.
When i login into the DB there are already more then 200+ locks, and most of them are TM lock.

Below is the list:
SESS                   ID1        ID2      LMODE    REQUEST TY      CTIME
--------------- ---------- ---------- ---------- ---------- -- ----------
Holder: 4738         16737          0          2          0 TM       4021
Holder: 2080         16737          0          2          0 TM       4021
............
............

Holder: 3480         17347          0          3          0 TM       5958
Waiter: 4975         17347          0          0          2 TM       5792
Waiter: 1015         17347          0          0          2 TM       4090
Waiter: 1058         17347          0          0          2 TM       5832

............
Holder: 4729        983064    3155735          6          0 TX       3229
Waiter: 2301        983064    3155735          0          6 TX       2327
Waiter: 1194       2752522    2373725          0          6 TX       2931
Waiter: 3359       2752522    2373725          0          6 TX       2397


286 rows selected.

There are see 286 locking sessions as well as so dozens blockers which of couse we can't kill them all.
Then here is the problem: how to identify the root session?

In 11g there is a powerful View "V$WAIT_CHAINS". But our DB is 10g.
BLOCKING_SESSION column in v$session is useful but not for so many locks and so many blockers.

systemstate dump is also not a choice since i lost most of my scripts including ass109.awk after computer re-installed.

Then the left choice is hanganalyze, let's do it:

==============
HANG ANALYSIS:
==============
Found 106 objects waiting for
<0/4634/18424/0x600086e0/28223/enq: TM - contention>
Found 105 objects waiting for
<0/3317/59720/0x20005f58/27464/enq: TM - contention>
...............
...............
Found 20 objects waiting for
<0/4974/30755/0x60041418/29550/enq: TM - contention>
Open chains found:

Chain 1 : :
<0/3480/26248/0x200229e8/190/db file sequential read>
-- <0/4634/18424/0x600086e0/28223/enq: TM - contention>
-- <0/679/55269/0xa004e990/13104/enq: TM - contention>
-- <0/793/864/0xa0054088/25644/enq: TM - contention>
Chain 2 : :
<0/3797/1/0x20001048/24805/log file parallel write>
-- <0/707/60023/0xa001bb38/27013/log file sync>
Other chains found:
Chain 3 : :
<0/689/49815/0xa0016c28/15330/enq: TM - contention>
-- <0/793/864/0xa0054088/25644/enq: TM - contention>
...............
...............
Chain 104 : :       ---------(oops.  104 chains!!)
<0/5041/14667/0x6000b650/6616/PX Deq: Table Q Normal>
State of nodes
([nodenum]/cnode/sid/sess_srno/session/ospid/state/start/finish/[adjlist]/predecessor):
[678]/0/679/55269/0xa05c3618/13104/NLEAF/1/8/[4633][3316]/792
[681]/0/682/5041/0xa05c7650/22742/IGN/9/10//none
[682]/0/683/33667/0xa05c8bb8/27585/IGN/11/12//none
[687]/0/688/35783/0xa05cf6c0/13708/IGN/13/14//none
...............
...............
[5063]/0/5064/1/0x608d2e40/24807/IGN/1657/1658//none
[5064]/0/5065/1/0x608d43a8/24799/IGN/1659/1660//none
====================
END OF HANG ANALYSIS
====================


From above hanganylyze session it is clear that root chain (Open chains part) is:
Open chains found:
Chain 1 : :
<0/3480/26248/0x200229e8/190/db file sequential read>
-- <0/4634/18424/0x600086e0/28223/enq: TM - contention>
-- <0/679/55269/0xa004e990/13104/enq: TM - contention>
-- <0/793/864/0xa0054088/25644/enq: TM - contention>
Chain 2 : :
<0/3797/1/0x20001048/24805/log file parallel write>
-- <0/707/60023/0xa001bb38/27013/log file sync>


And the LEAF is:
server444[oracle]_hrpp1> cat supp_ora_17698.trc|grep -i single|grep -v none
[level 5] : 83 node dumps -- [SINGLE_NODE] [SINGLE_NODE_NW] [IGN_DMP]
server444[oracle]_hrpp1> cat supp_ora_17698.trc|grep -i leaf|grep -v NLEAF
[level 4] : 2 node dumps -- [REMOTE_WT] [LEAF] [LEAF_NW]
[3479]/0/3480/26248/0x20729710/190/LEAF/3/4//4633
[3796]/0/3797/1/0x208d18d8/24805/LEAF/28/29//706



Now it is clear to us Session 3480 and 3797 have the most suspect.
After check these two sessions in DB, we identify the 3480 session as the root session since it is also an pldevelop session:

SSID SERIAL# SPID OSUSER USERNAME MACHINE LOGON_TIME STATE STATUS SQL_ID PREV_SQL_ID WAIT_TIME SECONDS_IN_WAIT EVENT BLOCKING_INSTANCE BLOCKING_SESSION
----- ---------- ------------ ---------- -------------------- ---
PROGRAM
------------------------------------------------
SQL_TEXT
-----------------------------------------------------------------
3480 26248 190 324979 SUPXXXHOC MYHOMEGDC\MYH136092 07-JUN-12 16:17 WAITED SHORT TIME ACTIVE 5xzua1wzkfwpy fvgnun67x1bcg -1 1 db file sequential read
plsqldev.exe
DELETE FROM T_HAD_XXXX_AUDIT WHERE FORM_ID = :B1


After killed the session all 280+ locks released immediate.

0 Comments:

Post a Comment