November 27, 2012

LOCK diagnose: NOWAIT lock requests could hang in RAC (blocked in compatble mode)

One 4 instances Rac DB report lock issue:

INST_ID SESSION                ID1        ID2      LMODE    REQUEST TY      CTIME
---------- --------------- ---------- ---------- ---------- ---------- -- ----------
         1 Holder: 5887             3          1          3          0 TS    2665025
         1 Waiter: 4145             3          1          0          3 TS      17970
         1 Waiter: 336              3          1          0          3 TS      17970
         1 Waiter: 6331             3          1          0          3 TS      17970
         1 Waiter: 555              3          1          0          3 TS      17970
         1 Waiter: 5455             3          1          0          3 TS      17970
         1 Waiter: 6109             3          1          0          3 TS      17970
         1 Waiter: 441              3          1          0          3 TS      17970
         1 Waiter: 5783             3          1          0          3 TS      17970

Mutiple sessiones are blocked,from CTIME we can see they have been blocked for quite a long time.
5887 is SMON. All other blocked sessions are slave processes of session 772 which is running a schema gather stats task.

v$session:
INST  SSID    SERIAL#                      OSUSER     USERNAME             MACHINE                        LOGON_TIME           STATE               STATUS   SQLID        CHID_NUM P_SQLID       P_CHILD_NUM MODULE                                                            WAIT_TIME
---------- ----- ---------- ------------------------ ---------- -------------------- ------------------------------ -------------------- ------------------- -------- ------------- ---------- ------------- ----------- ---------------------------------------------------------------- ----------
SECONDS_IN_WAIT LAST_CALL_ET BLOCKING_INSTANCE BLOCKING_SESSION PROGRAM                                          EVENT                                        P1TEXT                P1 P2TEXT                       P2 P3TEXT                       P3
--------------- ------------ ----------------- ---------------- ------------------------------------------------ ---------------------------------------------------------------- -------------------- ---------- -------------------- ---------- -------------------- ----------
SQL_TEXT
-------------------------------------------------------------------------------------------------------------------------------------------------------------
         1   772      48345                     oracle     SYSTEM               alxxxx.com  19-NOV-1216:30       WAITING             ACTIVE   awdww4g991vsd       0 fwrkt5qx1jjfd           0 SQL*Plus                                                                  0
          17621        38325                 1              336 sqlplus                                          PX Deq: Parse Reply                          sleeptime/senderid           200 passes                    10594                               0
insert /*+ append */ into sys.ora_temp_1_ds_3021076 SELECT /*+  parallel(t,8) parallel_index(t,8) dbms_stats cursor_sharing_exact use_weak_name_resl dynamic_...........

         1  5887          1                     oracle                          alxxxx.com  20-OCT-1207:55       WAITING             ACTIVE                         gm9t6ycmb1yu6           0                                                                           0
              3      2664821                                    oracle@alxxxx.com (SMON)      smon timer                                   sleep time                   300 failed                        0                               0
delete from smon_scn_time where scn =  (select min(scn) from smon_scn_time)

We can see during blocking, the gather stats session is trying to insert into temp table sys.ora_temp_1_ds_3021076:
SYS @ sccc1 >  select owner,table_name , TABLESPACE_NAME,SEGMENT_CREATED from dba_tables where table_name=upper('ora_temp_1_ds_3021076');
OWNER          TABLE_NAME                     TABLESPACE_NAME                SEG
-------------- ------------------------------ ------------------------------ ---
SYS            ORA_TEMP_1_DS_3021076                                         YES

First thought is why TS blocking?
TS lock is TEMP related.

SMON is charge of some TEMP related tasks. But normally the response is very fast, why in our case it is blocked for such a long time?
Let's check SMON trace file:
*** 2012-11-24 20:50:14.840
SMON: system monitor process posted msgflag:0x0000 (-/-/-/-/-/-/-)

*** 2012-11-24 20:50:14.841
SMON: process sort segment requests begin

*** 2012-11-24 20:50:14.841
SMON: process sort segment requests end

*** 2012-11-24 20:50:14.841
SMON: Posted, but not for trans recovery, so skip it.
SMON: poll shared mount broadcast channel begin
SMON: poll shared mount broadcast channel end
SMON: poll flashback broadcast channel begin
SMON: poll flashback broadcast channel end
SMON: poll segment info broadcast channel begin
SMON: poll segment info broadcast channel end

From above we can see SMON is idle posted. From v$sort_usage it shows the temp table is only 1mb big, the temp tablespace is nearly 100% free and system load is very low.
So what is the blocked session waiting for?

I search google with Key word: TS lock, but that doesn't help.
I decide to review currently information at my hand, and then i notice:
INST_ID SESSION                ID1        ID2      LMODE    REQUEST TY      CTIME
---------- --------------- ---------- ---------- ---------- ---------- -- ----------
         1 Holder: 5887             3          1          3          0 TS    2665025
         1 Waiter: 4145             3          1          0          3 TS      17970
         1 Waiter: 336              3          1          0          3 TS      17970
         1 Waiter: 6331             3          1          0          3 TS      17970
         1 Waiter: 555              3          1          0          3 TS      17970
         1 Waiter: 5455             3          1          0          3 TS      17970
         1 Waiter: 6109             3          1          0          3 TS      17970
         1 Waiter: 441              3          1          0          3 TS      17970
         1 Waiter: 5783             3          1          0          3 TS      17970

We can see LMODE and REQUEST are both 3 which should be compatible. Wiered!

Continue:
RESOURCE_NAME1                 GRANT_LEV REQUEST_L        PID OWNER_NODE WHICH_QUEUE STATE        BLOCKED    BLOCKER    
------------------------------ --------- --------- ---------- ---------- ----------- ------------ ---------- ----------
[0x3][0x1],[TS][ext 0x0,0x0]   KJUSERCW  KJUSERCW       15822          0           1 GRANTED       0         0
[0x3][0x1],[TS][ext 0x0,0x0]   KJUSERNL  KJUSERCW       16825          0           2 OPENING       1         0
[0x3][0x1],[TS][ext 0x0,0x0]   KJUSERNL  KJUSERCW       15354          0           2 OPENING       1         0                
[0x3][0x1],[TS][ext 0x0,0x0]   KJUSERNL  KJUSERCW       16823          0           2 OPENING       1         0                
[0x3][0x1],[TS][ext 0x0,0x0]   KJUSERNL  KJUSERCW       15350          0           2 OPENING       1         0                
[0x3][0x1],[TS][ext 0x0,0x0]   KJUSERNL  KJUSERCW       15356          0           2 OPENING       1         0                
[0x3][0x1],[TS][ext 0x0,0x0]   KJUSERNL  KJUSERCW       15352          0           2 OPENING       1         0                
[0x3][0x1],[TS][ext 0x0,0x0]   KJUSERNL  KJUSERCW       15358          0           2 OPENING       1         0                
[0x3][0x1],[TS][ext 0x0,0x0]   KJUSERNL  KJUSERCW       15360          0           2 OPENING       1         0                

And let's check resource [0x3][0x1],[TS]:
RESOURCE_NAME                  ON_CONVERT_Q ON_GRANT_Q NEXT_CVT_ MASTER_NODE
------------------------------ ------------ ---------- --------- -----------
[0x3][0x1],[TS][ext 0x0,0x0]              1          1 KJUSERPR            2

From above we can see the lock's NEXT_CVT is PR. It is wiered, since currently all request on this instance is CW mode.
For TS lock the request can't be in another instance, but i still decide to go to check for sure.

The master instance for resouce [0x3][0x1],[TS] is node 3. Let's check node 3:
RESOURCE_NAME1                 GRANT_LEV REQUEST_L        PID OWNER_NODE WHICH_QUEUE
------------------------------ --------- --------- ---------- ---------- -----------
STATE                                                               BLOCKED    BLOCKER
---------------------------------------------------------------- ---------- ----------
[0x3][0x1],[TS][ext 0x0,0x0]   KJUSERCW  KJUSERCW           0          0           1
GRANTED                                                                   0          0

Now It is clear that in whole Rac, only instance one is requesting the resource. And request level is CW.
Which means, all blocked session on instance one is blocked by a unreasonable PR convert while the PR converting's requester doesn't exist!

Now we know it must be a bug. Search on metalink with above information and very soon we found below article which exactly match our case:
Bug 12865902 - NOWAIT lock requests could hang (like Parallel Queries may hang "enq: TS - contention") in RAC

And solution is simple. We can apply the patch or disable parallel.

1 Comments:

Jeff Moss said...

I think we're hitting this and your article is useful...would be even more useful if you supplied the SQL you used on some of your bits where you just show the results...so we could issue them and see the same on our system...?

Thanks
Jeff

Post a Comment