June 10, 2014

long parsing time issue: bundles of SQLs cost 10+ seconds to do a hard parse

One database, after migrating to 11.2.0.3, customer started to complain the performance was much slower than before.
After checking, we saw for most SQLs, the execution speed was fast while the parsing time was too long.

I took one SQL as example, ran it multiple times to ensure there was no physical read:

no rows selected
Elapsed: 00:00:00.11
Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
2761 consistent gets
0 physical reads
0 redo size
16589 bytes sent via SQL*Net to client
513 bytes received via SQL*Net from client
1 SQL*Net roundtrips to/from client
3 sorts (memory)
0 sorts (disk)
0 rows processed

Then I added hint /*hard parse by me*/ to force a hard parse, still there was no physical read:
no rows selected
Elapsed: 00:00:03.73
Statistics
----------------------------------------------------------
22 recursive calls
0 db block gets
2767 consistent gets
0 physical reads
0 redo size
16589 bytes sent via SQL*Net to client
513 bytes received via SQL*Net from client
1 SQL*Net roundtrips to/from client
3 sorts (memory)
0 sorts (disk)
0 rows processed

Without parsing, the execution time was 0.11 second,but with parsing, the SQL cost 3.73 seconds.
There was no lock/mutex/latch/pin contention in database and server load was low.

Below query shew for many SQLs, a single hard parsing cost even more than 50 seconds:

SQL> SELECT SQL_ID,
  AVG_HARD_PARSE_TIME_MS
FROM
(SELECT
    ROUND(AVG_HARD_PARSE_TIME / 1000, 2)
      AVG_HARD_PARSE_TIME_MS,
    SQL_ID,
    EXECUTIONS
  FROM
    V$SQLSTATS 
  ORDER BY AVG_HARD_PARSE_TIME DESC )
WHERE
  ROWNUM <= 10; 
SQL_ID        AVG_HARD_PARSE_TIME_MS
------------- ----------------------
63n9pwutt8yzw              223279.79
8pduputnkhukx              174644.33
gg9m2b0hjky1y              161064.91
g0vn88sqftx3c               66367.09
bbvy8xjs7gjgj               64747.76
7bgzzkwsqr6hc               63303.48
bzhxncjcy1k96               61963.71
8y387dwxggn5t               57265.29
1f25j21vvfv10               54766.09
0br8s1y3ma0qr               53578.61

This was a very complicated hospital system. Though Bind Variables was in use, still large amount of hard parse were inevitable. 
 For instance, to querying a patient's report, doctors could use any combinations of below filter's : 
1. patient ID 
2. patient name 
3. patient walk in date 
4. surgery name 
5. surgery date 
6. doctor's name 
7. doctor's ID 
8. disease type 
9. severity of disease 
10. hospitalized or not 
11. Body Scan type:MRI/XRAY 
....... 
There are numerous combinations and each one could generate a new SQL therefore a hard parse. 

 Due to 11g new feature ACS, even for one same SQL, when the variables are different, it can generated multiple different child cursors(hard parse) as well. 
To minimize the count of hard parse, I set below parameters to disable ACS:
SQL> show parameter cursor
NAME                                   TYPE       VALUE
-------------------------------------- --------- ----------------------
_optimizer_adaptive_cursor_sharing     boolean   FALSE
_optimizer_extended_cursor_sharing     string    NONE
_optimizer_extended_cursor_sharing_rel string    NONE

After setting above parameters, the total parse count reduced 80% percent and the overall performance was much better. 
But still,the long consuming time for each single parsing was not reduced, customer may still encounter a slowness now and then if one SQL fell into a hard parse. 

Let's analyze the long parsing time issue. 
Here is an SQL trace:
call    count  cpu      elapsed    disk       query      current    rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse   1      0.01     0.00       0          0          0          0
Execute 1      3.30     3.30       0          6          0          0
Fetch   1      0.00     0.00       0          59         0          0
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total   3      3.31     3.31       0          65         0          0

Misses in library cache during parse: 1
Misses in library cache during execute: 1
Optimizer mode: ALL_ROWS
Parsing user id: 36 
Number of plan statistics captured: 1

And here is 10046 raw trace:
PARSING IN CURSOR #18446744071468120416 len=14352 dep=0 uid=36 oct=3 lid=36 tim=40253210782553 hv=3244419326 ad='7a995e8c8' sqlid='d228z5m0q3u7y'
PARSE #18446744071468120416:c=10000,e=9046,p=0,cr=0,cu=0,mis=1,r=0,dep=0,og=1,plh=0,tim=40253210782549
WAIT #18446744071468120416: nam='SQL*Net message to client' ela= 6 driver id=1413697536 #bytes=1 p3=0 obj#=49430 tim=40253213924594
WAIT #18446744071468120416: nam='SQL*Net more data to client' ela= 184 driver id=1413697536 #bytes=8145 p3=0 obj#=49430 tim=40253213934106
WAIT #18446744071468120416: nam='SQL*Net message from client' ela= 38091 driver id=1413697536 #bytes=1 p3=0 obj#=49430 tim=40253213972419
EXEC #18446744071468120416:c=3310000,e=3313604,p=0,cr=6,cu=0,mis=1,r=0,dep=0,og=1,plh=1738446058,tim=40253217287241
WAIT #18446744071468120416: nam='SQL*Net message to client' ela= 6 driver id=1413697536 #bytes=1 p3=0 obj#=49430 tim=40253217289638
WAIT #18446744071468120416: nam='SQL*Net more data to client' ela= 181 driver id=1413697536 #bytes=8145 p3=0 obj#=49430 tim=40253217299453
WAIT #18446744071468120416: nam='SQL*Net message from client' ela= 3096 driver id=1413697536 #bytes=1 p3=0 obj#=49430 tim=40253217302788
FETCH #18446744071468120416:c=0,e=3601,p=0,cr=59,cu=0,mis=0,r=0,dep=0,og=1,plh=1738446058,tim=40253217306558
WAIT #18446744071468120416: nam='SQL*Net message to client' ela= 5 driver id=1413697536 #bytes=1 p3=0 obj#=49430 tim=40253217311918
WAIT #18446744071468120416: nam='SQL*Net message from client' ela= 22854 driver id=1413697536 #bytes=1 p3=0 obj#=49430 tim=40253217334883


From above red part, the major running time was not consumed at fetch, but execute. To narrow out the influence of bind peeking, I used explain plan for to analyze SQL with only bind variables:
explain plan for select /*hard47*/ .......
DECODE(c.c1,:"SYS_B_14",:"SYS_B_15",:"SYS_B_16"),DECODE(c.c2,:"SYS_B_17",:"SYS_B_18",:"SYS_B_19"),
..........
FROM
TABLE1,TABLE2,VIEW1,VIEW2,VIEW3.....
WHERE NVL(QPQD_EX_KEY,:"SYS_B_60") > :"SYS_B_61"  AND   a.QPQD_AP_PATKEY = :1    
Elapsed: 00:00:03.66

Still the SQL cost 3 seconds to parse with out bind peeking.
To make our test more clear, I found a new SQL, and this new SQL cost 13 seconds to parse.


We know some CBO parameters can limit choices when calculating execution plan during a hard parse.
Here is the result for the 13 seconds parsing SQL after we set different values for parameter OPTIMIZER_FEATURES_ENABLE:

ALTER SESSION SET OPTIMIZER_FEATURES_ENABLE = '9.2.0.8'; --- 0.82 seconds
ALTER SESSION SET OPTIMIZER_FEATURES_ENABLE = '10.1.0'; --- 1.03 seconds
ALTER SESSION SET OPTIMIZER_FEATURES_ENABLE = '10.2.0.1'; --- 4.03 seconds
ALTER SESSION SET OPTIMIZER_FEATURES_ENABLE = '11.1.0.7'; --- 12.20 seconds
ALTER SESSION SET OPTIMIZER_FEATURES_ENABLE = '11.2.0.3'; --- 12.91 seconds

When OPTIMIZER_FEATURES_ENABLE=11.2.0.3, the parse time was13 seconds, while 9.2.0.8, the parsing time was only 0.82 seconds. 
 Next I tested optimizer_mode, and as what I had thought, when OPTIMIZER_FEATURES_ENABLE was11.2.0.3, after set optimizer_mode to RULE, the parse time reduced to 0.5 second. 

 But, in a production environment, we can't simply downgrade OPTIMIZER_FEATURES_ENABLE to 9.2.0.8 or changing optimizer_mode to RULE, as these two parameters influenced too much.
I wanted to find a specific parameter that particular to address this long parse problem.

Then I made 10053 trace to analyze the process of SQL parse.
The 10053 trace file was 44MB which was normal, because if SQL parsing cost 13 seconds, and 10053 trace file was only a few hundreds KB, then there must be something wrong.


Let's look into the structure of the 10053 trace file:

pacsrd1212 $ cat  QPROD_ora_8456_10053.trc|grep -i "permutations tried"|grep -v "permutations tried: 1"
Number of join permutations tried: 2
Number of join permutations tried: 6
Number of join permutations tried: 52
Number of join permutations tried: 2
Number of join permutations tried: 3
Number of join permutations tried: 3
Number of join permutations tried: 3
Number of join permutations tried: 499
Number of join permutations tried: 84
Number of join permutations tried: 2
Number of join permutations tried: 3
Number of join permutations tried: 3
Number of join permutations tried: 3
Number of join permutations tried: 798
Number of join permutations tried: 84
Number of join permutations tried: 80
Number of join permutations tried: 2
Number of join permutations tried: 3
Number of join permutations tried: 3
Number of join permutations tried: 3
Number of join permutations tried: 499
Number of join permutations tried: 84
Number of join permutations tried: 451
Number of join permutations tried: 483
Number of join permutations tried: 84
Number of join permutations tried: 2
Number of join permutations tried: 6
Number of join permutations tried: 52
Number of join permutations tried: 435
Number of join permutations tried: 84

And below was another 10053 trace file while OPTIMIZER_FEATURES_ENABLE = '9.2.0.8', the size of which was only 2M while parsing time was within 1 second:
pacsrd1212 $ cat QPROD_ora_7830_10053_2.trc|grep -i "permutations tried"|grep -v "permutations tried: 1"
Number of join permutations tried: 2
Number of join permutations tried: 6
Number of join permutations tried: 20
Number of join permutations tried: 2
Number of join permutations tried: 3
Number of join permutations tried: 3
Number of join permutations tried: 3
Number of join permutations tried: 95

We can see, while OPTIMIZER_FEATURES_ENABLE is '11.2.0.3', CBO calculated so much join orders, no wonder it cost that long time.
We can set hidden parameter “_OPTIMIZER_MAX_PERMUTATIONS” to limit the max number of join orders CBO could pick.
After I tested to set _OPTIMIZER_MAX_PERMUTATIONS=50, the SQL parsing time reduced from 13 seconds to 6 seconds.
But this parameter was also not acceptable to me since it influenced too much.

Let us review the join orders in 10053 trace file, here is one sample:

PPOINTMENT_ITEMS[J]#6  QPQD_LINK[QPQD_LINK]#7  VW_NSO_3[VW_NSO_3]#11  QPQD_MACHINELINKS[K]#12  EXAMS[C]#10  EXAMINATIONS[E]#8  EXAMGROUPS[F]#9  VW_PRIORITY_SEQUENCE_PLACER[G]#13
Join order aborted: cost > best plan cost
Join order[449]:  QPQDVIEW541[A]#1  PATIENTS[B]#0  QPQD_LINK[Z]#2  APPOINTMENTS[D]#3  APPOINTMENT_ITEMS[H]#4  QPQD_ROOMLINKS[I]#5  APPOINTMENT_ITEMS[J]#6  QPQD_LINK[QPQD_LINK]#7  EXAMINATIONS[E]#8  EXAMGROUPS[F]#9  EXAMS[C]#10  VW_NSO_3[VW_NSO_3]#11  QPQD_MACHINELINKS[K]#12  VW_PRIORITY_SEQUENCE_PLACER[G]#13
Join order aborted: cost > best plan cost
Considering multiple instances based initial join order.
Best join order so far: 358
CBO found the best join order at the 358th,so if we limited _OPTIMIZER_MAX_PERMUTATIONS to a small value,we may miss this join order as our best plan.

 We can see there are 14 tables there, in theory, the maximum join orders could be 14*13*12*11*....*3*2*1=87178291200. 
Of course CBO won't be that stupid to test all the 87178291200 join orders. 
Based on its rules, CBO ignored most join orders which are destined to be low effective. 

And what we really need to do, is through turning some of CBO's specific parameters, we limit choices of CBO. 
The less of useless join orders that CBO tries, the less of parse time is needed. 
Next, I went to review the low parsing SQLs themselves. 
And I noticed that they have some common symbols, such as they all contains lots of tables and complex View, as well as nested child query within IN/EXIST. 

 So then I begain to test some of parameters related to unnest and merge:
alter session set query_rewrite_enabled = false; 
alter session set"_optimizer_table_expansion"=false;
alter session set "_and_pruning_enabled"=false;
alter session set "_subquery_pruning_enabled"=false;
alter session set "_optimizer_fast_access_pred_analysis" =false;
alter session set "_optimizer_sortmerge_join_enabled"   = false;
alter session set "_optimizer_sortmerge_join_inequality" = false;
............... 

None of above parameters were working. I reviewed of the parameters and noticed below test I did:
ALTER SESSION SET OPTIMIZER_FEATURES_ENABLE = '9.2.0.8'; --- 0.82 seconds
ALTER SESSION SET OPTIMIZER_FEATURES_ENABLE = '10.1.0'; --- 1.03 seconds
ALTER SESSION SET OPTIMIZER_FEATURES_ENABLE = '10.2.0.1'; --- 4.03 seconds
ALTER SESSION SET OPTIMIZER_FEATURES_ENABLE = '11.1.0.7'; --- 12.20 seconds
ALTER SESSION SET OPTIMIZER_FEATURES_ENABLE = '11.2.0.3'; --- 12.91 seconds

And this caught my attention::
ALTER SESSION SET OPTIMIZER_FEATURES_ENABLE = '10.1.0.3';     ---  1.40  seconds
ALTER SESSION SET OPTIMIZER_FEATURES_ENABLE = '10.1.0.4';     ---  3.59  seconds

OPTIMIZER_FEATURES_ENABLE controls a bunch of CBO related parameters, and changing OPTIMIZER_FEATURES_ENABLE equals changing bundles of those CBO parameters together. 

In my previous working experience, I knew there was a parameter introduced in 10.1.0.4.
And this parameter has caused a lot off issues since it was introduced:
_optimizer_cost_based_transformation.
I felt this parameter should be the RCA.

Let's test this parameter, from below I saw while the parameter changing, the behavior of the SQL costed 13 seconds to parse initially:

alter session set "_optimizer_cost_based_transformation"=exhaustive;    ---- Elapsed: 00:00:14.11
alter session set "_optimizer_cost_based_transformation"=iterative;     ---- Elapsed: 00:00:13.71
alter session set "_optimizer_cost_based_transformation"=linear;        ---- Elapsed: 00:00:13.97
alter session set "_optimizer_cost_based_transformation"=on;            ---- Elapsed: 00:00:13.97
alter session set "_optimizer_cost_based_transformation"=off;           ---- Elapsed: 00:00:03.05
After setting it to off, the parsing time reduced to 3 seconds.

And later, I did more tests and turned a few more parameters, including:
alter session set "_b_tree_bitmap_plans"=false; (cause I saw some useless bitmap plan judgement in 10053 trace)
alter session set "optimizer_index_cost_adj" = 50;
..................

And finally, the parsing time for the SQL reduced to 1 second from 13 seconds. And all the issue SQLs reduced 90%+ percent of parsing time.






Conclusion, let me make it a bit more clear:
A SQL's running has two major steps:
1. PARSE : this step oracle translate the SQL and find a best execution plan for the SQL.
2. Execution : this step, oracle execute with the execution plan generated at step 1 and return the data.

In very most cases, the SQL's time is mainly consumed at step 2 and for that we turn execution plans, we add indexes etc.

But in our case, due to the SQL is too complex, oracle cost too much time at step 1 while step 2 is quite fast.

For some issue SQLs, there are sometimes around 14 join tables. (A view can be made by multiple tables.)
So, now, oracle calculated a maxmum 14*13*12*....*3*2*1's possible join orders for the 14 tables.
Finally, after 60 seconds, oracle say:
ok, I have done my research and find a best plan, by using this specific plan, the SQL's execution can be done in 1 second.

I must say oracle is totally right.
But as you can see now, our problem is not:
" by using this specific plan, the SQL's execution can be done in 1 second."

Our problem is oracle spend 60 seconds to find out that best plan.


I changed those parameters:
alter session set "_optimizer_cost_based_transformation"=off;
alter session set "_b_tree_bitmap_plans"=false;
alter session set "optimizer_index_cost_adj" = 50;

It doesn't mean I was not statistic with previous plan oracle found and I want to change them.
For example,
Before I set "_b_tree_bitmap_plans"=false, oracle will consider these execution plans for a SQL:
1. table1 table2: no bitmap
2. table2 table1: no bitmap
3. table1 table2: bitmap
4. table2 table1: bitmap

and oracle finally picks:
1. table1 table2: no bitmap

Then I set "_b_tree_bitmap_plans"=false;
That will help oracle to know that it doesn't need to consider bitmap execution plans, so oracle will consider only below two plans:
1. table1 table2: no bitmap
2. table2 table1: no bitmap

And still at last oracle picks this plan:
1. table1 table2: no bitmap

We can see oracle picks up a same plan at last, but due to the fact oracle has less to consider, the time oracle costs at PRASE is reduced.


So, the purpose that I changed all the parameters was to limit oracle's choices.

As a result, still oracle may pick up a same execution plan, or a different one, but again, that is irrelevant as step EXECUTION was not the bottleneck in our case and not our concern.

7 Comments:

Anonymous said...

great post. this helped and saved lot of time. thank you so much to write such a post in great details.

sap fico online training said...

Great article,nice to learn

Anonymous said...

Thank you very much. You saved the day!

Unknown said...

Thank you so much. You helped me a lot.

Unknown said...
This comment has been removed by the author.
Anonymous said...

great post thanks

Anonymous said...

there is a bug on the metalink site for this issue (long parse times) Doc ID 9659125.8
it says the fix is in 12.1.0.1

Post a Comment