February 5, 2012

serious performance issue caused by 11g new feature

Not long time ago, one of our new production database hits very serious performance issue after migration from 10g to 11g.

It appreas like: All SQLs execution plan not changed, only the execution time become ten times or more.

After go through AWR report it looks clear that: The IO system for the DB is overload.

The IO load reach to 400mb/s ,   and cost nearly 400ms per read.  The normal value should around 5ms-10ms.
Go through the AWR report, and find below TOP event cost 85%+ DB time:
direct path read.

Direct path read is an new feature in 11G.   In 11G, when a table is doing a full table scan,  oracle will make a judge and very possiblely choose direct path read instead of traditional db file scattered read.

In our enviroment, there is an simple SQL which don't have where filter,  so definitly oracle has to choose full table scan for it.  And oracle choosed direct path read.

This simple SQL is executed frequently, at the busiest time, more then 100+ sesssions run that SQL at same time.

Direct path read is not a problem, but when 100+ session doing direct path read at same time, then it is a big problem.
Because direct path read will skip the memory, hence make the IO load higher and higher and slower and slower.

And all other no-related SQLs which not doing a full table scan, when they need to read block from disk, will be also very slow because the IO system already nearly crashed.

Below is part of AWR report, for refer:


WaitEventWait TimeSummary Avg Wait Time (ms)
I#ClassEventWaits%TimeoutsTotal(s)Avg(ms)%DB timeAvgMinMaxStd DevCnt
*User I/Odirect path read1,373,9310.00481,162.28350.2181.58350.26348.14353.652.973
Otherreliable message32,84721.549,074.56276.271.54369.12152.35495.44188.583
User I/Odb file sequential read21,5580.008,367.24388.131.42387.98379.14394.277.883
System I/Ocontrol file sequential read26,3280.005,801.76220.360.98218.09211.52228.979.493
Commitlog file sync13,0940.005,346.94408.350.91447.49251.90658.57203.783
DB CPU4,250.920.723
User I/Odirect path write temp12,7420.003,505.86275.140.59399.88124.09539.61238.853
System I/Olog file parallel write13,0400.002,580.21197.870.44209.98142.42304.4084.263
System I/Odb file parallel write3,8960.00800.70205.520.14208.68172.50256.2443.013
Othergcs log flush sync70,78286.16691.439.770.129.819.519.990.263


From above we can find that all IO related operation turns into tragedy.  Even log file sync,  controlfile sequential read these no-related event, was dragged by the poor IO caused by direct path read.

4 Comments:

Anonymous said...

and what you suggest?
some where filters? indexes? disable direct read path? ;)

regards

Kane Zhang said...

If the SQL can be added with a where filter to avoid full table scan, that is the best way.

But sometimes the application itself's logic require a Full Table Scan, no where filter optioanal.

Then the only way is to disable direct path read.
Two way to disable direct path read:
1. set event 10949 trace
2. cache the table.

In our enviroment, we choose the second. After cached the table, oracle will be very smart to know that it should read the table from memory instead of read from disk.

Anonymous said...

Not a strong argument Mr. Zhang. How many tables can you cache in memory? If you have 100 tables with millions of rows that performance have been affected by your scenario, will you cache all the tables?

Kane Zhang said...

Hi,

It depends.
You can cache the table if the issue only occur on only a few small tables.
Or you can disable on global level with 10949 event.

Kevin

Post a Comment