Not long time ago, one of our new production database hits very serious performance issue after migration from 10g to 11g.
It appreas like: All SQLs execution plan not changed, only the execution time become ten times or more.
After go through AWR report it looks clear that: The IO system for the DB is overload.
The IO load reach to 400mb/s , and cost nearly 400ms per read. The normal value should around 5ms-10ms.
Go through the AWR report, and find below TOP event cost 85%+ DB time:
direct path read.
Direct path read is an new feature in 11G. In 11G, when a table is doing a full table scan, oracle will make a judge and very possiblely choose direct path read instead of traditional db file scattered read.
In our enviroment, there is an simple SQL which don't have where filter, so definitly oracle has to choose full table scan for it. And oracle choosed direct path read.
This simple SQL is executed frequently, at the busiest time, more then 100+ sesssions run that SQL at same time.
Direct path read is not a problem, but when 100+ session doing direct path read at same time, then it is a big problem.
Because direct path read will skip the memory, hence make the IO load higher and higher and slower and slower.
And all other no-related SQLs which not doing a full table scan, when they need to read block from disk, will be also very slow because the IO system already nearly crashed.
Below is part of AWR report, for refer:
Wait | Event | Wait Time | Summary Avg Wait Time (ms) | |||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|
I# | Class | Event | Waits | %Timeouts | Total(s) | Avg(ms) | %DB time | Avg | Min | Max | Std Dev | Cnt |
* | User I/O | direct path read | 1,373,931 | 0.00 | 481,162.28 | 350.21 | 81.58 | 350.26 | 348.14 | 353.65 | 2.97 | 3 |
Other | reliable message | 32,847 | 21.54 | 9,074.56 | 276.27 | 1.54 | 369.12 | 152.35 | 495.44 | 188.58 | 3 | |
User I/O | db file sequential read | 21,558 | 0.00 | 8,367.24 | 388.13 | 1.42 | 387.98 | 379.14 | 394.27 | 7.88 | 3 | |
System I/O | control file sequential read | 26,328 | 0.00 | 5,801.76 | 220.36 | 0.98 | 218.09 | 211.52 | 228.97 | 9.49 | 3 | |
Commit | log file sync | 13,094 | 0.00 | 5,346.94 | 408.35 | 0.91 | 447.49 | 251.90 | 658.57 | 203.78 | 3 | |
DB CPU | 4,250.92 | 0.72 | 3 | |||||||||
User I/O | direct path write temp | 12,742 | 0.00 | 3,505.86 | 275.14 | 0.59 | 399.88 | 124.09 | 539.61 | 238.85 | 3 | |
System I/O | log file parallel write | 13,040 | 0.00 | 2,580.21 | 197.87 | 0.44 | 209.98 | 142.42 | 304.40 | 84.26 | 3 | |
System I/O | db file parallel write | 3,896 | 0.00 | 800.70 | 205.52 | 0.14 | 208.68 | 172.50 | 256.24 | 43.01 | 3 | |
Other | gcs log flush sync | 70,782 | 86.16 | 691.43 | 9.77 | 0.12 | 9.81 | 9.51 | 9.99 | 0.26 | 3 |
From above we can find that all IO related operation turns into tragedy. Even log file sync, controlfile sequential read these no-related event, was dragged by the poor IO caused by direct path read.
More......