How much can SEQUENCE drag down speed of SQL?
10%? 20%? Below my working case show you SEQUENCE drag down 600% speed of one SQL.
It is during my recently shooting of a job's performance issue, i notice this SQL in the job block.
This is quite a simple SQL, two tables join, only thing secial is that it is using a sequence GGGC_SSBA.SEQ_MIM_CCCC_PPPP_SSS.NEXTVAL:
dwxxxx2 > SELECT /*+ USE_HASH(S1 TM)*/ GGGC_SSBA.SEQ_MIM_CCCC_PPPP_SSS.NEXTVAL ,330404
,TO_DATE('2012-12-18 06:45:18', 'YYYY-MM-DD HH24:MI:SS')
,TO_DATE('2012-12-18 06:45:18', 'YYYY-MM-DD HH24:MI:SS')
,TO_DATE('2100-12-31', 'YYYY-MM-DD') ,S1.MATNR AS MATNR ,S1.VKORG AS
VKORG ,S1.VTWEG AS VTWEG ,S1.KONDM AS KONDM ,S1.DWERK AS DWERK
,S1.VMSTA AS VMSTA ,S1.VMSTD AS VMSTD ,S1.LVORM AS LVORM
,S1.DT_TM_XTR AS DT_TM_XTR ,S1.CDHDR_UDATE AS CDHDR_UDATE
,S1.PROVG AS PROVG FROM GEPC_TEMP_TABLE.KKK_MIM_XXXX_RRRR_SSS TM
GEPC_SOURCE.EEEIE_ZZZZZ_MMML_SLS S1 WHERE TM.SOURCE_ROWID = S1.ROWID;
643418 rows selected.
Elapsed: 00:01:00.21
Execution Plan
----------------------------------------------------------
Plan hash value: 3276368232
--------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 36837 | 2590K| 759 (3)| 00:00:14|
| 1 | SEQUENCE | SEQ_MIM_CCCC_PPPP_SSS | | | | |
|* 2 | HASH JOIN | | 36837 | 2590K| 759 (3)| 00:00:14|
| 3 | TABLE ACCESS FULL| KKK_MIM_XXXX_RRRR_SSS | 36837 | 359K| 29 (0)| 00:00:01|
| 4 | TABLE ACCESS FULL| EEEIE_ZZZZZ_MMML_SLS | 643K| 38M| 724 (3)| 00:00:14|
--------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("TM"."SOURCE_ROWID"="S1".ROWID)
Statistics
----------------------------------------------------------
128804 recursive calls
96793 db block gets
77309 consistent gets
661 physical reads
21141944 redo size
25998756 bytes sent via SQL*Net to client
472358 bytes received via SQL*Net from client
42896 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
643418 rows processed
From above red part we see the SQL esclape time is one minute.
That is too long after considerring the query's base tables data volumn, it should not take that much time.
We also notice that big "recursive calls" and "db block gets" Statistics.
It is very clear that they are caused by the sequence in the SQL.
So let's remove the sequence from the query and check again:
dwxxxx2 > SELECT /*+ USE_HASH(S1 TM)*/ 330404
,TO_DATE('2012-12-18 06:45:18', 'YYYY-MM-DD HH24:MI:SS')
,TO_DATE('2012-12-18 06:45:18', 'YYYY-MM-DD HH24:MI:SS')
,TO_DATE('2100-12-31', 'YYYY-MM-DD') ,S1.MATNR AS MATNR ,S1.VKORG AS
VKORG ,S1.VTWEG AS VTWEG ,S1.KONDM AS KONDM ,S1.DWERK AS DWERK
,S1.VMSTA AS VMSTA ,S1.VMSTD AS VMSTD ,S1.LVORM AS LVORM
,S1.DT_TM_XTR AS DT_TM_XTR ,S1.CDHDR_UDATE AS CDHDR_UDATE
,S1.PROVG AS PROVG FROM GEPC_TEMP_TABLE.KKK_MIM_XXXX_RRRR_SSS TM
GEPC_SOURCE.EEEIE_ZZZZZ_MMML_SLS S1 WHERE TM.SOURCE_ROWID = S1.ROWID;
643418 rows selected.
Elapsed: 00:00:10.23
Execution Plan
----------------------------------------------------------
Plan hash value: 2074598172
-------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 36837 | 2590K| 759 (3)| 00:00:14|
|* 1 | HASH JOIN | | 36837 | 2590K| 759 (3)| 00:00:14|
| 2 | TABLE ACCESS FULL| KKK_MIM_XXXX_RRRR_SSS | 36837 | 359K| 29 (0)| 00:00:01|
| 3 | TABLE ACCESS FULL| EEEIE_ZZZZZ_MMML_SLS | 643K| 38M| 724 (3)| 00:00:14|
-------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("TM"."SOURCE_ROWID"="S1".ROWID)
Statistics
----------------------------------------------------------
1 recursive calls
0 db block gets
44998 consistent gets
0 physical reads
0 redo size
21501260 bytes sent via SQL*Net to client
472358 bytes received via SQL*Net from client
42896 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
643418 rows processed
Without the sequence the job completes in 10 seconds.
In another word, the sequence drag down 600% speed of this SQL.
Let's check the sequence attribute:
dwxxxx2 >select * from dba_sequences where SEQUENCE_NAME='SEQ_MIM_CCCC_PPPP_SSS';
SEQUENCE_OWNER SEQUENCE_NAME INCREMENT_BY C O CACHE_SIZE LAST_NUMBER
-------------- --------------------- ------------ - - ---------- -----------
GGGC_SSBA SEQ_MIM_CCCC_PPPP_SSS 1 N N 20 109594869
The CACHE_SIZE is 20.
20 is a default value, it is not suitable in our case.
I increase the value to 500 and re-execute the SQL.
This time:
dwxxxx2 > SELECT /*+ USE_HASH(S1 TM)*/ GGGC_SSBA.SEQ_MIM_CCCC_PPPP_SSS.NEXTVAL ,330404
,TO_DATE('2012-12-18 06:45:18', 'YYYY-MM-DD HH24:MI:SS')
,TO_DATE('2012-12-18 06:45:18', 'YYYY-MM-DD HH24:MI:SS')
,TO_DATE('2100-12-31', 'YYYY-MM-DD') ,S1.MATNR AS MATNR ,S1.VKORG AS
VKORG ,S1.VTWEG AS VTWEG ,S1.KONDM AS KONDM ,S1.DWERK AS DWERK
,S1.VMSTA AS VMSTA ,S1.VMSTD AS VMSTD ,S1.LVORM AS LVORM
,S1.DT_TM_XTR AS DT_TM_XTR ,S1.CDHDR_UDATE AS CDHDR_UDATE
,S1.PROVG AS PROVG FROM GEPC_TEMP_TABLE.KKK_MIM_XXXX_RRRR_SSS TM
GEPC_SOURCE.EEEIE_ZZZZZ_MMML_SLS S1 WHERE TM.SOURCE_ROWID = S1.ROWID;
643418 rows selected.
Elapsed: 00:00:29.60
Execution Plan
----------------------------------------------------------
Plan hash value: 3276368232
--------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 36837 | 2590K| 759 (3)| 00:00:14|
| 1 | SEQUENCE | SEQ_MIM_CCCC_PPPP_SSS | | | | |
|* 2 | HASH JOIN | | 36837 | 2590K| 759 (3)| 00:00:14|
| 3 | TABLE ACCESS FULL| KKK_MIM_XXXX_RRRR_SSS | 36837 | 359K| 29 (0)| 00:00:01|
| 4 | TABLE ACCESS FULL| EEEIE_ZZZZZ_MMML_SLS | 643K| 38M| 724 (3)| 00:00:14|
--------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("TM"."SOURCE_ROWID"="S1".ROWID)
Statistics
----------------------------------------------------------
5153 recursive calls
3870 db block gets
46287 consistent gets
0 physical reads
844460 redo size
25998756 bytes sent via SQL*Net to client
472358 bytes received via SQL*Net from client
42896 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
643418 rows processed
SQL completes in 30 seconds. Also "db block gets" and "recursive calls" is much smaller then first serneriao--cache is 20.
30 seconds means still the sequence drag down 3 times of the sql's speed.
It is still not reasonable.
The DB have mutiple samiliar queries, I don't have enough free time to look deep into this. I may raise a ticket with oracle support to check this once i get free time.
More......
GE, AGFA. experienced in 9i/10g/11g/12c, AdvanceReplication, Stream, GoldenGate, RAC, DataGuard, ODA, EXADATA, GridControl. In my career I have always been assigned to handle most complicated and critical oracle issues without solution online, and I hope to record some of them via this blog.
December 20, 2012
Performance Shooting: How much can SEQUENCE slow down a SQL
December 7, 2012
MMON Slave Process Reports ORA-7445 [kglic0] Error, Plus Database Hangs
A DB suddenly hang during a load. User come to me.
When i login the server, i can see the issue DB consuming 100% CPU on the server.
Tried "sqlplus / as sysdba" but also hang.
I gernerate a systemstate dump and notice that the PMON is blocked:
PROCESS 2: PMON
O/S info: user: oracle, term: UNKNOWN, ospid: 1193
OSD pid info: Unix process pid: 1193, image: oracle@alpcisddb484.corporate.ge.com (PMON)
waiting for 60108b48 Child shared pool level=7 child#=3
Location from where latch is held: kgh.h LINE:6387 ID:kghalo:
Context saved from call: 0
state=busy [holder orapid=89] wlstate=free [value=0]
waiters [orapid (seconds since: put on list, posted, alive check)]:
possible holder pid = 89 ospid=10478
Let's check which session is holding latch 60108b48, and what is the session waiting for:
PROCESS 89:
O/S info: user: oracle, term: UNKNOWN, ospid: 10478
OSD pid info: Unix process pid: 10478, image: oracle@alpcisddb484.corporate.ge.com
holding (efd=8) 60108b48 Child shared pool level=7 child#=3
Current Wait Stack:
0: waiting for 'library cache: mutex X'
idn=0xfc57b2b7, value=0xd100000000, where=0x4f
wait_id=57 seq_num=60 snap_id=1
wait times: snap=45 min 11 sec, exc=45 min 11 sec, total=45 min 11 sec
wait times: max=infinite, heur=45 min 11 sec
wait counts: calls=0 os=0
in_wait=1 iflags=0x5a2
Let’s see which session is holding mutex 0xfc57b2b7 and what is the session waiting for:
PROCESS 42: M000
SO: 0x19f40dda8, type: 4, owner: 0x19f2de3f8, flag: INIT/-/-/0x00 if: 0x3 c: 0x3
proc=0x19f2de3f8, name=session, file=ksu.h LINE:12459, pg=0
(session) sid: 209 ser: 521 trans: (nil), creator: 0x19f2de3f8
Current Wait Stack:
Not in wait; last wait ended 45 min 10 sec ago
OK, till now we found the root blocking is M000.
It is holding mutex:
PROCESS 42: M000
SO: 0x19f40dda8, type: 4, owner: 0x19f2de3f8, flag: INIT/-/-/0x00 if: 0x3 c: 0x3
proc=0x19f2de3f8, name=session, file=ksu.h LINE:12459, pg=0
(session) sid: 209 ser: 521 trans: (nil), creator: 0x19f2de3f8
Current Wait Stack:
Not in wait; last wait ended 45 min 10 sec ago
KGL-UOL SO Cache(total=240, free=0)
KGX Atomic Operation Log 0x17d569fc0
Mutex 0x19eb2b4c8(209, 0) idn 1fc57b2b7 oper EXCL
Library Cache uid 209 efd 7 whr 49 slp 0
oper=0 pt1=(nil) pt2=(nil) pt3=(nil)
But above also shows that M000 itself not in a wait since 45 mins back. What is M000 doing and why it doesn't release that mutex all along?
It is un-reasonable.
After review the process state of M000, i notice the process is already dead:
SO: 0x19f2de3f8, type: 2, owner: (nil), flag: INIT/-/-/0x00 if: 0x3 c: 0x3
proc=0x19f2de3f8, name=process, file=ksu.h LINE:12451, pg=0
(process) Oracle pid:42, ser:16, calls cur/top: 0x1908c1eb8/0x1906a28f8
flags : (0x3) DEAD
flags2: (0x8030), flags3: (0x0)
intr error: 0, call error: 0, sess error: 0, txn error 0
intr queue: empty
ksudlp FALSE at location: 0
Cleanup details:
Found dead = 44 min 27 sec ago
Total Cleanup attempts = 10, Cleanup time = 15 min 49 sec, Cleanup timer = 15 min 45 sec
Last attempt (full) ended 54 sec ago, Length = 2 min 30 sec, Cleanup timer = 2 min 30 sec
OK. Now it is clear to us that M000's death leading to the DB hang.
But why M000 suddenly dead?
After checking alert.log i find blow:
Thu Dec 06 22:00:48 2012
Exception [type: SIGSEGV, Address not mapped to object] [ADDR:0x1060] [PC:0x90ED147, kglic0()+1075] [flags: 0x0, count: 1]
Errors in file /x484/dump01/oracle/hrxt/diag/rdbms/hrxdt/hrxdt_m000_10618.trc (incident=6337):
ORA-07445: exception encountered: core dump [kglic0()+1075] [SIGSEGV] [ADDR:0x1060] [PC:0x90ED147] [Address not mapped to object] []
Now all are clear.
After searching on metalink with above symbol, our scenario excatly matches this bug:
MMON Slave Process Reports ORA-7445 [kglic0] Error, Plus Database Hangs [ID 1487720.1]
More......
December 2, 2012
solve ORA-1658 VIA errorstack
Sometimes a issue itself is not complicated, but it is hard due to other reason, such as lack of neccesary information.
For example, one day user come to us and complain that their application job got below error:
F|Nov 29 20:47:34|UNXSERVER|E|27451|GXXS01|conv|ABORTED-SQLCODE -1658 AT 1223 ErrText = |
F|Nov 29 20:47:34|UNXSERVER|E|27451|GXXS01|conv|ENDING DUE TO SERIOUS SQL ERROR|
F|Nov 29 20:47:34|UNXSERVER|E|27462|GXXS02|conv|ABORTED-SQLCODE -1658 AT 1223 ErrText = |
F|Nov 29 20:47:34|UNXSERVER|E|27462|GXXS02|conv|ENDING DUE TO SERIOUS SQL ERROR|
All they can offer us is above error message and a DB name.
Then how should a DBA look into it?
Above error is not a standard DB error message which should begin with: "ORA-".
But from "SQLCODE -1658", we know it should be a DB error, it is not in a standard format due to their application catch the error message and throw it out with its own exception defination.
Let's see what is 1658:
server003[oracle]_db058> oerr ora 1658
01658, 00000, "unable to create INITIAL extent for segment in tablespace %s"// *Cause: Failed to find sufficient contiguous space to allocate INITIAL
// extent for segment being created.
// *Action: Use ALTER TABLESPACE ADD DATAFILE to add additional space to the
// tablespace or retry with a smaller value for INITIAL
OK, seems it is a simple tablespace free size issue.
But after checking in the DB, found there is not error message in alert.log.
The database have 30+ samiliar very tiny tablespaces.
So what tablespace caused the issue:Tablespace Free MB Total MB PCT
------------------------- ----------- ---------- ----------
ICDM_20140801 5 10 50
ICDM_20130901 5 10 50
ICDM_20131201 5 10 50
ICDM_20130501 5 10 50
ICDM_20130701 5 10 50
ICDM_20140101 5 10 50
ICDM_20130801 5 10 50
ICDM_20131001 5 10 50
ICDM_20130301 5 10 50
ICDM_20131101 5 10 50
ICDM_20140201 5 10 50
ICDM_20140501 5 10 50
ICDM_20140601 5 10 50
ICDM_20130401 5 10 50
ICDM_20140301 5 10 50
ICDM_20130601 5 10 50
ICDM_20140401 5 10 50
ICDM_20140701 5 10 50
.............
Still user doesn't know anything, they got the error when running a job. But they don't know what is the current SQL and which table got the error on which tablespace.
Then i enabled the errorstack:
alter system set events '1658 trace name errorstack level 1';
Through above command i require to gernerate errorstack for error id ORA-1658 at DB level.
Then i ask user to re-run their job again.
Of course it fails again with error ORA-1658, but with the help of errorstack, this time we get the detail in alert.log:
Errors in file /server/backup/oracle/dump/getsstm2/diag/rdbms/db058/db058/trace/db058_ora_13006.trc:
ORA-01658: unable to create INITIAL extent for segment in tablespace FILE_20130101
OK, now we got the tablespace name: FILE_20130101.
After adding 4gb to FILE_20130101, the issue solved.
More......