Showing posts with label errorstack. Show all posts
Showing posts with label errorstack. Show all posts

April 8, 2016

orphan entries in col$ (due to creating online index failure) make gather stats job fail with ORA-01847

The gather stats job always fail on one of our core table with below error:

EXEC dbms_stats.gather_table_stats(ownname=> 'SYSADM', 
tabname=>'gbprestatiegroep', method_opt=>'For all columns size 1',estimate_percent=> 100, 
cascade=>false, degree=>1, BLOCK_SAMPLE=>false, NO_INVALIDATE=>false ); 
Error report: 
ORA-01847: day of month must be between 1 and last day of month 
ORA-06512: at "SYS.DBMS_STATS", line 23829 
ORA-06512: at "SYS.DBMS_STATS", line 23880


With "cascade=>false" in above command, we define to only gather table data stats.
Then, we treid to gather index only stats for the indexes on this table and they all succeeded:

gather_index_stats(OWNNAME=> 'SYSADM', INDNAME=> 'IX_GBP_PATTRACKING',  GRANUL
gather_index_stats(OWNNAME=> 'SYSADM', INDNAME=> 'DPG17',  GRANULARITY=>'ALL',
gather_index_stats(OWNNAME=> 'SYSADM', INDNAME=> 'DPG2',  GRANULARITY=>'ALL',e
gather_index_stats(OWNNAME=> 'SYSADM', INDNAME=> 'DPG14',  GRANULARITY=>'ALL',
gather_index_stats(OWNNAME=> 'SYSADM', INDNAME=> 'CGBP01',  GRANULARITY=>'ALL'
gather_index_stats(OWNNAME=> 'SYSADM', INDNAME=> 'CIND_GBP_PK_S_KEY2',  GRANUL
............


Above result seems to indicate data (corruption/invalidation/broken) issue on the table while the data in index is fine.
However, below command can return result correctly without any issue:
select /*+full(table)*/ * from table;
select /*+full(table)*/ to_char(column1,'yyyy-mm-dd hh24:mi:ss'), to_char(column2,'yyyy-mm-dd hh24:mi:ss'), ..... from table;


To investigate, I enabled level 3 errorstack, and I captured below issue SQL:

select /*+  no_parallel(t) no_parallel_index(t) dbms_stats cursor_sharing_exact use_weak_name_resl dynamic_sampling(0) no_monitoring no_substrb_pad  */
count(*),
count("SYS_NC00074$"),
count("SYS_NC00075$"),
count("SYS_NC00076$"),
count("SYS_NC00077$"),
count("SYS_NC00078$"),
count("SYS_NC00079$"),
sum(sys_op_opnsize("SLEUTEL")),
sum(sys_op_opnsize("CLIENTAPPLIC")),
sum(sys_op_opnsize("DEMONSTRATION_FU1")),
count("DEMONSTRATION_FU2"),
sum(sys_op_opnsize("DEMONSTRATION_FU2")),
count("CONTACTTEMPLATE"),
sum(sys_op_opnsize("CONTACTTEMPLATE")),
count("MEDICALVALIDATION"),
sum(sys_op_opnsize("AANVRAGER")),
...................
...................
from "SYSADM"."GBPRESTATIEGROEP" t ;


What the hell are these columns? They do not show up if we use "DESC GBPRESTATIEGROEP" command:
count("SYS_NC00074$"),
count("SYS_NC00075$"),
count("SYS_NC00076$"),
count("SYS_NC00077$"),
count("SYS_NC00078$"),
count("SYS_NC00079$"),


After verifying, we confirmed that it is below columns that lead to the issue, and we can reproduce by executing.

SQL> set lines 300
SQL> set pagesize 500
SQL> select
2  count("SYS_NC00074$"),
3  count("SYS_NC00075$"),
4  count("SYS_NC00076$"),
5  count("SYS_NC00077$"),
6  count("SYS_NC00078$"),
7  count("SYS_NC00079$")
8   from "SYSADM"."GBPRESTATIEGROEP" t ;
count("SYS_NC00074$"),
*
ERROR at line 2:
ORA-01847: day of month must be between 1 and last day of month

Above SYS_NC0007* columns does not exist in DBA_INDEXES/DBA_OBJECTS/DBA_COLUMNS.

However, in col$, I find below:

SQL> select OBJ#,COL#,SEGCOL#,SEGCOLLENGTH,NAME,TYPE#,NULL$,DEFLENGTH,DEFAULT$,INTCOL#,PROPERTY  
  2  from col$ where name in ('SYS_NC00074$','SYS_NC00075$','SYS_NC00076$','SYS_NC00077$','SYS_NC00078$','SYS_NC00079$') and obj#=15182;
      OBJ#       COL#    SEGCOL# SEGCOLLENGTH NAME                                TYPE#      NULL$  DEFLENGTH DEFAULT$                                                                            INTCOL#   PROPERTY
---------- ---------- ---------- ------------ ------------------------------ ---------- ---------- ---------- -------------------------------------------------------------------------------- ---------- ----------
     15182          0          0            7 SYS_NC00074$                          180          0         81 TO_TIMESTAMP(TO_CHAR("PLANUITVOERING",'DD-MON-RR'),'yyyy-mm-dd hh24:mi:sssss.ff'         74     327976
     15182          0          0            7 SYS_NC00075$                          180          0         75 TO_TIMESTAMP(TO_CHAR("PLANUITVOERING",'DD-MON-RR'),'yyyy-mm-dd hh24:mi:ss')              75     327976
     15182          0          0            7 SYS_NC00076$                          180          0         78 TO_TIMESTAMP(TO_CHAR("PLANUITVOERING",'DD-MON-RR'),'yyyy-mm-dd hh24:mi:sssss')           76     327976
     15182          0          0            7 SYS_NC00077$                          180          0         75 TO_TIMESTAMP(TO_CHAR("PLANUITVOERING",'DD-MON-RR'),'yyyy-mm-dd hh24:mi:ss')              77     327976
     15182          0          0            7 SYS_NC00078$                          180          0         75 TO_TIMESTAMP(TO_CHAR("PLANUITVOERING",'DD-MON-RR'),'yyyy-mm-dd hh24:mi:ss')              78     327976
     15182          0          0            7 SYS_NC00079$                          180          0         75 TO_TIMESTAMP(TO_CHAR("PLANUITVOERING",'DD-MON-RR'),'yyyy-MM-dd hh24:mi:ss')              79     327976

6 rows selected. 

For last a few years, constant performance turning has always being on-going for this site.
And during the process, many indexes were created based on poor SQL found or suggested by sql advisor.

It might be that when creating above function indexes, the creating function index command failed, and for some reason, it left behind orphan entries in col$.

After testing, I managed to reproduce the bug in 11.2.0.3.10, while it does not exist in 11.2.0.4 though.



SQL> create table SYSADM.TESTORPHAN as select created from DBA_OBJECTS where 1=2;
Table created.

SQL> insert into SYSADM.TESTORPHAN values (to_date('2000-07-27 14:47:30','YYYY-MM-DD HH24:MI:SS'));
1 row created.

SQL> commit;
Commit complete.

SQL> create index SYSADM.INT_ERROR on SYSADM.TESTORPHAN(TO_TIMESTAMP(TO_CHAR("CREATED",'DD-MON-RR'),'yyyy-MM-dd hh24:mi:ss')) online;
create index SYSADM.INT_ERROR on SYSADM.TESTORPHAN(TO_TIMESTAMP(TO_CHAR("CREATED",'DD-MON-RR'),'yyyy-MM-dd hh24:mi:ss')) online
ERROR at line 1:
ORA-01847: day of month must be between 1 and last day of month

---------here it will leaves orphan entries in col$ table-------------


This is an obvious defect in online index (re)build cleanup function.
I checked below tables and I cannot find any entries for above columns there:
IND$
ICOL$
CON$
SEG$
CCOL$
icol$
icoldep$


I believe we can fix the issue by executing below SQLs:

delete from col$ where obj#=15182 and name in ('SYS_NC00074$','SYS_NC00075$','SYS_NC00076$','SYS_NC00077$','SYS_NC00078$','SYS_NC00079$');
update tab$ set INTCOLS=73, AVGRLN=228 where obj#=15182;
update obj$ set SPARE2=1 where obj#=15182;
commit;
shutdown abort
startup


As this is very critical environment and a huge site, I found an outage time to implement the fix by first enabling flashback on. The fix worked as I expected, the gather table stats job then completed successfully for this table.
Cheers.

More......

December 2, 2012

solve ORA-1658 VIA errorstack

Sometimes a issue itself is not complicated, but it is hard due to other reason, such as lack of neccesary information.

For example, one day user come to us and complain that their application job got below error:
F|Nov 29 20:47:34|UNXSERVER|E|27451|GXXS01|conv|ABORTED-SQLCODE -1658 AT 1223 ErrText = |
F|Nov 29 20:47:34|UNXSERVER|E|27451|GXXS01|conv|ENDING DUE TO SERIOUS SQL ERROR|
F|Nov 29 20:47:34|UNXSERVER|E|27462|GXXS02|conv|ABORTED-SQLCODE -1658 AT 1223 ErrText = |
F|Nov 29 20:47:34|UNXSERVER|E|27462|GXXS02|conv|ENDING DUE TO SERIOUS SQL ERROR|


All they can offer us is above error message and a DB name.
Then how should a DBA look into it?

Above error is not a standard DB error message which should begin with: "ORA-".
But from "SQLCODE -1658", we know it should be a DB error, it is not in a standard format due to their application catch the error message and throw it out with its own exception defination.

Let's see what is 1658:
server003[oracle]_db058> oerr ora 1658
01658, 00000, "unable to create INITIAL extent for segment in tablespace %s"// *Cause: Failed to find sufficient contiguous space to allocate INITIAL
// extent for segment being created.
// *Action: Use ALTER TABLESPACE ADD DATAFILE to add additional space to the
// tablespace or retry with a smaller value for INITIAL


OK, seems it is a simple tablespace free size issue.
But after checking in the DB, found there is not error message in alert.log.
The database have 30+ samiliar very tiny tablespaces.
So what tablespace caused the issue:

Tablespace                    Free MB   Total MB        PCT
------------------------- ----------- ---------- ----------
ICDM_20140801                       5         10         50
ICDM_20130901                       5         10         50
ICDM_20131201                       5         10         50
ICDM_20130501                       5         10         50
ICDM_20130701                       5         10         50
ICDM_20140101                       5         10         50
ICDM_20130801                       5         10         50
ICDM_20131001                       5         10         50
ICDM_20130301                       5         10         50
ICDM_20131101                       5         10         50
ICDM_20140201                       5         10         50
ICDM_20140501                       5         10         50
ICDM_20140601                       5         10         50
ICDM_20130401                       5         10         50
ICDM_20140301                       5         10         50
ICDM_20130601                       5         10         50
ICDM_20140401                       5         10         50
ICDM_20140701                       5         10         50
.............

Still user doesn't know anything, they got the error when running a job. But they don't know what is the current SQL and which table got the error on which tablespace.

Then i enabled the errorstack:
alter system set events '1658 trace name errorstack level 1';
Through above command i require to gernerate errorstack for error id ORA-1658 at DB level.

Then i ask user to re-run their job again.
Of course it fails again with error ORA-1658, but with the help of errorstack, this time we get the detail in alert.log:
Errors in file /server/backup/oracle/dump/getsstm2/diag/rdbms/db058/db058/trace/db058_ora_13006.trc:
ORA-01658: unable to create INITIAL extent for segment in tablespace FILE_20130101


OK, now we got the tablespace name: FILE_20130101.
After adding 4gb to FILE_20130101, the issue solved.

More......