Showing posts with label internal. Show all posts
Showing posts with label internal. Show all posts

April 8, 2016

orphan entries in col$ (due to creating online index failure) make gather stats job fail with ORA-01847

The gather stats job always fail on one of our core table with below error:

EXEC dbms_stats.gather_table_stats(ownname=> 'SYSADM', 
tabname=>'gbprestatiegroep', method_opt=>'For all columns size 1',estimate_percent=> 100, 
cascade=>false, degree=>1, BLOCK_SAMPLE=>false, NO_INVALIDATE=>false ); 
Error report: 
ORA-01847: day of month must be between 1 and last day of month 
ORA-06512: at "SYS.DBMS_STATS", line 23829 
ORA-06512: at "SYS.DBMS_STATS", line 23880


With "cascade=>false" in above command, we define to only gather table data stats.
Then, we treid to gather index only stats for the indexes on this table and they all succeeded:

gather_index_stats(OWNNAME=> 'SYSADM', INDNAME=> 'IX_GBP_PATTRACKING',  GRANUL
gather_index_stats(OWNNAME=> 'SYSADM', INDNAME=> 'DPG17',  GRANULARITY=>'ALL',
gather_index_stats(OWNNAME=> 'SYSADM', INDNAME=> 'DPG2',  GRANULARITY=>'ALL',e
gather_index_stats(OWNNAME=> 'SYSADM', INDNAME=> 'DPG14',  GRANULARITY=>'ALL',
gather_index_stats(OWNNAME=> 'SYSADM', INDNAME=> 'CGBP01',  GRANULARITY=>'ALL'
gather_index_stats(OWNNAME=> 'SYSADM', INDNAME=> 'CIND_GBP_PK_S_KEY2',  GRANUL
............


Above result seems to indicate data (corruption/invalidation/broken) issue on the table while the data in index is fine.
However, below command can return result correctly without any issue:
select /*+full(table)*/ * from table;
select /*+full(table)*/ to_char(column1,'yyyy-mm-dd hh24:mi:ss'), to_char(column2,'yyyy-mm-dd hh24:mi:ss'), ..... from table;


To investigate, I enabled level 3 errorstack, and I captured below issue SQL:

select /*+  no_parallel(t) no_parallel_index(t) dbms_stats cursor_sharing_exact use_weak_name_resl dynamic_sampling(0) no_monitoring no_substrb_pad  */
count(*),
count("SYS_NC00074$"),
count("SYS_NC00075$"),
count("SYS_NC00076$"),
count("SYS_NC00077$"),
count("SYS_NC00078$"),
count("SYS_NC00079$"),
sum(sys_op_opnsize("SLEUTEL")),
sum(sys_op_opnsize("CLIENTAPPLIC")),
sum(sys_op_opnsize("DEMONSTRATION_FU1")),
count("DEMONSTRATION_FU2"),
sum(sys_op_opnsize("DEMONSTRATION_FU2")),
count("CONTACTTEMPLATE"),
sum(sys_op_opnsize("CONTACTTEMPLATE")),
count("MEDICALVALIDATION"),
sum(sys_op_opnsize("AANVRAGER")),
...................
...................
from "SYSADM"."GBPRESTATIEGROEP" t ;


What the hell are these columns? They do not show up if we use "DESC GBPRESTATIEGROEP" command:
count("SYS_NC00074$"),
count("SYS_NC00075$"),
count("SYS_NC00076$"),
count("SYS_NC00077$"),
count("SYS_NC00078$"),
count("SYS_NC00079$"),


After verifying, we confirmed that it is below columns that lead to the issue, and we can reproduce by executing.

SQL> set lines 300
SQL> set pagesize 500
SQL> select
2  count("SYS_NC00074$"),
3  count("SYS_NC00075$"),
4  count("SYS_NC00076$"),
5  count("SYS_NC00077$"),
6  count("SYS_NC00078$"),
7  count("SYS_NC00079$")
8   from "SYSADM"."GBPRESTATIEGROEP" t ;
count("SYS_NC00074$"),
*
ERROR at line 2:
ORA-01847: day of month must be between 1 and last day of month

Above SYS_NC0007* columns does not exist in DBA_INDEXES/DBA_OBJECTS/DBA_COLUMNS.

However, in col$, I find below:

SQL> select OBJ#,COL#,SEGCOL#,SEGCOLLENGTH,NAME,TYPE#,NULL$,DEFLENGTH,DEFAULT$,INTCOL#,PROPERTY  
  2  from col$ where name in ('SYS_NC00074$','SYS_NC00075$','SYS_NC00076$','SYS_NC00077$','SYS_NC00078$','SYS_NC00079$') and obj#=15182;
      OBJ#       COL#    SEGCOL# SEGCOLLENGTH NAME                                TYPE#      NULL$  DEFLENGTH DEFAULT$                                                                            INTCOL#   PROPERTY
---------- ---------- ---------- ------------ ------------------------------ ---------- ---------- ---------- -------------------------------------------------------------------------------- ---------- ----------
     15182          0          0            7 SYS_NC00074$                          180          0         81 TO_TIMESTAMP(TO_CHAR("PLANUITVOERING",'DD-MON-RR'),'yyyy-mm-dd hh24:mi:sssss.ff'         74     327976
     15182          0          0            7 SYS_NC00075$                          180          0         75 TO_TIMESTAMP(TO_CHAR("PLANUITVOERING",'DD-MON-RR'),'yyyy-mm-dd hh24:mi:ss')              75     327976
     15182          0          0            7 SYS_NC00076$                          180          0         78 TO_TIMESTAMP(TO_CHAR("PLANUITVOERING",'DD-MON-RR'),'yyyy-mm-dd hh24:mi:sssss')           76     327976
     15182          0          0            7 SYS_NC00077$                          180          0         75 TO_TIMESTAMP(TO_CHAR("PLANUITVOERING",'DD-MON-RR'),'yyyy-mm-dd hh24:mi:ss')              77     327976
     15182          0          0            7 SYS_NC00078$                          180          0         75 TO_TIMESTAMP(TO_CHAR("PLANUITVOERING",'DD-MON-RR'),'yyyy-mm-dd hh24:mi:ss')              78     327976
     15182          0          0            7 SYS_NC00079$                          180          0         75 TO_TIMESTAMP(TO_CHAR("PLANUITVOERING",'DD-MON-RR'),'yyyy-MM-dd hh24:mi:ss')              79     327976

6 rows selected. 

For last a few years, constant performance turning has always being on-going for this site.
And during the process, many indexes were created based on poor SQL found or suggested by sql advisor.

It might be that when creating above function indexes, the creating function index command failed, and for some reason, it left behind orphan entries in col$.

After testing, I managed to reproduce the bug in 11.2.0.3.10, while it does not exist in 11.2.0.4 though.



SQL> create table SYSADM.TESTORPHAN as select created from DBA_OBJECTS where 1=2;
Table created.

SQL> insert into SYSADM.TESTORPHAN values (to_date('2000-07-27 14:47:30','YYYY-MM-DD HH24:MI:SS'));
1 row created.

SQL> commit;
Commit complete.

SQL> create index SYSADM.INT_ERROR on SYSADM.TESTORPHAN(TO_TIMESTAMP(TO_CHAR("CREATED",'DD-MON-RR'),'yyyy-MM-dd hh24:mi:ss')) online;
create index SYSADM.INT_ERROR on SYSADM.TESTORPHAN(TO_TIMESTAMP(TO_CHAR("CREATED",'DD-MON-RR'),'yyyy-MM-dd hh24:mi:ss')) online
ERROR at line 1:
ORA-01847: day of month must be between 1 and last day of month

---------here it will leaves orphan entries in col$ table-------------


This is an obvious defect in online index (re)build cleanup function.
I checked below tables and I cannot find any entries for above columns there:
IND$
ICOL$
CON$
SEG$
CCOL$
icol$
icoldep$


I believe we can fix the issue by executing below SQLs:

delete from col$ where obj#=15182 and name in ('SYS_NC00074$','SYS_NC00075$','SYS_NC00076$','SYS_NC00077$','SYS_NC00078$','SYS_NC00079$');
update tab$ set INTCOLS=73, AVGRLN=228 where obj#=15182;
update obj$ set SPARE2=1 where obj#=15182;
commit;
shutdown abort
startup


As this is very critical environment and a huge site, I found an outage time to implement the fix by first enabling flashback on. The fix worked as I expected, the gather table stats job then completed successfully for this table.
Cheers.

More......

June 12, 2012

Real meaning and Simulate "ORA-01152: file was not restored from a sufficiently old backup"

I saw many times people asked the meaning of "ORA-01152: file n was not restored from a sufficiently old backup" and the difference between "ORA-01152" and "ORA-01113: file needs media recovery".

First of all,  "ORA-01152" is not related to controlfile. I say so because someone said  "ORA-01152: file was not restored from a sufficiently old backup" is caused by old
controlile.
It is not true. A simple prove is: even you recreate controlfile with "create database", you can still get the error.

Then what is the real meaning for "ORA-01152: file was not restored from a sufficiently old backup"?

To figure it out, let's firstly learn below knowledge regarding how oracle know if a recovery is nessasary during startup.
Infact oracle make the judgement from below two flags in datafile header block:
1. OFFSET 0138-0139: DB_state.
00 means DB is down, 04 means DB is running.
When a DB is up and running, this flag is 04. During an normal shutdown, oracle will change this flag to 00.
So if oracle read 04 of the flag during startup, then oracle will be aware that latest shutdown was not commpleted succeesfully, so a recovery will be nessasary.

2. OFFSET 0484-0489: checkpoint_change#
During a DB clean shutdown, oracle will make a full checkpoint, so all datafiles' checkpoint_change# will be sync.
So if in a startup checking, oracle found checkpoint_change# are inconsistent among datafiles, then an recovery will be required.



OK. Now we know how oracle judge the recovery, from two flags DB_state and checkpoint_change#.
Then it is a simple math that there are four combinations:
Scenario 1.DB_state=00 and checkpoint_change# are consistent.
It means lastest shutdown is an succeed clean shutdown. Oracle won't try to perform any recovery.

Scenario 2.DB_state=04 and checkpoint_change# are inconsistent.
95% DB crashing is located in this senerio. When a DB crashed, oracle have no time to change DB_state to 00, and checkpoint_change# among datafiles are not consistent.
In this senerio, when DB startup, a recovery will be required to perform, if you try to open resetlogs without recover, you will get below error:
ORA-01113: file 5 needs media recovery
ORA-01110: data file 5: '/data/oracle/data/miao/miao/stream01.dbf'
This is an common error we often get. We know very well about it.

Scenario 3.DB_state=04 and checkpoint_change# are consistent.
How can this happen? For example, you execute command "alter system checkpoint;" after this command you immediately execute below command:
shutdown abort;
Then checkpoint_change# among datafiles are the same. But DB_state remians 04.

Next time when oracle startup, oracle will read DB_state=04, which means, as i already introduced, latest close for this datafile is not an normal close.
The error here you will get is the same in senerio 2:
ORA-01113: file 5 needs media recovery

Scenario 4.state_flag=00 and checkpoint_change# are inconsistent.
In this case, you will get error "ORA-01152: file n was not restored from a sufficiently old backup".

But how could this happen? How can it be possible when checkpoint_change# are inconsistent while state_flag=00 which means a clean shutdown?


For example:
At time point T1, you shutdown database and made a closed clean backup.
Then you startup DB and do some operations and swtich logfile for mutiple times. Then at time point T2, again you normally shutdown the DB.

Now you delete some T2 datafiles and replace it with datafiles that you backuped at timepoint T1.
So now all these datafiles state_flag=00 because all datafiles are from an clean closed shutdown, but checkpoint_change# among these datafiles are not consistent.

In this sernerio, when you try to open resetlogs DB without recovery, you will get below error:
ORA-01152: file 1 was not restored from a sufficiently old backup
ORA-01110: data file 1: '/data/oracle/data/miao/miao/system01.dbf'

More......