tag:blogger.com,1999:blog-52427637969328172712024-03-21T16:43:30.877-04:00Kane's BlogGE, AGFA. experienced in 9i/10g/11g/12c, AdvanceReplication, Stream, GoldenGate, RAC, DataGuard, ODA, EXADATA, GridControl.
In my career I have always been assigned to handle most complicated and critical oracle issues without solution online, and I hope to record some of them via this blog.
Kane Zhanghttp://www.blogger.com/profile/10004641664457706458noreply@blogger.comBlogger51125tag:blogger.com,1999:blog-5242763796932817271.post-75090101553522602932017-01-20T00:59:00.001-05:002018-10-18T05:49:17.160-04:00Solving a critical performance crisis for a national project caused by oracle connection slownessAn living national healthcare IT system, one core gateway system is very slow.
The gateway system is used to distribute patients' admission and treatment information. Every patient may generate dozens of system messages when they go to hospital, like when they report at the front desk, take X-rays, pay the bill, or when doctors write reports for their examinations etc.
The gateway system is Kane Zhanghttp://www.blogger.com/profile/10004641664457706458noreply@blogger.com0tag:blogger.com,1999:blog-5242763796932817271.post-46646668387198309642016-04-08T04:53:00.000-04:002016-04-08T05:08:16.002-04:00orphan entries in col$ (due to creating online index failure) make gather stats job fail with ORA-01847The gather stats job always fail on one of our core table with below error:
EXEC dbms_stats.gather_table_stats(ownname=> 'SYSADM',
tabname=>'gbprestatiegroep', method_opt=>'For all columns size 1',estimate_percent=> 100,
cascade=>false, degree=>1, BLOCK_SAMPLE=>false, NO_INVALIDATE=>false );
Error report:
ORA-01847: day of month must be between 1 and last day of monthKane Zhanghttp://www.blogger.com/profile/10004641664457706458noreply@blogger.com1tag:blogger.com,1999:blog-5242763796932817271.post-56992553743646785912014-06-10T23:13:00.000-04:002014-06-12T19:23:06.289-04:00long parsing time issue: bundles of SQLs cost 10+ seconds to do a hard parseOne database, after migrating to 11.2.0.3, customer started to complain the performance was much slower than before.
After checking, we saw for most SQLs, the execution speed was fast while the parsing time was too long.
I took one SQL as example, ran it multiple times to ensure there was no physical read:
no rows selected
Elapsed: 00:00:00.11
Statistics
-----------------------------------------Kane Zhanghttp://www.blogger.com/profile/10004641664457706458noreply@blogger.com7tag:blogger.com,1999:blog-5242763796932817271.post-84367227865445020872014-05-28T04:18:00.000-04:002014-05-28T04:18:12.890-04:00database memory exhausted caused by vmware ballooning(vmmemctl.sys)One DB frequently crashed.
Here was the entry in alert.log:
Fri Oct 11 04:11:22 2013
Errors in file c:\oracle\product\10.2.0\admin\qprod\bdump\qprod_arc0_2380.trc:
ORA-00202: Message 202 not found; No message file for product=RDBMS, facility=ORA; arguments: [E:\ORACLE\ORADATA\QPROD\CONTROL02.CTL]
ORA-27091: Message 27091 not found; No message file for product=RDBMS, facility=ORA
ORA-27070: Kane Zhanghttp://www.blogger.com/profile/10004641664457706458noreply@blogger.com1tag:blogger.com,1999:blog-5242763796932817271.post-85507514656992967622012-12-20T19:54:00.002-05:002012-12-21T00:30:48.954-05:00Performance Shooting: How much can SEQUENCE slow down a SQLHow much can SEQUENCE drag down speed of SQL?
10%? 20%? Below my working case show you SEQUENCE drag down 600% speed of one SQL.
It is during my recently shooting of a job's performance issue, i notice this SQL in the job block.
This is quite a simple SQL, two tables join, only thing secial is that it is using a sequence GGGC_SSBA.SEQ_MIM_CCCC_PPPP_SSS.NEXTVAL:
dwxxxx2 > SELECT /*+ USE_HASH(Kane Zhanghttp://www.blogger.com/profile/10004641664457706458noreply@blogger.com0tag:blogger.com,1999:blog-5242763796932817271.post-66003825017182517102012-12-07T11:49:00.001-05:002012-12-07T11:49:35.828-05:00MMON Slave Process Reports ORA-7445 [kglic0] Error, Plus Database HangsA DB suddenly hang during a load. User come to me.
When i login the server, i can see the issue DB consuming 100% CPU on the server.
Tried "sqlplus / as sysdba" but also hang.
I gernerate a systemstate dump and notice that the PMON is blocked:
PROCESS 2: PMON
O/S info: user: oracle, term: UNKNOWN, ospid: 1193
OSD pid info: Unix process pid: 1193, image: oracle@alpcisddb484.Kane Zhanghttp://www.blogger.com/profile/10004641664457706458noreply@blogger.com0tag:blogger.com,1999:blog-5242763796932817271.post-36306200527490159712012-12-02T05:55:00.000-05:002012-12-02T21:01:07.788-05:00solve ORA-1658 VIA errorstackSometimes a issue itself is not complicated, but it is hard due to other reason, such as lack of neccesary information.For example, one day user come to us and complain that their application job got below error:F|Nov 29 20:47:34|UNXSERVER|E|27451|GXXS01|conv|ABORTED-SQLCODE -1658 AT 1223 ErrText = |F|Nov 29 20:47:34|UNXSERVER|E|27451|GXXS01|conv|ENDING DUE TO SERIOUS SQL ERROR|F|Nov 29 20:47:34|Kane Zhanghttp://www.blogger.com/profile/10004641664457706458noreply@blogger.com1tag:blogger.com,1999:blog-5242763796932817271.post-24003307781828918122012-11-27T04:14:00.002-05:002012-11-27T04:22:32.207-05:00LOCK diagnose: NOWAIT lock requests could hang in RAC (blocked in compatble mode)One 4 instances Rac DB report lock issue:
INST_ID SESSION ID1 ID2 LMODE REQUEST TY CTIME
---------- --------------- ---------- ---------- ---------- ---------- -- ----------
1 Holder: 5887 3 1 3 0 TS 2665025
1 Waiter: 4145 3 1 0 3 TS 17970
1 Waiter: Kane Zhanghttp://www.blogger.com/profile/10004641664457706458noreply@blogger.com1tag:blogger.com,1999:blog-5242763796932817271.post-13003566055454448112012-11-05T11:38:00.000-05:002012-12-02T06:02:49.287-05:00DEBUG RMAN: "RMAN-20052: invalid datafile create SCN" due to PLUGIN_SCN out of SYNCBelow error persists for one database during SYNC operation:
RMAN> resync catalog;
starting full resync of recovery catalog
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03009: failure of resync command on default Kane Zhanghttp://www.blogger.com/profile/10004641664457706458noreply@blogger.com2tag:blogger.com,1999:blog-5242763796932817271.post-87005497933182768032012-10-29T03:53:00.001-04:002012-10-29T19:46:55.302-04:00LMON (ospid: 11863): terminating the instance due to error 481Two nodes RAC, suddenly begin reconfiguration and very soon the first instance terminated with error:
LMON (ospid: 11863): terminating the instance due to error 481
So we have two problems:
1. Why the configration suddenly began?
2. Why the first instance terminated during the reconfiguration?
1:Why the configration suddenly began?
From the first node's lmon trace file:
* DRM RCFG called (swin Kane Zhanghttp://www.blogger.com/profile/10004641664457706458noreply@blogger.com1tag:blogger.com,1999:blog-5242763796932817271.post-6694048163763280972012-10-11T12:49:00.000-04:002012-10-29T02:44:28.768-04:00A few troubles when doing PSU patching on AIX RACwhen help a friend patching an AIX Rac server, i got a few troubles, record them here.
1. opatch failed and reporting:
Required amount of space(23428.225MB) is not available.
UtilSession failed:
Prerequisite check "CheckSystemSpace" failed.
Log file location: /oracle/app/11.2.0/grid/cfgtoollogs/opatch/opatch2012-10-07_18-26-07PM_1.log
OPatch failed with error code 73The mount point have 20gb Kane Zhanghttp://www.blogger.com/profile/10004641664457706458noreply@blogger.com1tag:blogger.com,1999:blog-5242763796932817271.post-64684134155087874342012-09-22T06:31:00.000-04:002012-09-22T06:31:01.732-04:00cluster on first node failed to startup after reboot due to GPNP hang on third node3 nodes Rac, in a PSU activity, the cluster on first node failed to startup after patching.
Below is the error in alertnode1.log:
2012-09-22 03:01:39.894
[ohasd(28563)]CRS-2112:The OLR service started on node cintrnpdb001.
2012-09-22 03:01:39.917
[ohasd(28563)]CRS-1301:Oracle High Availability Service started on node cintrnpdb001.
2012-09-22 03:01:39.918
[ohasd(28563)]CRS-8017:location: /Kane Zhanghttp://www.blogger.com/profile/10004641664457706458noreply@blogger.com0tag:blogger.com,1999:blog-5242763796932817271.post-71563903129552800702012-09-12T22:58:00.000-04:002012-09-12T23:24:31.516-04:00CSSD terminated from clssnmvDiskPingMonitorThread without disk timeout countdownOne node's cluster suddenly terminated.
Below is the message in cluster's alert.log:
2012-09-11 11:41:30.328
[ctssd(22428)]CRS-2409:The clock on host node8 is not synchronous with the mean cluster time. No action has been taken as the Cluster Time Synchroniza
tion Service is running in observer mode.
2012-09-11 12:30:03.122
[cssd(21061)]CRS-1606:The number of voting files available, 0, is less Kane Zhanghttp://www.blogger.com/profile/10004641664457706458noreply@blogger.com3tag:blogger.com,1999:blog-5242763796932817271.post-93602484943629462012-09-05T21:08:00.000-04:002012-09-05T21:08:05.810-04:00RAC: "CRS-0184: Cannot communicate with the CRS daemon“ due to EVMDOne node of a Rac server get some trouble, its CRSD keep terminating and rebooting. When try to query CRSD will get error “Cannot communicate with the CRS daemon”.
Below error repeate in alertnode1.log:
2012-08-28 19:32:32.995
[ohasd(26038)]CRS-2765:Resource 'ora.crsd' has failed on server 'racnode001'.
2012-08-28 19:32:38.319
[crsd(13117)]CRS-1012:The OCR service started on node racnode001.
Kane Zhanghttp://www.blogger.com/profile/10004641664457706458noreply@blogger.com4tag:blogger.com,1999:blog-5242763796932817271.post-8684501907198249782012-09-02T23:17:00.003-04:002012-09-02T23:19:59.701-04:00Blocker=Waiter: Self Dead Lock caused within one sessionWe know very well about deadlock. We know deadlock happens when multiple sessions compete some resources.
And in some particular case, dead lock can also be caused within the same session, below is a example:
SQL> create index obj_ind4$ on obj$(name);
create index obj_ind4$ on obj$(name)
*
ERROR at line 1:
ORA-00604: error occurred at recursive SQL level 1
ORA-00060Kane Zhanghttp://www.blogger.com/profile/10004641664457706458noreply@blogger.com0tag:blogger.com,1999:blog-5242763796932817271.post-35878734913623828052012-08-07T03:39:00.000-04:002012-08-07T03:58:43.204-04:00SYSTEMSTATE DUMP: Shooting DB Hang, sqlplus Hang, self deadlock latch, PMON deadlock latchOne DB is hang. Try to login with "sqlplus / as sysdba" also fail, just hang there.I use "-prelim" option to login and make a systemstate dump.Below is the ass.awk output for systemstate dump:
System State 1
~~~~~~~~~~~~~~~~
1:
2: 0: waiting for 'pmon timer' <---- PMON is fine at this moment
3: 0: waiting for 'rdbms ipc message'
4: 0: waiting for 'VKTM Logical Idle Wait'
5: 0: waiting Kane Zhanghttp://www.blogger.com/profile/10004641664457706458noreply@blogger.com0tag:blogger.com,1999:blog-5242763796932817271.post-89781437587061208582012-07-30T01:49:00.000-04:002012-07-30T22:40:16.736-04:00RMAN senior: duplicate database from tape without target DB, "RMAN-05548: The set of duplicated tablespaces is not self-contained"Today, user require to restore a table from old backup.
The table is small, a few thousands rows. But the DB is big, 200gb. We don't have sufficient space to support restoring the whole DB.
We decide to resotre the table and its tablespace only, we will use "duplicate database skip tablespace" clause to skip irrelative tablespaces:
server001[oracle]_ORCLPROD> rman target / auxiliary sys/Kane Zhanghttp://www.blogger.com/profile/10004641664457706458noreply@blogger.com5tag:blogger.com,1999:blog-5242763796932817271.post-36807861078827018032012-07-21T18:12:00.001-04:002012-07-21T18:21:12.245-04:00From 11.2.0.2.5, oracle don't support ADJUST SCN any moreToday, after incomplete recovery and opened a DB, i decided to increasing SCN for the DB.I set "_allow_error_simulation=true", and use:
alter session set events 'IMMEDIATE trace name adjust_scn level 4490';
ALTER SESSION SET EVENTS '10015 TRACE NAME ADJUST_SCN LEVEL 4490';
at open/mount state for serveral times, but the SCN didn't get changed!The alert.log shew that oracle even didn't try to Kane Zhanghttp://www.blogger.com/profile/10004641664457706458noreply@blogger.com0tag:blogger.com,1999:blog-5242763796932817271.post-10544164640836565222012-07-05T08:29:00.000-04:002012-07-05T08:29:35.952-04:00Instance evicted due to HAIP failureRac cluster, one server's instances abruptly evicted and re-joined the cluster after 10 minutes. Below is from DB's alert log:
Wed Jun 27 00:34:55 2012
Errors in file /q001/product/ora_base/diag/rdbms/ttpoltq/ttpoltq1/trace/ttpoltq1_ora_21759.trc (incident=139278):
ORA-00603: ORACLE server session terminated by fatal error
ORA-27504: IPC error creating OSD context
ORA-27300: OS system dependentKane Zhanghttp://www.blogger.com/profile/10004641664457706458noreply@blogger.com0tag:blogger.com,1999:blog-5242763796932817271.post-64007577075866388382012-07-03T08:39:00.000-04:002012-07-03T23:18:50.456-04:00MMAN in idle state but blocking other sessions which waiting 'SGA: allocation forcing component growth'Find many blocking in one instance of an three nodes Rac DB.
Most sessions are either hang or running slow, after tracing found the blocker is MMAN process.
Below is from the trace:
Chain 1:
-------------------------------------------------------------------------------
Oracle session identified by:
{
instance: 1 (apccfp1.apccfp11)
os id: 15557
Kane Zhanghttp://www.blogger.com/profile/10004641664457706458noreply@blogger.com0tag:blogger.com,1999:blog-5242763796932817271.post-44989217038648875702012-07-01T00:10:00.003-04:002012-07-03T09:11:23.125-04:00diagnose 04030: koh-kghu sessi and large of free spaceRecently one of our prod DB repeatly got 04030 error:
ORA-04030: out of process memory when trying allocate 64024 bytes (qmxeventInitCt,qmxeventInitctx:buffer)
ORA-04030: out of process memory when trying allocate 824504 bytes (pga heap,kco buffer)
ORA-04030: out of process memory when trying allocate 16328 bytes (PLSQL Opt Pool,pdz7M04_Create)
ORA-04030: (pga heap,kco buffer) may indiate kco Kane Zhanghttp://www.blogger.com/profile/10004641664457706458noreply@blogger.com0tag:blogger.com,1999:blog-5242763796932817271.post-74603087815637165952012-06-30T05:30:00.001-04:002012-06-30T05:41:15.754-04:00RMAN blocked by ARCH process, waiting “enq: WL - contention”Today, we got a file system alert, one server's arch mount point is 90% full.After checking found one DB's archivelog occupied 90% space.And this DB's backup arch job was running since 10 days ago and still hang there, no progress.After checking in the DB, found the rman job was blocked by below red part event:
SSID OSUSER USERNAME STATE STATUSKane Zhanghttp://www.blogger.com/profile/10004641664457706458noreply@blogger.com1tag:blogger.com,1999:blog-5242763796932817271.post-90971120239720957372012-06-29T06:52:00.002-04:002012-06-29T21:48:26.115-04:00hack the core:bootstrap$, replace bootstrap$ with user's tablebootstrap$ is the most core table for database. It contains defination for oracle most critical dictinary tables such as: obj$, tab$, ts$ etc.During an startup, oracle will read bootstrap$ to build dictionary for whole DB.Any modification on bootstrap$ may destroy your database, and oracle support won't help anything on that.Below is bootstrap$ structure and one row in it:
SYS @ gededKane Zhanghttp://www.blogger.com/profile/10004641664457706458noreply@blogger.com0tag:blogger.com,1999:blog-5242763796932817271.post-25789405751683829862012-06-27T05:19:00.000-04:002012-06-27T05:32:34.994-04:00LMS terminating the instance due to error 4031One of our two node Rac DB encounter 04031 due to inbalance in mini sub heap.After turned parameter for shared_pool, i decided to bounce instance one by one.But when i was bouncing the instance on first node, the second node instance also crashed.Below is in the logfile for second instance:
Wed Jun 27 04:57:28 2012
Reconfiguration started (old inc 4, new inc 6)
List of instances:
2 (myinst: 2)
Kane Zhanghttp://www.blogger.com/profile/10004641664457706458noreply@blogger.com0tag:blogger.com,1999:blog-5242763796932817271.post-58810325068242990112012-06-23T06:33:00.000-04:002012-06-23T06:51:28.347-04:0004031 and large pool, parallel and PX msg poolParallelism require "PX msg pool". In past, "PX msg pool" are allocate from shared_pool:
SYS @ test > select * from v$sgastat where upper(name) like '%PX%';
POOL NAME BYTES
------------ -------------------------------------------------- ----------
shared pool PX subheap desc 256
shared poolKane Zhanghttp://www.blogger.com/profile/10004641664457706458noreply@blogger.com2