June 6, 2012

ORA-04031 caused by parallel: "PQ Slave mismatch"

Our one Rac DB repeatly got 04031 error:
DDE: Problem Key 'ORA 4031' was flood controlled (0x6) (incident: 146179)
ORA-04031: unable to allocate 4064 bytes of shared memory ("shared pool","select di.inst_id,di.didbi,d...","sga heap(1,0)","kglsim heap")


When the 04031 error occur, we can't login into DB or can't execute any command in the DB including "alter system flush shared_pool;"

From the dumpfile we can see most space in shared_pool are oppcupied by SQL AREA:
Allocation Name         Size
"free memory "     188926712
"SQLA "            429067088


We also notice below part information:
LibraryHandle: Address=0x945ae1c8 Hash=e0b78783 LockMode=N PinMode=0 LoadLockMode=0 Status=VALD
ObjectName: Name=select di.inst_id,di.didbi,di.didbn,
to_date(di.dicts,'MM/DD/RRHH24:MI:SS','NLS_CALENDAR=Gregorian'),
to_number(di.dirls),to_date(di.dirlc,'MM/DD/RR HH24:MI:SS','NLS_CALENDAR=Gregorian'),
to_number(di.diprs),to_date(di.diprc,'MM/DD/RR HH24:MI:SS','NLS_CALENDAR=Gregorian'),
decode(di.dimla,0,'NOARCHIVELOG',1,'ARCHIVELOG','MANUAL'),to_number(di.discn),
to_number(di.difas),decode(bitand(di.diflg,256),256,'CREATED',decode(bitand(di.diflg,1024),
1024,'STANDBY',decode(bitand(di.diflg,32768),32768,'CLONE',decode(bitand
FullHashValue=25c9709f1fd8a1c460008a64e0b78783 Namespace=SQL AREA(00) Type=CURSOR(00) Identifier=3770124163 OwnerIdn=0
Statistics: InvalidationCount=54 ExecutionCount=0 LoadCount=60 ActiveLocks=1 TotalLockCount=59 TotalPinCount=1

Counters: BrokenCount=1 RevocablePointer=1 KeepDependency=1 BucketInUse=58 HandleInUse=58 HandleReferenceCount=0
Concurrency: DependencyMutex=0x945ae278(0, 58, 0, 0) Mutex=0x945ae2f8(479, 787, 0, 6)
Flags=RON/PIN/TIM/PN0/DBN/[10012841]
WaitersLists:
Lock=0x945ae258[0x945ae258,0x945ae258]
Pin=0x945ae238[0x945ae238,0x945ae238]
Timestamp: Current=06-03-2012 20:06:06
HandleReference: Address=0x945af0d8 Handle=(nil) Flags=[00]
LibraryObject: Address=0xa11940b0 HeapMask=0000-0001-0001-0000 Flags=EXS[0000] Flags2=[0000] PublicFlags=[0000]
ChildTable: size='64'
Child: id='0' Table=0xa1194f60 Reference=0xa1194988 Handle=0xa99918f8
Child: id='1' Table=0xa1194f60 Reference=0xa1194c50 Handle=0x85e39480
Child: id='2' Table=0xa1194f60 Reference=0xa1194f18 Handle=0xa9fbb930
Child: id='3' Table=0xa1194f60 Reference=0xab8006e8 Handle=0xa3927ca0
Child: id='4' Table=0xa1194f60 Reference=0xab8009b0 Handle=0xa9741b18
Child: id='5' Table=0xa1194f60 Reference=0xab800c78 Handle=0xa95b7860
Child: id='6' Table=0xa1194f60 Reference=0xab800f40 Handle=0xa388ac00
Child: id='7' Table=0xa1194f60 Reference=0xab801208 Handle=0xa38f3330
Child: id='8' Table=0xa1194f60 Reference=0x9e291120 Handle=0x858cb740
Child: id='9' Table=0xa1194f60 Reference=0x9e2913e8 Handle=0x858f89f8
Child: id='10' Table=0xa1194f60 Reference=0x9e2916b0 Handle=0x94f22198
Child: id='11' Table=0xa1194f60 Reference=0x9e291978 Handle=0xa31e6520
Child: id='12' Table=0xa1194f60 Reference=0x9e291c40 Handle=0x85926718
Child: id='13' Table=0xa1194f60 Reference=0xab3ee430 Handle=0x85829f60
Child: id='14' Table=0xa1194f60 Reference=0xab3ee6f8 Handle=0x857ab5c8
Child: id='15' Table=0xa1194f60 Reference=0xab3ee9c0 Handle=0x946d70b8
Child: id='16' Table=0x9e291d98 Reference=0xab3eec88 Handle=0x8591b818
Child: id='17' Table=0x9e291d98 Reference=0xab3eef50 Handle=0x85ff3670
Child: id='18' Table=0x9e291d98 Reference=0x9edc72e8 Handle=0x856316f0
Child: id='19' Table=0x9e291d98 Reference=0x9edc75b0 Handle=0x9a420f10
Child: id='20' Table=0x9e291d98 Reference=0x9edc7878 Handle=0xa9cf3b48
Child: id='21' Table=0x9e291d98 Reference=0x9edc7b40 Handle=0x85420040
Child: id='22' Table=0x9e291d98 Reference=0x9edc7e08 Handle=0xa96637d0
Child: id='23' Table=0x9e291d98 Reference=0xaba186e8 Handle=0x85830f10
Child: id='24' Table=0x9e291d98 Reference=0xaba189b0 Handle=0x9a2dc110
Child: id='25' Table=0x9e291d98 Reference=0xaba18c78 Handle=0x9a2077d8
Child: id='26' Table=0x9e291d98 Reference=0xaba18f40 Handle=0xa3e05e30
Child: id='27' Table=0x9e291d98 Reference=0xaba19208 Handle=0xa9999908
Child: id='28' Table=0x9e291d98 Reference=0x7eab32e8 Handle=0x858c09b8
Child: id='29' Table=0x9e291d98 Reference=0x7eab35b0 Handle=0x85948c90
Child: id='30' Table=0x9e291d98 Reference=0x7eab3878 Handle=0xa38c0c98
Child: id='31' Table=0x9e291d98 Reference=0x7eab3b40 Handle=0xa3ecae20
Child: id='32' Table=0xaba19360 Reference=0x7eab3e08 Handle=0xa3dc11a8
Child: id='33' Table=0xaba19360 Reference=0x9e454120 Handle=0x85b7dc70
Child: id='34' Table=0xaba19360 Reference=0x9e4543e8 Handle=0xa97c05d0
Child: id='35' Table=0xaba19360 Reference=0x9e4546b0 Handle=0xa95de978
.........
.........


From above we see the SQL has so many child cursors, which means the SQL was not  shared. And this situation can be the cause of 04031.

We need to find out the reason that why the SQL not shared.

After bounced DB, I write a script to monitor the DB. The next day,  i checked the output and find below:

SYS @ xxx13 >select sql_id,version_COUNT,SHARABLE_MEM/1024/1024,PERSISTENT_MEM/1024/1024 from gv\$sqlarea where version_COUNT>30 or sql_id='6004acmhbg1w3' order by sql_id,inst_id;
SQL_ID        VERSION_COUNT SHARABLE_MEM/1024/1024 PERSISTENT_MEM/1024/1024
------------- ------------- ---------------------- ------------------------
0v6s91manuhz8           175             26.5997515               6.95877075
0v6s91manuhz8           148             22.4999056               5.88513184
0v6s91manuhz8            44             .348004341               .079528809
3j9yx7t5abcyg            71             8.32154465               4.36545563
3j9yx7t5abcyg            62             7.26753616               3.81208801
6wtwv38h14q3q            39             8.43435574               2.30687714
6wtwv38h14q3q            36             7.98299313               2.12942505

SYS @ xxx13 >select to_char(substr(reason,58,25)),count(*) from gV$SQL_SHARED_CURSOR where (sql_id='6004acmhbg1w3' and CHILD_NUMBER<10) or CHILD_NUMBER>30 group by to_char(substr(reason,58,25));
TO_CHAR(SUBSTR(REASON,58,25))       COUNT(*)
--------------------------------- ----------
PQ Slave mismatch(5)                     436


The red part menas the during a parallel query, the slave process  failed to shared the cursor and re-parsed again. 
Let's check the parallel setting in the DB:
show parameter parallel
parallel_max_servers integer 685
parallel_server      boolean TRUE



The parallel_max_servers setted to 685.
This issue looks like a bug, but for a temp workaround we can try to disable parallel in the DB.

0 Comments:

Post a Comment