首页 » ORACLE 9i-23c » Troubleshooting ORA-04036: PGA memory used by the instance exceeds PGA_AGGREGATE_LIMIT

Troubleshooting ORA-04036: PGA memory used by the instance exceeds PGA_AGGREGATE_LIMIT

最近有家银行客户一套核心凌晨跑批时报出了ORA-04036,与12c后增加的PGA_AGGREGATE_LIMIT有关,环境oracle RAC 12.1.0.2 on AIX, 临时增加了PGA_AGGREGATE_LIMIT参数大小解决,事后找我分析原因,简单记录一下分析思路。

DB ALERT LOG

	TCP/IP NT Protocol Adapter for IBM/AIX RISC System/6000: Version 12.1.0.2.0 - Production
	Oracle Bequeath NT Protocol Adapter for IBM/AIX RISC System/6000: Version 12.1.0.2.0 - Production
	
Sun May 14 00:46:50 2023
Errors in file /oracle/app/oracle/diag/rdbms/anbob/anbob2/trace/anbob2_ora_42795290.trc  (incident=1262543):
ORA-04036: PGA memory used by the instance exceeds PGA_AGGREGATE_LIMIT    <<<<<<<<<<<<<<<<<<<<<<

Fatal NI connect error 12537, connecting to:
 (LOCAL=NO)
  Time: 14-MAY-2023 00:46:56
Sun May 14 00:46:52 2023
Errors in file /oracle/app/oracle/diag/rdbms/anbob/anbob2/trace/anbob2_ora_30409770.trc  (incident=1262535):
ORA-04036: PGA memory used by the instance exceeds PGA_AGGREGATE_LIMIT

Sun May 14 00:46:55 2023
Errors in file /oracle/app/oracle/diag/rdbms/anbob/anbob2/trace/anbob2_ora_35389838.trc  (incident=1260031):
ORA-04036: PGA memory used by the instance exceeds PGA_AGGREGATE_LIMIT
Sun May 14 00:47:37 2023
PGA memory used by the instance exceeds PGA_AGGREGATE_LIMIT of 26368 MB             <<<<<<<<<<<<<<<<<<
  VERSION INFORMATION:
	TNS for IBM/AIX RISC System/6000: Version 12.1.0.2.0 - Production
	TCP/IP NT Protocol Adapter for IBM/AIX RISC System/6000: Version 12.1.0.2.0 - Production
	Oracle Bequeath NT Protocol Adapter for IBM/AIX RISC System/6000: Version 12.1.0.2.0 - Production
	
Thread 2 advanced to log sequence 155965 (LGWR switch)
  Current log# 6 seq# 155965 mem# 0: +DATA/anbob/redo61.log
  Current log# 6 seq# 155965 mem# 1: +DATA/anbob/redo62.log
Sun May 14 00:55:57 2023
Archived Log entry 512603 added for thread 2 sequence 155964 ID 0x35ebc53e dest 1:
Sun May 14 01:02:30 2023
ALTER SYSTEM SET pga_aggregate_limit=30G SCOPE=BOTH SID='*';                         <<<<<<<<<<<<<<<<<<
Sun May 14 01:25:52 2023

数据库参数
memory_target 0
pga_aggregate_target  13G
PGA_AGGREGATE_LIMIT 25G

ora-4036

[oracle@oel7db1 ~]$ oerr ora 4036
04036, 00000, "PGA memory used by the instance exceeds PGA_AGGREGATE_LIMIT"
// *Cause:  Private memory across the instance exceeded the limit specified
//          in the PGA_AGGREGATE_LIMIT initialization parameter.  The largest
//          sessions using Program Global Area (PGA) memory were interrupted
//          to get under the limit.
// *Action: Increase the PGA_AGGREGATE_LIMIT initialization parameter or reduce
//          memory usage.

 

PGA_AGGREGATE_LIMIT
大多数数据库都在使用自动 PGA 内存管理,一旦 PGA_AGGREGATE_TARGET 参数设置为非零值,就会启用该管理。一个常见的误解是,此设置实际上限制了实例的整体 PGA 使用。事实是,自动PGA内存管理将尝试遵守PGA_AGGREGATE_TARGET值。这些是 Oracle 官方文档中的实际用语:“尝试遵守”!这意味着排序内存、散列内存和位图内存将通过自动 PGA 内存管理主动限制每个进程的大小,任何分配超过自动 PGA 内存管理允许的尝试都将导致将这些内存区域的某些内容移动到数据库用户的分配临时表空间,以便为新数据腾出空间。但是,每个进程有更多的内存区域可分配,这些内存区域永远不会交换到磁盘,因此将始终保留在内存中,并且在 Oracle 版本 12 之前无法以官方支持的方式限制这些内存区域。在 PGA 中分配且从未交换到磁盘的两个结构是 PL/SQL 集合和 PL/SQL 表,创建和填充这些需要使用PL / SQL,这意味着开发人员可以创建这样的结构,并分配他/她喜欢的任何内容,以及它可能产生的所有后果,Oracle 在 Oracle 12 中引入了一个名为 PGA_AGGREGATE_LIMIT 的参数,以有效地限制整体 PGA 堆大小。
关于PGA_AGGREGATE_LIMIT参数在我之前的blog有记录《Know more about PGA_AGGREGATE_LIMIT 12c 19c》

Frits Hoogland曾分享过测试突破PGA_AGGREGATE_TARGET的方法

exec dbms_random.seed('abracadabra');
create table t2
as
with generator as (
    select      rownum      id
    from        dual
    connect by
                rownum <= 1000
)
select
    rownum                                                id,
    trunc((rownum-1)/50)                            clustered,
    mod(rownum,20000)                               scattered,
    trunc(dbms_random.value(0,20000))               randomized,
    trunc(sysdate) + dbms_random.value(-180, 180)   random_date,
    dbms_random.string('l',6)                       random_string,
    lpad(rownum,10,0)                               vc_small,
    rpad('x',100,'x')                               vc_padding
from
    generator   g1,
    generator   g2
where
    rownum <= 1000000
;
exec dbms_stats.gather_table_stats(null,'T2');

declare
    type sourcetab is table of t2%ROWTYPE;
    c_tmp       sourcetab;
    c_def       sourcetab   := sourcetab();
    v_b_p       number      := 0;
    v_c_p       number      := 0;
    v_b_u       number      := 0;
    v_c_u       number      := 0;
    grow_until  number      := 700000000;
    p_a_t       number;
begin
    select value into v_b_p from v$mystat m, v$statname n where m.statistic#=n.statistic# and name = 'session pga memory max';
    select value into v_b_u from v$mystat m, v$statname n where m.statistic#=n.statistic# and name = 'session uga memory max';
    select value into p_a_t from v$parameter where name = 'pga_aggregate_target';
    select * bulk collect into c_tmp from t2;
    while v_c_p < grow_until loop for c in c_tmp.first .. c_tmp.last loop c_def.extend(1); c_def(c_def.last) := c_tmp(c); select value into v_c_p from v$mystat m, v$statname n where m.statistic#=n.statistic# and name = 'session pga memory max'; select value into v_c_u from v$mystat m, v$statname n where m.statistic#=n.statistic# and name = 'session uga memory max'; if v_c_p >= grow_until then
                exit;
            end if;
        end loop;
    end loop;
    dbms_output.put_line('vbp : '||v_b_p);
    dbms_output.put_line('vcp : '||v_c_p);
    dbms_output.put_line('vbu : '||v_b_u);
    dbms_output.put_line('vcu : '||v_c_u);
    dbms_output.put_line('pat : '||p_a_t);
end;
/

配置pga_aggregate_target一个较小的值上面可以测试超过pga_aggregate_target, 12c后引入pga_aggregate_limit限制比如尝试分配 900MB。这意味着“grow_until”变量必须设置为 900000000。
请注意,这是作为普通用户完成的,SYS 用户和作业队列进程以外的后台进程不受限制。

当超过pga_aggregate_limit时会出现ora-4036. 请注意,PGA_AGGREGATE_LIMIT似乎真正限制了 PGA 使用实例的范围,而不是像事件 10261(Oracle 11.2 的 PGA 使用限制方式)那样限制每个进程的 PGA 堆。

alter system set event = '10261 trace name context forever, level 600000' scope=spfile;
System altered.

ora-4036 trace

.\anbob2\tfa\anbobdb02\diag\rdbms\anbob\anbob2\incident


*** 2023-05-14 00:47:45.018
*** SESSION ID:(3526.21157) 2023-05-14 00:47:45.018
*** CLIENT ID:() 2023-05-14 00:47:45.018
*** SERVICE NAME:(anbob) 2023-05-14 00:47:45.018
*** MODULE NAME:(JDBC Thin Client) 2023-05-14 00:47:45.018
*** CLIENT DRIVER:(jdbcthin) 2023-05-14 00:47:45.018
*** ACTION NAME:() 2023-05-14 00:47:45.018
 
[TOC00000]
Jump to table of contents
Dump continued from file: /oracle/app/oracle/diag/rdbms/anbob/anbob2/trace/anbob2_ora_30409770.trc
[TOC00001]
ORA-04036: PGA memory used by the instance exceeds PGA_AGGREGATE_LIMIT

[TOC00001-END]

*** 2023-05-14 00:47:52.459
[TOC00002]
========= Dump for incident 1262535 (ORA 4036) ========

*** 2023-05-14 00:47:52.459
dbkedDefDump(): Starting incident default dumps (flags=0x2, level=3, mask=0x0)
[TOC00003]
----- Current SQL Statement for this session (sql_id=2cpwnwjvuzxx8) -----
			SELECT * FROM kdpl_***mx 
			WHERE jiluztai = '0'
				and ****GHAO = :1 
				and ****LEIX = :2 
				and ****XIRQ = :3 
				and ( TZSHYNBZ = '1' or ( LIX***M = 'ADJUSTMENT' and TZSHYNBZ = '2' ) ) 
				and farendma = :4 
		
PRIVATE MEMORY SUMMARY FOR THIS PROCESS
---------------------------------------
******************************************************
PRIVATE HEAP SUMMARY DUMP
1610 MB total:
  1606 MB commented, 680 KB permanent
  3834 KB free (0 KB in empty extents),
    1604 MB,   2 heaps:   "callheap       "           

*** 2023-05-14 00:55:38.255
------------------------------------------------------
Summary of subheaps at depth 1
1580 MB total:
  1409 MB commented, 171 MB permanent
   138 KB free (0 KB in empty extents),
    1388 MB, 325263 heaps:   "qmu subheap    "           <<<
------------------------------------------------------
Summary of subheaps at depth 2
1362 MB total:
    70 KB commented, 378 MB permanent
   983 MB free (650 MB in empty extents),

*** 2023-05-14 00:55:39.513
sending 4036 interrupt
DDE: Problem Key 'ORA 4036' was flood controlled (0x6) (incident: 1262383)
ORA-04036: PGA memory used by the instance exceeds PGA_AGGREGATE_LIMIT

----- Call Stack Trace -----
calling              call     entry                argument values in hex      
location             type     point                (? means dubious value)     
-------------------- -------- -------------------- ----------------------------
...

dbgePostErrorKGE()+  call     dbgeExecuteForError  000000000 ? 000000000 ?

dbkePostKGE_kgsf()+  call     dbgePostErrorKGE()   000020000 ? 121657688 ?
64                                                 FC4FFFF2170 ? 1218C4BF8 ?

kgeade()+408         call     dbkePostKGE_kgsf()   00001BF00 ? 700000000004940 ?

kgeselv()+96         call     kgeade()             FFFFFFFFFFF22E0 ?

ksesecl0()+84        call     kgeselv()            1049672B4 ? 700000000004940 ?

ksm_4036_interrupt(  call     ksesecl0()           111A58C20 ? 00001BEDC ?
)+80                                               FC400000FC4 ?
                                                   FFFFFFFFFFFFFFFF ?
ksm_check_over_limit call     ksm_4036_interrupt(  BE000000BE ? 100000001 ?
()
ksmarfg()+248        call     ksm_check_over_limi  000000000 ? FFFFFFFFFFF2600 ?
                              t()                  00000000C ? 000010000 ?

kghgex()+1464        call     ksmarfg()            FFFFFFFFFFF6A90 ?

                                                   700010FC1DADD08 ?
kghfnd()+1092        call     kghgex()             FFFFFFFFFFF2500 ? 000000000 ?
                                                   000000000 ? 000000000 ?
                                                   000000000 ? 000000000 ?
                                                   000000000 ? 000000000 ?
kghalo()+2996        call     kghfnd()             121263568 ? 121263460 ?
                                                   121263568 ? 121657688 ?
                                                   121657688 ? 000001058 ?
                                                   000000000 ? 000000000 ?
kghgex()+928         call     kghalo()             FFFFFFFFFFF2A70 ?
                                                   700000000014A30 ?
                                                   FFFFFFFFFFF2AA0 ?
                                                   44224248000001E0 ?
                                                   FFFFFFFFFFF2AB0 ? 15C26BF78 ?
                                                   121646978 ? 000010000 ?
kghfnd()+1092        call     kghgex()             000000000 ? 000000000 ?
                                                   121644E68 ? 121657688 ?
                                                   FFFFFFFFFFF2C30 ?
                                                   4222402021657688 ?
                                                   104131E84 ? 000000FF8 ?
kghalo()+2996        call     kghfnd()             121644EB4 ? 000000240 ?
                                                   000000000 ? 121657688 ?
                                                   112CF40A0 ? 121644E68 ?
                                                   FFFFFFFFFFF2D20 ?
                                                   4222402000000000 ?
kghgex()+928         call     kghalo()             000000000 ? 000000000 ?
                                                   000000030 ? 000000240 ?
                                                   000000FE8 ? 000000000 ?
                                                   15C26C54C ? 121657764 ?
kghfnd()+1092        call     kghgex()             000000000 ? 000000000 ?
                                                   15C26C500 ? 121657688 ?
                                                   FFFFFFFFFFF3010 ?
                                                   4222202000000001 ?
                                                   104131E84 ? 000000FB8 ?
kghalo()+2996        call     kghfnd()             15C26C54C ? 0000000A0 ?
                                                   000000000 ? 121657688 ?
kghgex()+928         call     kghalo()             700010E5EC86380 ? 000000000 ?

kghalf()+1496        call     kghgex()             000000001 ? 000000001 ?

kxsSegAllocate()+16  call     kghalf()             000000000 ? 15C26C500 ?

kxsFrameSegAllocate  call     kxsSegAllocate()     000000005 ? 000000000 ?
()+92                                              000000001 ? 0000000F1 ?

kxsEphemeralFramePush  call     kxsFrameSegAllocate  10AD30DF8 ? 1120155AC ?
()+196                      ()                   FFFFFFFFFFF35E0 ? 1215C34B8 ?
                                                   ABC00000ABC ? 100000001 ?

kkeutlEvalOpn()+184  call     kxsEphemeralFramePu  000000002 ? 000000000 ?
                              sh()                 FFFFFFFFFFF36B0 ? 000000141 ?

                                                   35A4252FFFFF3CC8 ?
kkeutlCompareConstants  call     kkeutlEvalOpn()      000000000 ? 000000058 ?
()+552                                          FFFFFFFFFFF3A60 ? 000000000 ?
                                                   15C274088 ? 700010E5EC86380 ?
                                                   000008000 ? 000000070 ?
kkeutlCompHistActVals  call     kkeutlCompareConsta  121646938 ? 121657688 ?
()+128                      nts()                FFFFFFFFFFF3A60 ? 0645FBEF4 ?
                                                   59CB9FFFF38A0 ? 000000000 ?
                                                   000000007 ? 1215C34B8 ?
kkehbs()+1212        call     kkeutlCompHistActVa  FFFFFFFFFFF3930 ?
                              ls()                 700010FD934CD38 ? 000000000 ?

kkeh2sl_Int()+344    call     kkehbs()             000000004 ? 000000001 ?
                                                   FFFFFFFFFFF4950 ? 121A12548 ?
                                                   000000000 ? 000000001 ?
                                                   000000004 ? 000000001 ?
kkehsl_Int()+4016    call     kkeh2sl_Int()        10A2FDEF0 ? 000000003 ?
                                                   000000001 ? 000000008 ?

kkehslCursorMatch()  call     kkehsl_Int()         FFFFFFFFFFF3D10 ? 000002000 ?
+88                                                1049D1324 ? 0000001B8 ?

kkocsPredSel()+744   call     kkehslCursorMatch()  10547A464 ? 1124FE518 ?
                                                   FFFFFFFFFFF3E60 ? 000000000 ?
                                                   121644EB4 ? 000000000 ?
                                                   15C273F70 ? 700010FC994E3A8 ?
kkscsVerifyBindEquivRP  call     kkocsPredSel()       000000000 ? 000000038 ?
()+164                                          000000010 ?
                                                   3FEFD417E514BD2E ?
                                                   121931B30 ? 000000010 ?
                                                   000000248 ? 000000000 ?
kkscsVerifyBindEquivalence  call     kkscsVerifyBindEqui  020000000 ? 000000010 ?
()+92                  vRP()                FFFFFFFFFFF4040 ?
                                                   1000000000001 ?
                                                   FFFFFFFFFFF4050 ? 121931B30 ?
                                                   000000002 ? 700010ED6572B78 ?
kkscsMatchEquivLiterals  call     kkscsVerifyBindEqui  FFFFFFFFFFF40E0 ?
()+484                    valence()            700010E7C0FBAA8 ?
                                                   700010ED3CC6CC8 ? 000000003 ?
                                                   010008110 ? 700010E7C0FBAA8 ?
                                                   FFFFFFFFFFF40E0 ? 100000003 ?
kkscsCompareBinds()  call     kkscsMatchEquivLite  000000000 ? 000000000 ?
+1796                         rals()               000000000 ? 000000000 ?
                                                   000000000 ? 000000000 ?
                                                   000000000 ? 000000000 ?
kkscscid_bnd_eval()  call     kkscsCompareBinds()  000000000 ? 000000000 ?
+348                                               10200000102 ? 100000001 ?
                                                   FFFFFFFFFFF51F0 ? 000000000 ?
                                                   1124FE924 ? 1124FE714 ?
kkscsCheckCriteria(  call     kkscscid_bnd_eval()  121657688 ? 121A125C0 ?
)+1624                                             FFFFFFFFFFF4470 ?
                                                   FFFFFFFFFFF4DE0 ?
                                                   FFFFFFFFFFF4470 ? 000000005 ?
                                                   700010E6F2DBB88 ? 3215C34B8 ?
kkscsCheckCursor()+  call     kkscsCheckCriteria(  121657688 ? 121A125C0 ?
696                           )                    121A125C0 ? 000000000 ?
                                                   500000000000000 ? 000000000 ?
                                                   000000000 ? 000000000 ?
kkscsSearchChildList()  call     kkscsCheckCursor()   121657688 ? FFFFFFFFFFF4DE0 ?
+904                                            121A125C0 ? 102FFFF4B30 ?
                                                   FFFFFFFFFFF49C0 ?
                                                   FFFFFFFFFFF4D18 ?
                                                   700010ED3CC7600 ?
                                                   FFFFFFFFFFF4960 ?
kksfbc()+21028       call     kkscsSearchChildLis  FFFFFFFFFFF5240 ? 121A125C0 ?
                              t()                  121A125C0 ? 000000000 ?
                                                   FFFFFFFFFFF4C30 ?
                                                   42822028FFFF52C8 ?
                                                   104F07BD0 ? 000000000 ?
opiexe()+2196        call     kksfbc()             121A125C0 ? 300000000 ?
                                                   102FFFF5CB0 ? 000000000 ?
                                                   000000000 ? FFFFFFFFFFF6258 ?


报错的进程使用了1.5g heap, 此进程中较多的是”qmu subheap ” qm是与XDB相关的,以”xml”为关键字查找trace

行 13022: sword prmsmon_bindsxmlfmt_ [1215C6AA8, 1215C6AAC) = 00000C7E
行 53595: koidf qmxmlschoid_ [700010FE066D418, 700010FE066D428) = 00000000 00000000 ...

note:
排除了表列有与xml相关的类型,判断应该是与此SQL解析相关,关键字像是绑定变量的xml。call stack中提到了sql cursor相关的。

与XML 和ORA-4036相关的  known issues

For Oracle Versions BELOW 23.1
Apply patch:32769880. See NOTE:32769880.8 xmlelement leads to high consumption of pga leads to ORA-4036 or ORA-4030
Apply patch:33319065. See NOTE:2849174.1 ORA-04036: PGA memory used by the instance exceeds PGA_AGGREGATE_LIMIT while testing with XML structures

For Oracle Versions BELOW 12.2
Apply patch:17817258. See NOTE:17817258.8 Large PGA/UGA utilization for XML queries (ORA-4030 / ORA-4036)

 

检查SQL游标不共享的的VIEW reason

SQL> @desc gv$SQL_SHARED_CURSOR
           Name                            Null?    Type
           ------------------------------- -------- ----------------------------
    1      INST_ID                                  NUMBER
    2      SQL_ID                                   VARCHAR2(13)
    3      ADDRESS                                  RAW(8)
    4      CHILD_ADDRESS                            RAW(8)
    5      CHILD_NUMBER                             NUMBER
    6      UNBOUND_CURSOR                           VARCHAR2(1)
    7      SQL_TYPE_MISMATCH                        VARCHAR2(1)
    8      OPTIMIZER_MISMATCH                       VARCHAR2(1)
    9      OUTLINE_MISMATCH                         VARCHAR2(1)
   10      STATS_ROW_MISMATCH                       VARCHAR2(1)
   11      LITERAL_MISMATCH                         VARCHAR2(1)
   12      FORCE_HARD_PARSE                         VARCHAR2(1)
   13      EXPLAIN_PLAN_CURSOR                      VARCHAR2(1)
   14      BUFFERED_DML_MISMATCH                    VARCHAR2(1)
   15      PDML_ENV_MISMATCH                        VARCHAR2(1)
   16      INST_DRTLD_MISMATCH                      VARCHAR2(1)
   17      SLAVE_QC_MISMATCH                        VARCHAR2(1)
   18      TYPECHECK_MISMATCH                       VARCHAR2(1)
   19      AUTH_CHECK_MISMATCH                      VARCHAR2(1)
   20      BIND_MISMATCH                            VARCHAR2(1)
   21      DESCRIBE_MISMATCH                        VARCHAR2(1)
   22      LANGUAGE_MISMATCH                        VARCHAR2(1)
   23      TRANSLATION_MISMATCH                     VARCHAR2(1)
   24      BIND_EQUIV_FAILURE                       VARCHAR2(1)
   25      INSUFF_PRIVS                             VARCHAR2(1)
   26      INSUFF_PRIVS_REM                         VARCHAR2(1)
   27      REMOTE_TRANS_MISMATCH                    VARCHAR2(1)
   28      LOGMINER_SESSION_MISMATCH                VARCHAR2(1)
   29      INCOMP_LTRL_MISMATCH                     VARCHAR2(1)
   30      OVERLAP_TIME_MISMATCH                    VARCHAR2(1)
   31      EDITION_MISMATCH                         VARCHAR2(1)
   32      MV_QUERY_GEN_MISMATCH                    VARCHAR2(1)
   33      USER_BIND_PEEK_MISMATCH                  VARCHAR2(1)
   34      TYPCHK_DEP_MISMATCH                      VARCHAR2(1)
   35      NO_TRIGGER_MISMATCH                      VARCHAR2(1)
   36      FLASHBACK_CURSOR                         VARCHAR2(1)
   37      ANYDATA_TRANSFORMATION                   VARCHAR2(1)
   38      PDDL_ENV_MISMATCH                        VARCHAR2(1)
   39      TOP_LEVEL_RPI_CURSOR                     VARCHAR2(1)
   40      DIFFERENT_LONG_LENGTH                    VARCHAR2(1)
   41      LOGICAL_STANDBY_APPLY                    VARCHAR2(1)
   42      DIFF_CALL_DURN                           VARCHAR2(1)
   43      BIND_UACS_DIFF                           VARCHAR2(1)
   44      PLSQL_CMP_SWITCHS_DIFF                   VARCHAR2(1)
   45      CURSOR_PARTS_MISMATCH                    VARCHAR2(1)
   46      STB_OBJECT_MISMATCH                      VARCHAR2(1)
   47      CROSSEDITION_TRIGGER_MISMATCH            VARCHAR2(1)
   48      PQ_SLAVE_MISMATCH                        VARCHAR2(1)
   49      TOP_LEVEL_DDL_MISMATCH                   VARCHAR2(1)
   50      MULTI_PX_MISMATCH                        VARCHAR2(1)
   51      BIND_PEEKED_PQ_MISMATCH                  VARCHAR2(1)
   52      MV_REWRITE_MISMATCH                      VARCHAR2(1)
   53      ROLL_INVALID_MISMATCH                    VARCHAR2(1)
   54      OPTIMIZER_MODE_MISMATCH                  VARCHAR2(1)
   55      PX_MISMATCH                              VARCHAR2(1)
   56      MV_STALEOBJ_MISMATCH                     VARCHAR2(1)
   57      FLASHBACK_TABLE_MISMATCH                 VARCHAR2(1)
   58      LITREP_COMP_MISMATCH                     VARCHAR2(1)
   59      PLSQL_DEBUG                              VARCHAR2(1)
   60      LOAD_OPTIMIZER_STATS                     VARCHAR2(1)
   61      ACL_MISMATCH                             VARCHAR2(1)
   62      FLASHBACK_ARCHIVE_MISMATCH               VARCHAR2(1)
   63      LOCK_USER_SCHEMA_FAILED                  VARCHAR2(1)
   64      REMOTE_MAPPING_MISMATCH                  VARCHAR2(1)
   65      LOAD_RUNTIME_HEAP_FAILED                 VARCHAR2(1)
   66      HASH_MATCH_FAILED                        VARCHAR2(1)
   67      PURGED_CURSOR                            VARCHAR2(1)
   68      BIND_LENGTH_UPGRADEABLE                  VARCHAR2(1)
   69      USE_FEEDBACK_STATS                       VARCHAR2(1)
   70      REASON                                   CLOB
   71      CON_ID                                   NUMBER

SQL> select reason from gv$SQL_SHARED_CURSOR where sql_id='&sqlid';
Enter value for sqlid: 9g485acn2n30m

REASON
--------------------------------------------------------------------------------
<ChildNode><ChildNumber>3</ChildNumber><ID>3</ID><reason>Optimizer mismatch(12)<
/reason><size>2x476</size><_fix_control_key> -866096202 0
</_fix_control_key></ChildNode>

<ChildNode><ChildNumber>4</ChildNumber><ID>3</ID><reason>Optimizer mismatch(12)<
/reason><size>2x476</size><_optim_peek_user_binds> true false
</_optim_peek_user_binds><_fix_control_key> 0 -86
6096202 </_fix_control_key><optimizer_adaptive_plans> true
false </optimizer_adaptive_plans></ChildNode>

Note:
注意从gv$SQL_SHARED_CURSOR的REASON列可以判断sql cursor未共享的原因的元数据组织形式是XML格式, 存在一种可能是因为SQL 的子游标过多,该数据记录过大,从trace中查找子游标个数和未共享原因:

----------------------------------------
SO: 0x700010eddfce738, type: 96, owner: 0x700010fc994d420, flag: INIT/-/-/0x00 if: 0x3 c: 0x3
proc=0x700010fd934c7f0, name=LIBRARY OBJECT LOCK, file=kgl.h LINE:9221 ID:, pg=0 conuid=0

LibraryObjectLock: Address=700010eddfce738 Handle=700010e5ae75478 Mode=N
CanBeBrokenCount=1 Incarnation=1 ExecutionCount=0

User=700010fc994d420 Session=700010fc994d420 ReferenceCount=1
Flags=CNB/[0001] SavepointNum=645fbe93
LibraryHandle: Address=700010e5ae75478 Hash=77aff7a8 LockMode=N PinMode=0 LoadLockMode=0 Status=VALD
ObjectName: Name=
SELECT * FROM kdpl_***mx
WHERE jiluztai = '0'
and ****GHAO = :1
and ****LEIX = :2
and ****XIRQ = :3
and ( TZSHYNBZ = '1' or ( LIX***M = 'ADJUSTMENT' and TZSHYNBZ = '2' ) )
and farendma = :4

FullHashValue=463c6c019d50635b265794e477aff7a8 Namespace=SQL AREA(00) Type=CURSOR(00) ContainerId=0 ContainerUid=0 Identifier=2008020904 OwnerIdn=106
Statistics: InvalidationCount=0 ExecutionCount=508901 LoadCount=2630 ActiveLocks=85 TotalLockCount=34042 TotalPinCount=15
Counters: BrokenCount=1 RevocablePointer=1 KeepDependency=541 Version=0 BucketInUse=34041 HandleInUse=34041 HandleReferenceCount=0
Concurrency: DependencyMutex=700010e5ae75528(0, 4940, 0, 0) Mutex=700010e5ae755c0(0, 16171485, 565761, 0)
Flags=RON/PIN/TIM/PN0/DBN/[10012841] Flags2=[0000]
WaitersLists:
Lock=700010e5ae75508[700010e5ae75508,700010e5ae75508]
Pin=700010e5ae754e8[700010e5ae754e8,700010e5ae754e8]
LoadLock=700010e5ae75560[700010e5ae75560,700010e5ae75560]
Timestamp: Current=04-29-2023 00:19:44
HandleReference: Address=700010e5ae75718 Handle=0 Flags=[00]
LibraryObject: Address=700010ed3cc6c00 HeapMask=0000-0001-0001-0000 Flags=EXS[0000] Flags2=[0000] PublicFlags=[0000]
ChildTable: size='960'
Child: id='0' Table=700010ed3cc7a80 Reference=700010ed3cc7520 Handle=700010ed579d680
Child: id='1' Table=700010ed3cc7a80 Reference=700010ed3cc78a8 Handle=700010e5e90fd08
Child: id='2' Table=700010ed3cc7a80 Reference=700010e5ec0ec10 Handle=700010ed6962300
Child: id='3' Table=700010ed3cc7a80 Reference=700010ed4d3d670 Handle=700010e5d361d90
Child: id='4' Table=700010ed3cc7a80 Reference=700010ed4d3d978 Handle=700010ed6977838
...
Child: id='957' Table=700010e58b50108 Reference=700010e5e02a560 Handle=700010ed7378be8
NamespaceDump: [TOC00022]

Parent Cursor: sql_id=2cpwnwjvuzxx8 parent=700010ed3cc6cc8 maxchild=958 plk=y ppn=n [TOC00023]
CursorDiagnosticsNodes:
ChildNode: ChildNumber=539 ID=40 reason=Bind mismatch(33) size=1x4 init_ranges_in_first_pass=1062054815
ChildNode: ChildNumber=538 ID=40 reason=Bind mismatch(33) size=1x4 init_ranges_in_first_pass=1062054815
ChildNode: ChildNumber=537 ID=40 reason=Bind mismatch(33) size=1x4 init_ranges_in_first_pass=1062054815

note:
当时运行的SQL有900多个子游标.

# AWR

Top 10 Foreground Events by Total Wait Time

Event Waits Total Wait Time (sec) Wait Avg(ms) % DB time Wait Class
cursor: mutex X 4,155,981 3762.2 0.91 7.9 Concurrency
DB CPU 3735.3 7.9
cursor: pin S wait on X 3,204 2468.2 770.35 5.2 Concurrency
log file sync 138,319 2085.7 15.08 4.4 Commit
cursor: mutex S 1,176,766 1732.5 1.47 3.6 Concurrency
gc current grant busy 228,543 1203.8 5.27 2.5 Cluster
Statistic Begin Value End Value
logons current 688 724
opened cursors current 815 788
session cursor cache count 37,925,817 38,235,110
session pga memory 2,048,921,336 2,039,449,816
session pga memory max 3,487,008,280 121,350,925,720
session uga memory 249,334,744,712 250,180,204,264
session uga memory max 1,995,725,260,856 2,006,723,832,288

问题前后的会话数并无明显增加。数据库负载上出现了sql cursor争用。分析了所有ora-4036 trace都为2cpwnwjvuzxx8 SQLid. 执行计划相同。

SQL

SQL_ID  2cpwnwjvuzxx8, child number 0
-------------------------------------
SELECT * FROM kdpl_***mx 
			WHERE jiluztai = '0'
				and ****GHAO = :1 
				and ****LEIX = :2 
				and ****XIRQ = :3 
				and ( TZSHYNBZ = '1' or ( LIX***M = 'ADJUSTMENT' and TZSHYNBZ = '2' ) ) 
				and farendma = :4 

Plan hash value: 994448910

------------------------------------------------------------------------------------------------
| Id  | Operation                   | Name             | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |                  |       |       |     6 (100)|          |
|*  1 |  TABLE ACCESS BY INDEX ROWID| KDPL_****MX      |     1 |   297 |     6   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | KDPL_******_IDX1 |     1 |       |     5   (0)| 00:00:01 |
------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter(("SC***XIRQ"=:3 AND ("TZS**NBZ"='1' OR ("TZSHYNBZ"='2' AND
              "LIX**ZDM"='ADJUSTMENT')) AND "JILUZTAI"='0'))
   2 - access("ZH**HAO"=:1 AND "LI**LEIX"=:2 AND "FARENDMA"=:4)
       filter(("LI**LEIX"=:2 AND "FARENDMA"=:4))

Note
-----
   - SQL profile SYS_SQLPROF_02788792a0cd003f used for this statement	
		
****************************************************************************************
OBJECT SIZE
****************************************************************************************

                SEGMENT
OWNER           NAME                                SEGMENT_TYPE           S_SIZE
--------------- ----------------------------------- ------------------ ----------
anbob            KDPL_***XMX                         TABLE                   12939
                 KDPL_******_IDX1                    INDEX                    3880


****************************************************************************************
TABLES
****************************************************************************************

                TABLE                               LOG     BUFFER                  NUM                     EMPTY                           LAST
OWNER           NAME                                TEMP    POOL    D PART         ROWS       BLOCKS       BLOCKS BLOCK_SIZE(M) AVG_SIZE(M) ANALYZED
--------------- ----------------------------------- ------- ------- - ---- ------------ ------------ ------------ ------------- ----------- --------
anbob            KDPL_***XMX                        YES.N   DEFAULT 1 NO     36,813,239    1,654,071            0        12,922      11,304 20230516

TABLE           TABLE                               Index                                  COLUMN                     Col
OWNER           NAME                                Name                           UCPTDVS NAME                       Pos DESC
--------------- ----------------------------------- ------------------------------ ------- ------------------------- ---- ----
anbob            KDPL_***XMX                        KDPL_******_IDX1               UNNNNVY ZHANGHAO                     1 ASC
                                                                                   UNNNNVY XUHAOOOO                     2 ASC
                                                                                   UNNNNVY LIXIZUHO                     3 ASC
                                                                                   UNNNNVY LIXILEIX                     4 ASC
                                                                                   UNNNNVY FARENDMA                     5 ASC
INDEX STATS
****************************************************************************************

                                                                                                                                   Average     Average
TABLE           TABLE                               Index                              LOCAL        Leaf     Distinct          Leaf Blocks Data Blocks      Cluster   LAST
OWNER           NAME                                Name                           LOG PRE    B     Blks         Keys NUM_ROWS     Per Key     Per Key       Factor D ANALYZED
--------------- ----------------------------------- ------------------------------ --- ------ - -------- ------------ -------- ----------- ----------- ------------ - --------
anbob            KDPL_***XMX                        KDPL_******_IDX1               YES |      3  478,081   36,182,285 ########           1           1   36,122,310 1 20230516

NOTE:
索引的选择率很好。

诊断脚本

如果运行时发现进程使用内存高

# PGA使用率
select round(pga_tar / 1024 / 1024, 1) pga_soft_limit,
       round(pga_limit / 1024 / 1024, 1) pga_hard_limit,
       round(a.pga_alloc, 1) pga_allocated, 
	   round(a.pga_used, 1) pga_used, 
       round(a.pga_alloc / (pga_tar / 1024 / 1024), 1) * 100 || '%' pga_soft_pct,
	   round(a.pga_alloc / (pga_limit / 1024 / 1024), 1) * 100 || '%' pga_hard_pct,
	   max_pga_proc
  from (select sum(PGA_ALLOC_MEM) / 1024 / 1024 pga_alloc ,sum(pga_used_mem)/(1024*1024) pga_used,round(max(PGA_ALLOC_MEM) / 1024 / 1024,2) max_pga_proc from v$process) a,
       (select value pga_tar from v$parameter where name='pga_aggregate_target')b,
	   (select value pga_limit from v$parameter where name='pga_aggregate_limit')c

# top进程
col LOGON for a10
col username for a20
col machine for a20
col program for a25 trunc
col module for a20
col OSUSER for a10
col SPID for a9
SELECT DECODE(TRUNC(SYSDATE - LOGON_TIME), 0, NULL, TRUNC(SYSDATE - LOGON_TIME) || ' Days' || ' + ') ||TO_CHAR(TO_DATE(TRUNC(MOD(SYSDATE-LOGON_TIME,1) * 86400), 'SSSSS'), 'HH24:MI:SS') LOGON,
SID, s.SERIAL#, p.SPID ,
ROUND(p.pga_used_mem/(1024*1024), 2) PGA_MB_USED,
ROUND(p.PGA_ALLOC_MEM/(1024*1024), 2) PGA_ALLOCed,
ROUND(pga_freeable_mem/1024/1024) PGA_FREA,
ROUND(pga_max_mem/1024/1024) PGA_MAX,
s.USERNAME, STATUS, OSUSER, MACHINE, s.PROGRAM, sql_id
FROM v$session s, v$process p
WHERE s.paddr = p.addr and PGA_ALLOC_MEM>3*(1024*1024)
order by  PGA_ALLOC_MEM desc;

select * from (
SELECT 
inst_id,
   sid
  , sql_id
  , operation_type 
  , operation_id plan_line
  , ROUND(active_time/1000000,1) active_sec
  , actual_mem_used/1024/1024 actual_mem_used
  , max_mem_used /1024/1024 max_mem_used
  , work_area_size/1024/1024 work_area_size
  , number_passes
  , tempseg_size
  , tablespace
FROM 
    gv$sql_workarea_active 
order by 
ACTUAL_MEM_USED desc
)where rownum<=10;

# 某session进程
SELECT pid, category, allocated, used, max_allocated
FROM   v$process_memory
WHERE  pid = (SELECT pid
              FROM   v$process
              WHERE  addr= (select paddr
                            FROM   v$session
                            WHERE  sid = NNN));

或
SELECT category, name, heap_name, bytes, allocation_count,
       heap_descriptor, parent_heap_descriptor
FROM   v$process_memory_detail
WHERE  pid      = XXX
--AND    category = 'Other'
;

或dump 

alter session set events'immediate trace name PGA_DETAIL_GET level <PID>';

ORADEBUG SETMYPID;
ORADEBUG DUMP PGA_DETAIL_GET <PID>;

事后分析

从DASH查看当时的情况
select to_char(sample_time,'yyyymmdd hh24:mi:ss') etime,INSTANCE_NUMBER,session_id,session_serial#,sql_id,TOP_LEVEL_SQL_ID,SQL_EXEC_START,SQL_EXEC_ID,program,PGA_ALLOCATED from TAB_ASH_TEMP_20230514
 where (INSTANCE_NUMBER,session_id,SESSION_SERIAL#) in(
select distinct INSTANCE_NUMBER,session_id,SESSION_SERIAL#  from
(select to_char(sample_time,'yyyymmdd hh24:mi:ss') etime,INSTANCE_NUMBER,session_id,SESSION_SERIAL#,program,PGA_ALLOCATED  from TAB_ASH_TEMP_20230514 order by PGA_ALLOCATED  desc nulls last) t
where rownum<=20) order by 3,2,1;

ETIME             INSTANCE_NUMBER SESSION_ID SESSION_SERIAL# SQL_ID        TOP_LEVEL_SQL SQL_EXEC_START      SQL_EXEC_ID PROGRAM                 PGA_ALLOCATED
----------------- --------------- ---------- --------------- ------------- ------------- ------------------- ----------- -------------------- ----------------
20230514 00:44:44               2       1056           45684 gr9nbg3ptgzn4 gr9nbg3ptgzn4 2023-05-14 00:44:42    33556278 JDBC Thin Client            5,607,424
20230514 00:45:14               2       1056           45684 2cpwnwjvuzxx8 2cpwnwjvuzxx8                                 JDBC Thin Client          124,537,856
20230514 00:46:36               2       1056           45684 88bvj239xx2ah 88bvj239xx2ah 2023-05-14 00:45:18    34818232 JDBC Thin Client            5,196,800
20230514 00:46:47               2       1056           45684 b4tcskhvduq4m b4tcskhvduq4m 2023-05-14 00:45:18    34608683 JDBC Thin Client            5,196,800
20230514 00:47:00               2       1056           45684 2cpwnwjvuzxx8 2cpwnwjvuzxx8                                 JDBC Thin Client          368,397,312
20230514 00:47:18               2       1056           45684 2cpwnwjvuzxx8 2cpwnwjvuzxx8                                 JDBC Thin Client          452,283,392
20230514 00:47:33               2       1056           45684 2cpwnwjvuzxx8 2cpwnwjvuzxx8 2023-05-14 00:47:23    34063265 JDBC Thin Client          506,874,880
20230514 00:47:45               2       1056           45684 2cpwnwjvuzxx8 2cpwnwjvuzxx8                                 JDBC Thin Client           20,007,936
20230514 00:47:57               2       1056           45684 2cpwnwjvuzxx8 2cpwnwjvuzxx8                                 JDBC Thin Client          234,179,584
20230514 00:48:07               2       1056           45684 66c902x0hurcj 66c902x0hurcj 2023-05-14 00:48:06    36636084 JDBC Thin Client            5,262,336
20230514 00:48:17               2       1056           45684 2cpwnwjvuzxx8 2cpwnwjvuzxx8                                 JDBC Thin Client            7,031,808
20230514 00:48:27               2       1056           45684 2cpwnwjvuzxx8 2cpwnwjvuzxx8                                 JDBC Thin Client          246,893,568
20230514 00:48:37               2       1056           45684 2cpwnwjvuzxx8 2cpwnwjvuzxx8                                 JDBC Thin Client          569,527,296
20230514 00:48:47               2       1056           45684 2cpwnwjvuzxx8 2cpwnwjvuzxx8                                 JDBC Thin Client          708,267,008
20230514 00:48:57               2       1056           45684 2cpwnwjvuzxx8 2cpwnwjvuzxx8                                 JDBC Thin Client          839,863,296
20230514 00:49:07               2       1056           45684 2cpwnwjvuzxx8 2cpwnwjvuzxx8                                 JDBC Thin Client        1,010,650,112
20230514 00:49:17               2       1056           45684 2cpwnwjvuzxx8 2cpwnwjvuzxx8                                 JDBC Thin Client        1,210,665,984
20230514 00:49:27               2       1056           45684 2cpwnwjvuzxx8 2cpwnwjvuzxx8                                 JDBC Thin Client        1,461,472,256
20230514 00:49:37               2       1056           45684 2cpwnwjvuzxx8 2cpwnwjvuzxx8                                 JDBC Thin Client        1,722,371,072
20230514 00:49:47               2       1056           45684                                                             JDBC Thin Client        2,400,275,456
20230514 00:45:14               2       1535           22916 2cpwnwjvuzxx8 2cpwnwjvuzxx8                                 JDBC Thin Client          171,479,040
20230514 00:46:36               2       1535           22916 2cpwnwjvuzxx8 2cpwnwjvuzxx8 2023-05-14 00:45:18    34063041 JDBC Thin Client          284,856,320
20230514 00:46:47               2       1535           22916 2cpwnwjvuzxx8 2cpwnwjvuzxx8                                 JDBC Thin Client           27,627,520
20230514 00:47:00               2       1535           22916 2cpwnwjvuzxx8 2cpwnwjvuzxx8                                 JDBC Thin Client          150,310,912
20230514 00:47:18               2       1535           22916 2cpwnwjvuzxx8 2cpwnwjvuzxx8                                 JDBC Thin Client          267,751,424
20230514 00:47:33               2       1535           22916 2cpwnwjvuzxx8 2cpwnwjvuzxx8 2023-05-14 00:47:27    34063277 JDBC Thin Client           59,478,016
20230514 00:47:45               2       1535           22916 2cpwnwjvuzxx8 2cpwnwjvuzxx8                                 JDBC Thin Client          206,606,336
20230514 00:47:57               2       1535           22916 2cpwnwjvuzxx8 2cpwnwjvuzxx8                                 JDBC Thin Client          351,637,504
20230514 00:48:07               2       1535           22916 2cpwnwjvuzxx8 2cpwnwjvuzxx8                                 JDBC Thin Client          502,632,448
20230514 00:48:17               2       1535           22916 2cpwnwjvuzxx8 2cpwnwjvuzxx8                                 JDBC Thin Client          536,186,880
20230514 00:48:27               2       1535           22916 2cpwnwjvuzxx8 2cpwnwjvuzxx8                                 JDBC Thin Client          623,742,976
20230514 00:48:37               2       1535           22916 2cpwnwjvuzxx8 2cpwnwjvuzxx8                                 JDBC Thin Client          813,076,480
20230514 00:48:47               2       1535           22916 2cpwnwjvuzxx8 2cpwnwjvuzxx8                                 JDBC Thin Client          964,399,104
20230514 00:48:57               2       1535           22916 2cpwnwjvuzxx8 2cpwnwjvuzxx8                                 JDBC Thin Client        1,143,640,064
20230514 00:49:07               2       1535           22916 2cpwnwjvuzxx8 2cpwnwjvuzxx8                                 JDBC Thin Client        1,434,685,440
20230514 00:49:17               2       1535           22916 2cpwnwjvuzxx8 2cpwnwjvuzxx8                                 JDBC Thin Client        1,592,430,592
20230514 00:49:27               2       1535           22916 2cpwnwjvuzxx8 2cpwnwjvuzxx8                                 JDBC Thin Client        1,847,693,312
20230514 00:49:37               2       1535           22916 2cpwnwjvuzxx8 2cpwnwjvuzxx8                                 JDBC Thin Client        2,409,795,584
...
20230514 00:49:07               2       5798           64663 2cpwnwjvuzxx8 2cpwnwjvuzxx8                                 JDBC Thin Client        1,272,073,216
20230514 00:49:17               2       5798           64663 2cpwnwjvuzxx8 2cpwnwjvuzxx8                                 JDBC Thin Client        1,449,348,096
20230514 00:49:27               2       5798           64663 2cpwnwjvuzxx8 2cpwnwjvuzxx8                                 JDBC Thin Client        1,714,506,752
20230514 00:49:37               2       5798           64663 2cpwnwjvuzxx8 2cpwnwjvuzxx8                                 JDBC Thin Client        2,339,589,120

182 rows selected.

Note:
显示均在并发执行2cpwnwjvuzxx8 SQL时报错,该SQL时单个会话占用pga近2.5G

其中一个会话

select to_char(sample_time,'yyyymmdd hh24:mi:ss') etime,session_id,session_serial#,sql_id,event,TOP_LEVEL_SQL_ID,IN_PARSE,IN_HARD_PARSE,IN_SQL_EXECUTION,IN_PLSQL_EXECUTION,
SQL_EXEC_START,SQL_EXEC_ID,TOP_LEVEL_CALL_NAME,PGA_ALLOCATED --,TEMP_SPACE_ALLOCATED
  3  from TAB_ASH_TEMP_20230514 where session_id=1535 order by 1;

ETIME             SESSION_ID SESSION_SERIAL# SQL_ID        EVENT                          TOP_LEVEL_SQL I I I I SQL_EXEC_START      SQL_EXEC_ID TOP_LEVEL_CALL_        PGA_ALLOCATED
----------------- ---------- --------------- ------------- ------------------------------ ------------- - - - - ------------------- ----------- --------------- --------------------
20230514 00:25:28       1535           17319 5nk3zc32kfcjz gc current block 2-way         5nk3zc32kfcjz N N Y N 2023-05-14 00:25:28    36706951 V8 Bundled Exec            2,247,680
20230514 00:25:39       1535           17319 5nk3zc32kfcjz gc cr block 2-way              5nk3zc32kfcjz N N Y N 2023-05-14 00:25:38    36763597 V8 Bundled Exec            2,444,288
20230514 00:45:14       1535           22916 2cpwnwjvuzxx8 cursor: pin S wait on X        2cpwnwjvuzxx8 N N Y N                                 V8 Bundled Exec          171,479,040
20230514 00:46:36       1535           22916 2cpwnwjvuzxx8 gc current grant busy          2cpwnwjvuzxx8 N N Y N 2023-05-14 00:45:18    34063041 V8 Bundled Exec          284,856,320
20230514 00:46:47       1535           22916 2cpwnwjvuzxx8 cursor: mutex S                2cpwnwjvuzxx8 N N Y N                                 V8 Bundled Exec           27,627,520
20230514 00:47:00       1535           22916 2cpwnwjvuzxx8                                2cpwnwjvuzxx8 N N Y N                                 V8 Bundled Exec          150,310,912
20230514 00:47:18       1535           22916 2cpwnwjvuzxx8                                2cpwnwjvuzxx8 N N Y N                                 V8 Bundled Exec          267,751,424
20230514 00:47:33       1535           22916 2cpwnwjvuzxx8                                2cpwnwjvuzxx8 N N Y N 2023-05-14 00:47:27    34063277 V8 Bundled Exec           59,478,016
20230514 00:47:45       1535           22916 2cpwnwjvuzxx8 cursor: mutex X                2cpwnwjvuzxx8 N N Y N                                 V8 Bundled Exec          206,606,336
20230514 00:47:57       1535           22916 2cpwnwjvuzxx8                                2cpwnwjvuzxx8 N N Y N                                 V8 Bundled Exec          351,637,504
20230514 00:48:07       1535           22916 2cpwnwjvuzxx8                                2cpwnwjvuzxx8 N N Y N                                 V8 Bundled Exec          502,632,448
20230514 00:48:17       1535           22916 2cpwnwjvuzxx8                                2cpwnwjvuzxx8 N N Y N                                 V8 Bundled Exec          536,186,880
20230514 00:48:27       1535           22916 2cpwnwjvuzxx8 cursor: pin S wait on X        2cpwnwjvuzxx8 N N Y N                                 V8 Bundled Exec          623,742,976
20230514 00:48:37       1535           22916 2cpwnwjvuzxx8 cursor: mutex X                2cpwnwjvuzxx8 N N Y N                                 V8 Bundled Exec          813,076,480
20230514 00:48:47       1535           22916 2cpwnwjvuzxx8 cursor: mutex S                2cpwnwjvuzxx8 N N Y N                                 V8 Bundled Exec          964,399,104
20230514 00:48:57       1535           22916 2cpwnwjvuzxx8 acknowledge over PGA limit     2cpwnwjvuzxx8 N N Y N                                 V8 Bundled Exec        1,143,640,064
20230514 00:49:07       1535           22916 2cpwnwjvuzxx8 cursor: mutex X                2cpwnwjvuzxx8 N N Y N                                 V8 Bundled Exec        1,434,685,440
20230514 00:49:17       1535           22916 2cpwnwjvuzxx8                                2cpwnwjvuzxx8 N N Y N                                 V8 Bundled Exec        1,592,430,592
20230514 00:49:27       1535           22916 2cpwnwjvuzxx8                                2cpwnwjvuzxx8 N N Y N                                 V8 Bundled Exec        1,847,693,312
20230514 00:49:37       1535           22916 2cpwnwjvuzxx8 cursor: mutex X                2cpwnwjvuzxx8 N N Y N                                 V8 Bundled Exec        2,409,795,584

20 rows selected.

Note:
不是存储过程,sqlexec_start为空,IN_SQL_EXECUTION执行阶段

确认这SQL的PGA使用

  1  select to_char(sample_time,'yyyymmdd hh24:mi:ss') etime,session_id,session_serial#,sql_id,event,TOP_LEVEL_SQL_ID,IN_PARSE,IN_HARD_PARSE,IN_SQL_EXECUTION,IN_PLSQL_EXECUTION,
  2  SQL_EXEC_START,SQL_EXEC_ID,TOP_LEVEL_CALL_NAME,PGA_ALLOCATED --,TEMP_SPACE_ALLOCATED
  3* from TAB_ASH_TEMP_20230514 where sql_id='2cpwnwjvuzxx8' order by 2,1

Note:
大量SESSION和上面1535 session一样,不再附输出.

PGA使用趋势

SQL> select to_char(sample_time,'yyyymmdd hh24:mi:ss') etime,INSTANCE_NUMBER,sum(PGA_ALLOCATED)/1024/1024 mb 
from TAB_ASH_TEMP_20230514 
group by INSTANCE_NUMBER,to_char(sample_time,'yyyymmdd hh24:mi:ss')  having sum(PGA_ALLOCATED)>1024*1024*3 order by 2,1 ;


问题时间PGA确实暴增,从ASH计算需求超过了30GB。

select to_char(sample_time,'yyyymmdd hh24:mi:ss') etime,INSTANCE_NUMBER,sum(PGA_ALLOCATED)/1024/1024 mb,count(*) AAS 
from TAB_ASH_TEMP_20230514
where instance_number=2
 group by INSTANCE_NUMBER,to_char(sample_time,'yyyymmdd hh24:mi:ss')   order by 2,1 ;
 
 ETIME             INSTANCE_NUMBER         MB        AAS
----------------- --------------- ---------- ----------

20230514 00:43:14               2 69.0185547          1
20230514 00:43:24               2 69.0185547          1
20230514 00:43:34               2 69.0185547          1
20230514 00:43:44               2 69.0185547          1
20230514 00:43:54               2 69.8935547          1
20230514 00:44:04               2 70.0185547          1
20230514 00:44:14               2 70.0185547          1
20230514 00:44:24               2 70.0185547          1
20230514 00:44:44               2 404.989258         70
20230514 00:44:54               2 118.516602         25
20230514 00:45:04               2 260.943359         28
20230514 00:45:14               2 22514.5645        152
20230514 00:46:36               2 32874.1875        193
20230514 00:46:47               2  25770.332        188
20230514 00:47:00               2 33096.7656        136
20230514 00:47:18               2 35927.2803        135
20230514 00:47:33               2 38246.7305        135
20230514 00:47:45               2 35705.1396        122
20230514 00:47:57               2 40689.6309        115
20230514 00:48:07               2 40523.1152        111
20230514 00:48:17               2 41626.7842        110
20230514 00:48:27               2 33751.5313        111
20230514 00:48:37               2 28896.8965         72
20230514 00:48:47               2 26450.3477         57
20230514 00:48:57               2 22788.7998         47
20230514 00:49:07               2 23653.2305         41
20230514 00:49:17               2 23028.5977         30
20230514 00:49:27               2 22086.1943         26
20230514 00:49:37               2 17639.7178         14
20230514 00:49:47               2 6296.36719          7
20230514 00:49:57               2 1196.28418          5

Note:
活动会话也有增加。

PGA使用总量增加了,并发活动会话增加了,使用PGA 最多的TOP 20 session 问题时间点都在执行一个2cpwnwjvuzxx8 SQLID, 该SQL的从执行计划看效率很高,索引选择率好,个人感觉PGA使用高一种是少量个数但单个使用消耗PGA较多,另一种上执行变慢导致会话积压,我觉的可能更倾向于第二种, 以执行2cpwnwjvuzxx8的会话单个使用2GB, 25G的PGA limit有10几个会话增长起来就可能达到。 因为该问题持续时间短,从AWR的sqlstat粒度可能不好判断原因。 无论是当时DASH只使用最多PGA的会话在执行的SQL,还是当时活动会话在执行的TOP SQL都指向2cpwnwjvuzxx8, 这个SQL是在SQL的执行阶段,但活动会话又是在等待cursor: mutex x SQL游标解析类,而且还是high version,什么情况会在执行中产生子游标呢? 自适应共享游标Adaptive Cursor Sharing ACS,ECS

为什么游标不共享

SELECT INST_ID, SQL_ID, REASON_NAME, REASON_VALUE, COUNT(*) FROM (
 SELECT * FROM GV$SQL_SHARED_CURSOR
 UNPIVOT INCLUDE NULLS
 (REASON_VALUE
     FOR REASON_NAME IN (ACL_MISMATCH AS 'ACL_MISMATCH',ANYDATA_TRANSFORMATION AS 'ANYDATA_TRANSFORMATION',AUTH_CHECK_MISMATCH AS 'AUTH_CHECK_MISMATCH',BIND_EQUIV_FAILURE AS 'BIND_EQUIV_FAILURE',BIND_LENGTH_UPGRADEABLE AS 'BIND_LENGTH_UPGRADEABLE',BIND_MISMATCH AS 'BIND_MISMATCH',BIND_PEEKED_PQ_MISMATCH AS 'BIND_PEEKED_PQ_MISMATCH',BIND_UACS_DIFF AS 'BIND_UACS_DIFF',BUFFERED_DML_MISMATCH AS 'BUFFERED_DML_MISMATCH',CROSSEDITION_TRIGGER_MISMATCH AS 'CROSSEDITION_TRIGGER_MISMATCH',CURSOR_PARTS_MISMATCH AS 'CURSOR_PARTS_MISMATCH',DESCRIBE_MISMATCH AS 'DESCRIBE_MISMATCH',DIFF_CALL_DURN AS 'DIFF_CALL_DURN',DIFFERENT_LONG_LENGTH AS 'DIFFERENT_LONG_LENGTH',EDITION_MISMATCH AS 'EDITION_MISMATCH',EXPLAIN_PLAN_CURSOR AS 'EXPLAIN_PLAN_CURSOR',FLASHBACK_ARCHIVE_MISMATCH AS 'FLASHBACK_ARCHIVE_MISMATCH',FLASHBACK_CURSOR AS 'FLASHBACK_CURSOR',FLASHBACK_TABLE_MISMATCH AS 'FLASHBACK_TABLE_MISMATCH',FORCE_HARD_PARSE AS 'FORCE_HARD_PARSE',HASH_MATCH_FAILED AS 'HASH_MATCH_FAILED',INCOMP_LTRL_MISMATCH AS 'INCOMP_LTRL_MISMATCH',INST_DRTLD_MISMATCH AS 'INST_DRTLD_MISMATCH',INSUFF_PRIVS AS 'INSUFF_PRIVS',INSUFF_PRIVS_REM AS 'INSUFF_PRIVS_REM',LANGUAGE_MISMATCH AS 'LANGUAGE_MISMATCH',LITERAL_MISMATCH AS 'LITERAL_MISMATCH',LITREP_COMP_MISMATCH AS 'LITREP_COMP_MISMATCH',LOAD_OPTIMIZER_STATS AS 'LOAD_OPTIMIZER_STATS',LOAD_RUNTIME_HEAP_FAILED AS 'LOAD_RUNTIME_HEAP_FAILED',LOCK_USER_SCHEMA_FAILED AS 'LOCK_USER_SCHEMA_FAILED',LOGICAL_STANDBY_APPLY AS 'LOGICAL_STANDBY_APPLY',LOGMINER_SESSION_MISMATCH AS 'LOGMINER_SESSION_MISMATCH',MULTI_PX_MISMATCH AS 'MULTI_PX_MISMATCH',MV_QUERY_GEN_MISMATCH AS 'MV_QUERY_GEN_MISMATCH',MV_REWRITE_MISMATCH AS 'MV_REWRITE_MISMATCH',MV_STALEOBJ_MISMATCH AS 'MV_STALEOBJ_MISMATCH',NO_TRIGGER_MISMATCH AS 'NO_TRIGGER_MISMATCH',OPTIMIZER_MISMATCH AS 'OPTIMIZER_MISMATCH',OPTIMIZER_MODE_MISMATCH AS 'OPTIMIZER_MODE_MISMATCH',OUTLINE_MISMATCH AS 'OUTLINE_MISMATCH',OVERLAP_TIME_MISMATCH AS 'OVERLAP_TIME_MISMATCH',PDDL_ENV_MISMATCH AS 'PDDL_ENV_MISMATCH',PDML_ENV_MISMATCH AS 'PDML_ENV_MISMATCH',PLSQL_CMP_SWITCHS_DIFF AS 'PLSQL_CMP_SWITCHS_DIFF',PLSQL_DEBUG AS 'PLSQL_DEBUG',PQ_SLAVE_MISMATCH AS 'PQ_SLAVE_MISMATCH',PURGED_CURSOR AS 'PURGED_CURSOR',PX_MISMATCH AS 'PX_MISMATCH',REMOTE_MAPPING_MISMATCH AS 'REMOTE_MAPPING_MISMATCH',REMOTE_TRANS_MISMATCH AS 'REMOTE_TRANS_MISMATCH',ROLL_INVALID_MISMATCH AS 'ROLL_INVALID_MISMATCH',SLAVE_QC_MISMATCH AS 'SLAVE_QC_MISMATCH',SQL_TYPE_MISMATCH AS 'SQL_TYPE_MISMATCH',STATS_ROW_MISMATCH AS 'STATS_ROW_MISMATCH',STB_OBJECT_MISMATCH AS 'STB_OBJECT_MISMATCH',TOP_LEVEL_DDL_MISMATCH AS 'TOP_LEVEL_DDL_MISMATCH',TOP_LEVEL_RPI_CURSOR AS 'TOP_LEVEL_RPI_CURSOR',TRANSLATION_MISMATCH AS 'TRANSLATION_MISMATCH',TYPCHK_DEP_MISMATCH AS 'TYPCHK_DEP_MISMATCH',TYPECHECK_MISMATCH AS 'TYPECHECK_MISMATCH',UNBOUND_CURSOR AS 'UNBOUND_CURSOR',USE_FEEDBACK_STATS AS 'USE_FEEDBACK_STATS',USER_BIND_PEEK_MISMATCH AS 'USER_BIND_PEEK_MISMATCH'))
 ORDER BY 2,1)
 WHERE SQL_ID='&sqlid'
 GROUP BY INST_ID, SQL_ID, REASON_NAME, REASON_VALUE
 HAVING REASON_VALUE='Y'
 ORDER BY COUNT(*) DESC;

note:
查询上面的问题SQL version count 高和no shared 原因为BIND_EQUIV_FAILURE, 这是ACS(实际上是ECS)的典型,11g 12c 19c最佳实践一直在禁用该特性, 12.2 由于 Bind_equiv_failure 引发 SQL 不能共享进而造成 Cursor Mutex: x (Doc ID 2610645.1)提到的bug也是ACS导致的.

查询v$sql的is_bind_sensitive,is_bind_aware 并且也都为Y。ACS问题在13年左右第一次遇到时也是一个1秒的SQL执行时间突然变成了4分钟。

 

禁用ACS

alter system set "_optimizer_use_feedback"=false;
alter system set "_optimizer_adaptive_cursor_sharing"=false;
alter system set "_optimizer_extended_cursor_sharing"=none ;
alter system set "_optimizer_extended_cursor_sharing_rel"=none;

note:
另外禁用绑定变量窥探(set “_optim_peek_user_binds”=false)也可以变象实现禁用ACS.

打赏

, ,

对不起,这篇文章暂时关闭评论。