Oracle 19c RAC实例大量会话hang,等待事件为’enq: TX – contention’, 做hanganalyze 堵塞进程为RECO, ‘buffer busy waits'<=’enq: TX – contention’ (cycle) RECO与前台进程形成死锁, 分布式事务表dba_2pc_pendings无记录。简单记录处理过程.
db alert log
ARC2 (PID:110273): Archived Log entry 73808 added for T-1.S-22324 ID 0x20b85f1e LAD:1 2023-02-26T15:33:53.503393+08:00 Error 22 trapped in 2PC on transaction 2259.15.2626285. Cleaning up. <<<<<<<<<<< Error stack returned to user: ORA-02054: transaction 2259.15.2626285 in-doubt ORA-00022: invalid session ID; access denied ORA-02063: preceding line from SXXX1 ---(shared public database link) 2023-02-26T15:33:53.521552+08:00 DISTRIB TRAN Txxxx.anbob.com.e18a8b02.2259.15.2626285 is local tran 2259.15.2626285 (hex=8d3.0f.2812ed) insert pending prepared tran, scn=17889696968055 (hex=0x0000104545086d77) 2023-02-26T15:33:54.081770+08:00 DISTRIB TRAN Txxxx.anbob.com.e18a8b02.2259.15.2626285 is local tran 2259.15.2626285 (hex=8d3.0f.2812ed) change pending prepared tran, scn=17889696968055 (hex=0x0000104545086d77) to pending collecting tran, scn=17889696968055 (hex=0x0000104545086d77)
hanganalyze trace
Chains most likely to have caused the hang:
[a] Chain 1 Signature: 'buffer busy waits'<='enq: TX - contention' (cycle)
Chain 1 Signature Hash: 0x14a7051f
Cycles:
-------------------------------------------------------------------------------
Chain 1:
-------------------------------------------------------------------------------
Oracle session identified by:
{
instance: 1 (anbob.anbob1)
os id: 88106
process id: 4540, oracle@anbob1
session id: 11499
session serial #: 50576
module name: 0 (JDBC Thin ClientTNS V1-V3)TNS V1-V3))-V3)-V3)V3))
}
is waiting for 'enq: TX - contention' with wait info:
{
p1: 'name|mode'=0x54580004
p2: 'usn<<16 | slot'=0x8d3000f
p3: 'sequence'=0x2812ed
time in wait: 6.424142 sec
heur. time in wait: 13.050695 sec
timeout after: 1.575858 sec
wait id: 2868
blocking: 182 sessions
current sql_id: 2609502947
current sql: insert into SP_.....) values( 310, :1 ,:2 ,:3 ,:4 ,:5 ,:6 )
short stack: ksedsts()+426<-ksdxfstk()+58<-ksdxcb()+872<-sspuser()+200<-__sighandler()<-semtimedop()+10<-skgpwwait()+192<-ksliwat()+2192<-kslwaitctx()+200<-kjusuc()+5742<-ksipgetctxia()+1968<-ksqcmi()+2757<-ksqgtlctx()+6620<-ksqgelctx()+838<-ktuGetTxForXid()+324<-ktcwit1()+8023<-ktuGetCommitTimesMain_Int()+1710<-ktuGetCommitTimesMain()+542<-ktuGetCommitTimes()+149<-ktbgcl1()+2645<-ktbgfi()+10962<-kdiins1()+40025<-kdiinsp0()+129<-kauxsin()+2041<-qesltcLoadIndexList()+873<-qerltcNoKdtBufferedInsRowCBK()+361<-qerltcSingleRowLoad(
wait history:
* time between current wait and wait #1: 0.000060 sec
1. event: 'enq: TX - contention'
time waited: 3.998891 sec
wait id: 2867 p1: 'name|mode'=0x54580004
p2: 'usn<<16 | slot'=0x8d3000f
p3: 'sequence'=0x2812ed
* time between wait #1 and #2: 0.000051 sec
2. event: 'enq: TX - contention'
time waited: 2.001008 sec
wait id: 2866 p1: 'name|mode'=0x54580004
p2: 'usn<<16 | slot'=0x8d3000f
p3: 'sequence'=0x2812ed
* time between wait #2 and #3: 0.000044 sec
3. event: 'enq: TX - contention'
time waited: 0.626499 sec
wait id: 2865 p1: 'name|mode'=0x54580004
p2: 'usn<<16 | slot'=0x8d3000f p3: 'sequence'=0x2812ed } and is blocked by => Oracle session identified by:
{
instance: 1 (anbob.anbob1)
os id: 92451
process id: 75, oracle@anbob1 (RECO)
session id: 16576
session serial #: 41963
}
which is waiting for 'buffer busy waits' with wait info:
{
p1: 'file#'=0x1b
p2: 'block#'=0x2766a
p3: 'class#'=0x1
time in wait: 0.670083 sec
heur. time in wait: 31 min 21 sec
timeout after: never
wait id: 49035
blocking: 182 sessions
current sql_id: 0
current sql:
short stack: ksedsts()+426<-ksdxfstk()+58<-ksdxcb()+872<-sspuser()+200<-__sighandler()<-semtimedop()+10<-skgpwwait()+192<-ksliwat()+2192<-kslwaitctx()+200<-kcbzwb()+2172<-kcbget()+42000<-ktbxchg()+151<-kdifind()+1016<-kdifind4undo()+52<-kdiulk()+15754<-kcoubk()+337<-ktundo()+1816<-kturCurrBackoutOneChg()+1230<-kturAbortInDoubtTxnMain()+803<-kturAbortInDoubtTxn()+952<-ktdabt()+383<-k2lrab()+56<-k2vrec()+800<-k2vdrvMain()+5992<-k2vdrv()+448<-ksb_act_run_int()+117<-ksb_act_run()+130<-ksbabs()+746<-ksbrdp()+1123<-opirip()+541<-opi
wait history:
* time between current wait and wait #1: 0.000119 sec
1. event: 'buffer busy waits'
time waited: 0.999818 sec
wait id: 49034 p1: 'file#'=0x1b
p2: 'block#'=0x2766a
p3: 'class#'=0x1
* time between wait #1 and #2: 0.000181 sec
2. event: 'buffer busy waits'
time waited: 0.999882 sec
wait id: 49033 p1: 'file#'=0x1b
p2: 'block#'=0x2766a
p3: 'class#'=0x1
* time between wait #2 and #3: 0.000125 sec
3. event: 'buffer busy waits'
time waited: 0.999832 sec
wait id: 49032 p1: 'file#'=0x1b
p2: 'block#'=0x2766a
p3: 'class#'=0x1
}
and is blocked by the session at the start of the chain.
---大量会话被11499堵塞
行 4522: and is blocked by 'instance: 1, os id: 88106, session id: 11499',
行 4574: and is blocked by 'instance: 1, os id: 88106, session id: 11499',
行 4727: and is blocked by 'instance: 1, os id: 88106, session id: 11499',
行 4779: and is blocked by 'instance: 1, os id: 88106, session id: 11499',
行 4831: and is blocked by 'instance: 1, os id: 88106, session id: 11499',
行 4987: and is blocked by 'instance: 1, os id: 88106, session id: 11499',
行 5038: and is blocked by 'instance: 1, os id: 88106, session id: 11499',
行 5090: and is blocked by 'instance: 1, os id: 88106, session id: 11499',
行 5138: and is blocked by 'instance: 1, os id: 88106, session id: 11499',
行 5186: and is blocked by 'instance: 1, os id: 88106, session id: 11499',
行 5291: and is blocked by 'instance: 1, os id: 88106, session id: 11499',
行 5395: and is blocked by 'instance: 1, os id: 88106, session id: 11499',
行 5447: and is blocked by 'instance: 1, os id: 88106, session id: 11499',
行 5499: and is blocked by 'instance: 1, os id: 88106, session id: 11499',
行 5549: and is blocked by 'instance: 1, os id: 88106, session id: 11499',
行 5601: and is blocked by 'instance: 1, os id: 88106, session id: 11499',
。。。
Chain 1 Signature: 'buffer busy waits'<='enq: TX - contention' (cycle)
[11094]/1/11095/42517/0x9a538e200/91023/NLEAF/[11498]
[11498]/1/11499/50576/0xa05321e20/88106/IN_HANG/[16575]
...
Oradebug command 'hanganalyze 3' console output:
Cycle 1: (1/16576)--(1/11499)
就是通过X$KTUXE中可以定位到dead prepared transaction,但在dba_2pc_pending视图中查不到
SQL> select * from x$ktuxe where KTUXESTA!='INACTIVE' and KTUXECFL like '%DEAD%'; ADDR INDX INST_ID CON_ID KTUXEUSN KTUXESLT KTUXESQN KTUXERDBF KTUXERDBB KTUXESCNB KTUXESCNW KTUXESCN KTUXESTA KTUXECFL KTUXEUEL KTUXEDDBF KTUXEDDBB KTUXEPUSN KTUXEPSLT KTUXEPSQN KTUXESIZ KXIDEXTFLAG ---------------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------------- ------------------------ ---------- ---------- ---------- ---------- ---------- ---------- ---------- ----------- 00007FFFF25B5600 76885 1 0 2259 15 2626285 3 405500 1158180215 4165 1.7890E+13 PREPARED SCO|COL|REV|DEAD 6 3 405500 0 0 0 2 0 SQL> select * from dba_2pc_pending; no rows selected
Note:
判断2pc 远程事务已完成,或SESSION已断开,仅剩local 事务需要处理。
解决方案
参考 How To Resolve Stranded DBA_2PC_PENDING Entries [ID 401302.1]
1、因为RECO为非致命进程,可以KILL处理,关于致命进程,可以查找我之前的blog.
kill reco
2,手动补分布式事务记录,回滚Local事务
alter system disable distributed recovery;
insert into pending_trans$ (
LOCAL_TRAN_ID,
GLOBAL_TRAN_FMT,
GLOBAL_ORACLE_ID,
STATE,
STATUS,
SESSION_VECTOR,
RECO_VECTOR,
TYPE#,
FAIL_TIME,
RECO_TIME)
values( '2259.15.2626285', /* <== Replace this with your local tran id */
306206, /* */
'XXXXXXX.12345.1.2.3', /* These values can be used without any */
'prepared','P', /* modification. Most of the values are */
hextoraw( '00000001' ), /* constant. */
hextoraw( '00000000' ), /* */
0, sysdate, sysdate );
insert into pending_sessions$
values( '2259.15.2626285',/* <==Replace only this with your local tran id */
1, hextoraw('05004F003A1500000104'),
'C', 0, 30258592, '',
146
);
commit;
rollback force '2259.15.2626285';
--or
--commit force '2259.15.2626285';
delete from pending_trans$ where local_tran_id='2259.15.2626285';
delete from pending_sessions$ where local_tran_id='2259.15.2626285';
commit;
这类操作较多,可以参考MOS 自己批一个类型于COE的脚本,一键输出解决方案。同事写了一个供参考
CREATE OR REPLACE PROCEDURE sys.clean_dead_transactions AS
-- 声明变量来存储死事务 COUNT(*) 的值和死事务的信息
v_count NUMBER(10);
v_count1 number(10);
v_trans_id VARCHAR2(100);
v_ktuxesta x$ktuxe.ktuxesta%TYPE;
v_ktuxecfl x$ktuxe.ktuxecfl%TYPE;
v_ktuxesiz x$ktuxe.ktuxesiz%TYPE;
BEGIN
-- 执行死事务 COUNT(*) 查询
SELECT COUNT(*) INTO v_count FROM x$ktuxe WHERE KTUXESTA='PREPARED' AND KTUXECFL like'%DEAD';
-- 判断 COUNT(*) 是否为 0,如果是则退出
IF v_count = 0 THEN
DBMS_OUTPUT.PUT_LINE('No dead trans!');
RETURN;
END IF;
-- 如果 COUNT(*) 不为 0,继续执行其他操作
DBMS_OUTPUT.PUT_LINE('=====================================================================================');
DBMS_OUTPUT.PUT_LINE('Found ' || v_count || ' rows!');
-- 执行查询并将结果存储到变量中
FOR rec IN (SELECT KTUXEUSN || '.' || KTUXESLT || '.' || KTUXESQN trans_id, KTUXESTA, KTUXECFL, KTUXESIZ
FROM x$ktuxe
WHERE KTUXESTA='PREPARED' AND KTUXECFL like'%DEAD')
LOOP
v_trans_id := rec.trans_id;
v_ktuxesta := rec.KTUXESTA;
v_ktuxecfl := rec.KTUXECFL;
v_ktuxesiz := rec.KTUXESIZ;
-- 输出死事务信息
DBMS_OUTPUT.PUT_LINE('local_trans_id: ' || v_trans_id || ', Status: ' || v_ktuxesta ||
', Flags: ' || v_ktuxecfl || ', KTUXESIZ: ' || v_ktuxesiz);
DBMS_OUTPUT.PUT_LINE('=====================================================================================');
-- 判断 local_trans_id 是否在 dba_2pc_pending 中不存在
select count(*) into v_count1 from dba_2pc_pending where LOCAL_TRAN_ID = v_trans_id;
IF v_count1 <> 0 THEN
DBMS_OUTPUT.PUT_LINE('This dead trans exists in dba_2pc_pending!');
RETURN;
END IF;
-- 如果 COUNT1(*) 为 0,继续执行其他操作
DBMS_OUTPUT.PUT_LINE('This dead trans does not exist in dba_2pc_pending');
-- 输出清理语句
DBMS_OUTPUT.PUT_LINE('=====================================================================================');
DBMS_OUTPUT.PUT_LINE('alter system disable distributed recovery;');
DBMS_OUTPUT.PUT_LINE('INSERT INTO pending_trans (LOCAL_TRAN_ID, GLOBAL_TRAN_FMT, GLOBAL_ORACLE_ID, STATE, STATUS, SESSION_VECTOR, RECO_VECTOR, TYPE#, FAIL_TIME, RECO_TIME) VALUES (''' || v_trans_id || ''', 306206, ''XXXXXXX.12345.1.2.3'', ''prepared'', ''P'', hextoraw(''00000001''), hextoraw(''00000000''), 0, sysdate, sysdate);');
DBMS_OUTPUT.PUT_LINE('INSERT INTO pending_sessions$ VALUES (''' || v_trans_id || ''', 1, hextoraw(''05004F003A1500000104''), ''C'', 0, 30258592, '''', 146);');
DBMS_OUTPUT.PUT_LINE('COMMIT;');
DBMS_OUTPUT.PUT_LINE('=====================================================================================');
DBMS_OUTPUT.PUT_LINE('rollback force ''' || v_trans_id || ''';');
DBMS_OUTPUT.PUT_LINE('OR');
DBMS_OUTPUT.PUT_LINE('commit force ''' || v_trans_id || ''';');
DBMS_OUTPUT.PUT_LINE('delete from pending_trans$ where local_tran_id= ''' || v_trans_id ||''';');
DBMS_OUTPUT.PUT_LINE('delete from pending_sessions$ where local_tran_id= ''' || v_trans_id ||''';');
DBMS_OUTPUT.PUT_LINE('commit;');
DBMS_OUTPUT.PUT_LINE('alter system enable distributed recovery ;');
DBMS_OUTPUT.PUT_LINE('=====================================================================================');
END LOOP;
END;
/