首页 » ORACLE 9i-23c » Troubleshooting Oracle 19c hang wait ‘enq: TX – contention’ when RECO distributed Transaction recovery

Troubleshooting Oracle 19c hang wait ‘enq: TX – contention’ when RECO distributed Transaction recovery

Oracle 19c RAC实例大量会话hang,等待事件为’enq: TX – contention’, 做hanganalyze 堵塞进程为RECO, ‘buffer busy waits'<=’enq: TX – contention’ (cycle) RECO与前台进程形成死锁, 分布式事务表dba_2pc_pendings无记录。简单记录处理过程.

db alert log

ARC2 (PID:110273): Archived Log entry 73808 added for T-1.S-22324 ID 0x20b85f1e LAD:1
2023-02-26T15:33:53.503393+08:00
Error 22 trapped in 2PC on transaction 2259.15.2626285. Cleaning up.  <<<<<<<<<<<
Error stack returned to user:
ORA-02054: transaction 2259.15.2626285 in-doubt
ORA-00022: invalid session ID; access denied
ORA-02063: preceding line from SXXX1 ---(shared public database link)

2023-02-26T15:33:53.521552+08:00
DISTRIB TRAN Txxxx.anbob.com.e18a8b02.2259.15.2626285
  is local tran 2259.15.2626285 (hex=8d3.0f.2812ed)
  insert pending prepared tran, scn=17889696968055 (hex=0x0000104545086d77)
2023-02-26T15:33:54.081770+08:00
DISTRIB TRAN Txxxx.anbob.com.e18a8b02.2259.15.2626285
  is local tran 2259.15.2626285 (hex=8d3.0f.2812ed)
  change pending prepared tran, scn=17889696968055 (hex=0x0000104545086d77)
  to     pending collecting tran, scn=17889696968055 (hex=0x0000104545086d77) 

hanganalyze trace

Chains most likely to have caused the hang:
 [a] Chain 1 Signature: 'buffer busy waits'<='enq: TX - contention' (cycle)
     Chain 1 Signature Hash: 0x14a7051f


Cycles:
 
-------------------------------------------------------------------------------
Chain 1:
-------------------------------------------------------------------------------
    Oracle session identified by:
    {
                instance: 1 (anbob.anbob1)
                   os id: 88106
              process id: 4540, oracle@anbob1
              session id: 11499
        session serial #: 50576
             module name: 0 (JDBC Thin ClientTNS V1-V3)TNS V1-V3))-V3)-V3)V3))
    }
    is waiting for 'enq: TX - contention' with wait info:
    {
                      p1: 'name|mode'=0x54580004
                      p2: 'usn<<16 | slot'=0x8d3000f
                      p3: 'sequence'=0x2812ed
            time in wait: 6.424142 sec
      heur. time in wait: 13.050695 sec
           timeout after: 1.575858 sec
                 wait id: 2868
                blocking: 182 sessions
          current sql_id: 2609502947
             current sql: insert into SP_.....) values( 310,  :1 ,:2 ,:3 ,:4 ,:5 ,:6 ) 
             short stack: ksedsts()+426<-ksdxfstk()+58<-ksdxcb()+872<-sspuser()+200<-__sighandler()<-semtimedop()+10<-skgpwwait()+192<-ksliwat()+2192<-kslwaitctx()+200<-kjusuc()+5742<-ksipgetctxia()+1968<-ksqcmi()+2757<-ksqgtlctx()+6620<-ksqgelctx()+838<-ktuGetTxForXid()+324<-ktcwit1()+8023<-ktuGetCommitTimesMain_Int()+1710<-ktuGetCommitTimesMain()+542<-ktuGetCommitTimes()+149<-ktbgcl1()+2645<-ktbgfi()+10962<-kdiins1()+40025<-kdiinsp0()+129<-kauxsin()+2041<-qesltcLoadIndexList()+873<-qerltcNoKdtBufferedInsRowCBK()+361<-qerltcSingleRowLoad(
            wait history:
              * time between current wait and wait #1: 0.000060 sec
              1.       event: 'enq: TX - contention'
                 time waited: 3.998891 sec
                     wait id: 2867             p1: 'name|mode'=0x54580004
                                               p2: 'usn<<16 | slot'=0x8d3000f
                                               p3: 'sequence'=0x2812ed
              * time between wait #1 and #2: 0.000051 sec
              2.       event: 'enq: TX - contention'
                 time waited: 2.001008 sec
                     wait id: 2866             p1: 'name|mode'=0x54580004
                                               p2: 'usn<<16 | slot'=0x8d3000f
                                               p3: 'sequence'=0x2812ed
              * time between wait #2 and #3: 0.000044 sec
              3.       event: 'enq: TX - contention'
                 time waited: 0.626499 sec
                     wait id: 2865             p1: 'name|mode'=0x54580004
                                               p2: 'usn<<16 | slot'=0x8d3000f p3: 'sequence'=0x2812ed } and is blocked by => Oracle session identified by:
    {
                instance: 1 (anbob.anbob1)
                   os id: 92451
              process id: 75, oracle@anbob1 (RECO)
              session id: 16576
        session serial #: 41963
    }
    which is waiting for 'buffer busy waits' with wait info:
    {
                      p1: 'file#'=0x1b
                      p2: 'block#'=0x2766a
                      p3: 'class#'=0x1
            time in wait: 0.670083 sec
      heur. time in wait: 31 min 21 sec
           timeout after: never
                 wait id: 49035
                blocking: 182 sessions
          current sql_id: 0
             current sql: 
             short stack: ksedsts()+426<-ksdxfstk()+58<-ksdxcb()+872<-sspuser()+200<-__sighandler()<-semtimedop()+10<-skgpwwait()+192<-ksliwat()+2192<-kslwaitctx()+200<-kcbzwb()+2172<-kcbget()+42000<-ktbxchg()+151<-kdifind()+1016<-kdifind4undo()+52<-kdiulk()+15754<-kcoubk()+337<-ktundo()+1816<-kturCurrBackoutOneChg()+1230<-kturAbortInDoubtTxnMain()+803<-kturAbortInDoubtTxn()+952<-ktdabt()+383<-k2lrab()+56<-k2vrec()+800<-k2vdrvMain()+5992<-k2vdrv()+448<-ksb_act_run_int()+117<-ksb_act_run()+130<-ksbabs()+746<-ksbrdp()+1123<-opirip()+541<-opi
            wait history:
              * time between current wait and wait #1: 0.000119 sec
              1.       event: 'buffer busy waits'
                 time waited: 0.999818 sec
                     wait id: 49034            p1: 'file#'=0x1b
                                               p2: 'block#'=0x2766a
                                               p3: 'class#'=0x1
              * time between wait #1 and #2: 0.000181 sec
              2.       event: 'buffer busy waits'
                 time waited: 0.999882 sec
                     wait id: 49033            p1: 'file#'=0x1b
                                               p2: 'block#'=0x2766a
                                               p3: 'class#'=0x1
              * time between wait #2 and #3: 0.000125 sec
              3.       event: 'buffer busy waits'
                 time waited: 0.999832 sec
                     wait id: 49032            p1: 'file#'=0x1b
                                               p2: 'block#'=0x2766a
                                               p3: 'class#'=0x1
    }
    and is blocked by the session at the start of the chain.

---大量会话被11499堵塞

	行 4522:     and is blocked by 'instance: 1, os id: 88106, session id: 11499',
	行 4574:     and is blocked by 'instance: 1, os id: 88106, session id: 11499',
	行 4727:     and is blocked by 'instance: 1, os id: 88106, session id: 11499',
	行 4779:     and is blocked by 'instance: 1, os id: 88106, session id: 11499',
	行 4831:     and is blocked by 'instance: 1, os id: 88106, session id: 11499',
	行 4987:     and is blocked by 'instance: 1, os id: 88106, session id: 11499',
	行 5038:     and is blocked by 'instance: 1, os id: 88106, session id: 11499',
	行 5090:     and is blocked by 'instance: 1, os id: 88106, session id: 11499',
	行 5138:     and is blocked by 'instance: 1, os id: 88106, session id: 11499',
	行 5186:     and is blocked by 'instance: 1, os id: 88106, session id: 11499',
	行 5291:     and is blocked by 'instance: 1, os id: 88106, session id: 11499',
	行 5395:     and is blocked by 'instance: 1, os id: 88106, session id: 11499',
	行 5447:     and is blocked by 'instance: 1, os id: 88106, session id: 11499',
	行 5499:     and is blocked by 'instance: 1, os id: 88106, session id: 11499',
	行 5549:     and is blocked by 'instance: 1, os id: 88106, session id: 11499',
	行 5601:     and is blocked by 'instance: 1, os id: 88106, session id: 11499',
。。。


Chain 1 Signature: 'buffer busy waits'<='enq: TX - contention' (cycle)


[11094]/1/11095/42517/0x9a538e200/91023/NLEAF/[11498]
[11498]/1/11499/50576/0xa05321e20/88106/IN_HANG/[16575]
...

Oradebug command 'hanganalyze 3' console output: 
Cycle 1: (1/16576)--(1/11499)

就是通过X$KTUXE中可以定位到dead prepared transaction,但在dba_2pc_pending视图中查不到

SQL> select * from x$ktuxe where KTUXESTA!='INACTIVE' and KTUXECFL like '%DEAD%';

ADDR INDX INST_ID CON_ID KTUXEUSN KTUXESLT KTUXESQN KTUXERDBF KTUXERDBB KTUXESCNB KTUXESCNW KTUXESCN KTUXESTA KTUXECFL KTUXEUEL KTUXEDDBF KTUXEDDBB KTUXEPUSN KTUXEPSLT KTUXEPSQN KTUXESIZ KXIDEXTFLAG
---------------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------------- ------------------------ ---------- ---------- ---------- ---------- ---------- ---------- ---------- -----------
00007FFFF25B5600 76885 1 0 2259 15 2626285 3 405500 1158180215 4165 1.7890E+13 PREPARED SCO|COL|REV|DEAD 6 3 405500 0 0 0 2 0

SQL> select * from dba_2pc_pending;
no rows selected 

Note:
判断2pc 远程事务已完成,或SESSION已断开,仅剩local 事务需要处理。

解决方案

参考 How To Resolve Stranded DBA_2PC_PENDING Entries [ID 401302.1]

1、因为RECO为非致命进程,可以KILL处理,关于致命进程,可以查找我之前的blog.

kill reco
2,手动补分布式事务记录,回滚Local事务

alter system disable distributed recovery;

insert into pending_trans$ (
LOCAL_TRAN_ID,
GLOBAL_TRAN_FMT,
GLOBAL_ORACLE_ID,
STATE,
STATUS,
SESSION_VECTOR,
RECO_VECTOR,
TYPE#,
FAIL_TIME,
RECO_TIME)
values( '2259.15.2626285', /* <== Replace this with your local tran id */
306206, /* */
'XXXXXXX.12345.1.2.3', /* These values can be used without any */
'prepared','P', /* modification. Most of the values are */
hextoraw( '00000001' ), /* constant. */
hextoraw( '00000000' ), /* */
0, sysdate, sysdate );

insert into pending_sessions$
values( '2259.15.2626285',/* <==Replace only this with your local tran id */
1, hextoraw('05004F003A1500000104'),
'C', 0, 30258592, '',
146
);

commit;

rollback force '2259.15.2626285';
--or
--commit force '2259.15.2626285';

delete from pending_trans$ where local_tran_id='2259.15.2626285';
delete from pending_sessions$ where local_tran_id='2259.15.2626285';
commit;

这类操作较多,可以参考MOS 自己批一个类型于COE的脚本,一键输出解决方案。同事写了一个供参考

CREATE OR REPLACE PROCEDURE sys.clean_dead_transactions AS
  -- 声明变量来存储死事务 COUNT(*) 的值和死事务的信息
  v_count NUMBER(10);
  v_count1 number(10);
  v_trans_id VARCHAR2(100);
  v_ktuxesta x$ktuxe.ktuxesta%TYPE;
  v_ktuxecfl x$ktuxe.ktuxecfl%TYPE;
  v_ktuxesiz x$ktuxe.ktuxesiz%TYPE;
BEGIN
  -- 执行死事务 COUNT(*) 查询
  SELECT COUNT(*) INTO v_count FROM x$ktuxe WHERE KTUXESTA='PREPARED' AND KTUXECFL like'%DEAD';

  -- 判断 COUNT(*) 是否为 0,如果是则退出
  IF v_count = 0 THEN
    DBMS_OUTPUT.PUT_LINE('No dead trans!');
    RETURN;
  END IF;

  -- 如果 COUNT(*) 不为 0,继续执行其他操作
  DBMS_OUTPUT.PUT_LINE('=====================================================================================');
  DBMS_OUTPUT.PUT_LINE('Found ' || v_count || ' rows!');

  -- 执行查询并将结果存储到变量中
  FOR rec IN (SELECT KTUXEUSN || '.' || KTUXESLT || '.' || KTUXESQN trans_id, KTUXESTA, KTUXECFL, KTUXESIZ
              FROM x$ktuxe
              WHERE KTUXESTA='PREPARED' AND KTUXECFL like'%DEAD') 
  LOOP
    v_trans_id := rec.trans_id;
    v_ktuxesta := rec.KTUXESTA;
    v_ktuxecfl := rec.KTUXECFL;
    v_ktuxesiz := rec.KTUXESIZ;
    
    -- 输出死事务信息
    DBMS_OUTPUT.PUT_LINE('local_trans_id: ' || v_trans_id || ', Status: ' || v_ktuxesta ||
                         ', Flags: ' || v_ktuxecfl || ', KTUXESIZ: ' || v_ktuxesiz);
    DBMS_OUTPUT.PUT_LINE('=====================================================================================');
    -- 判断 local_trans_id 是否在 dba_2pc_pending 中不存在
    select count(*) into v_count1 from dba_2pc_pending where LOCAL_TRAN_ID = v_trans_id;
    IF v_count1 <> 0 THEN
        DBMS_OUTPUT.PUT_LINE('This dead trans exists in dba_2pc_pending!');
        RETURN;
    END IF;
    
    -- 如果 COUNT1(*) 为 0,继续执行其他操作
    DBMS_OUTPUT.PUT_LINE('This dead trans does not exist in dba_2pc_pending');

    -- 输出清理语句
    DBMS_OUTPUT.PUT_LINE('=====================================================================================');
    DBMS_OUTPUT.PUT_LINE('alter system disable distributed recovery;');
    DBMS_OUTPUT.PUT_LINE('INSERT INTO pending_trans (LOCAL_TRAN_ID, GLOBAL_TRAN_FMT, GLOBAL_ORACLE_ID, STATE, STATUS, SESSION_VECTOR, RECO_VECTOR, TYPE#, FAIL_TIME, RECO_TIME) VALUES (''' || v_trans_id || ''', 306206, ''XXXXXXX.12345.1.2.3'', ''prepared'', ''P'', hextoraw(''00000001''), hextoraw(''00000000''), 0, sysdate, sysdate);');
    DBMS_OUTPUT.PUT_LINE('INSERT INTO pending_sessions$ VALUES (''' || v_trans_id || ''', 1, hextoraw(''05004F003A1500000104''), ''C'', 0, 30258592, '''', 146);');
    DBMS_OUTPUT.PUT_LINE('COMMIT;');
    DBMS_OUTPUT.PUT_LINE('=====================================================================================');
    DBMS_OUTPUT.PUT_LINE('rollback force ''' || v_trans_id || ''';');
    DBMS_OUTPUT.PUT_LINE('OR');
    DBMS_OUTPUT.PUT_LINE('commit force ''' || v_trans_id || ''';');
    DBMS_OUTPUT.PUT_LINE('delete from pending_trans$ where local_tran_id= ''' || v_trans_id ||''';');
    DBMS_OUTPUT.PUT_LINE('delete from pending_sessions$ where local_tran_id= ''' || v_trans_id ||''';');
    DBMS_OUTPUT.PUT_LINE('commit;');
    DBMS_OUTPUT.PUT_LINE('alter system enable distributed recovery ;');
    DBMS_OUTPUT.PUT_LINE('=====================================================================================');
  END LOOP;
END;
/

打赏

, ,

对不起,这篇文章暂时关闭评论。