首页 » ORACLE » How to release still “killed“ status session in v$session? (释放killed的session) (四)

How to release still “killed“ status session in v$session? (释放killed的session) (四)

 How to release still “killed“ status session in v$session? (释放killed的session) (一)
How to release still “killed“ status session in v$session? (释放killed的session) (二)
How to release still “killed“ status session in v$session? (释放killed的session) (三)

继续一个killed无法释放的案例, 环境是11.2.0.4 2nodes-RAC on SELS11, 开始是数据库突然出现了非常高的负载wait event是“enq: SV – contentio”,“row cache lock”,“enq: SQ – contention ”,  是与sequnece相关,SQ较为常见不再描述, 当多实例争用sequence时,如果sequence是ORDER and NOCACHE主要的wait event 是row cache lock, 当sequence是cached通常出现的是SV。 因为这个事件KILL了一批进程,后期发现有个进程始终是KILLED状态,并且同时影响后期的AWR都无法生成。

SQL>@ase

USERNAME          SID EVENT                MACHINE    MODULE               STATUS   LAST_CALL_ET SQL_ID          WAI_SECINW ROW_WAIT_OBJ# SQLTEXT                        BS          CH# OSUSER     HEX
---------- ---------- -------------------- ---------- -------------------- -------- ------------ --------------- ---------- ------------- ------------------------------ ---------- ---- ---------- ---------
PATROLDB          337 library cache: mutex kdechan1   SQL*Plus             ACTIVE          18858 2kxmfvbpdkbj4   0:469             663378 select 'DATAPOINT ' bpb, sql_t 1:4851        0 itmuser      1000000
NGESHOP          4851 On CPU / runqueue    qdtza1     oracle               KILLED          95113 axyfr1t8k6xy7   -1:95111          128735                                :               grid

SQL> @usid 4851

USERNAME                SID                 AUDSID OSUSER           MACHINE            PROGRAM              SPID             OPID CPID                     SQL_ID           HASH_VALUE   LASTCALL STATUS
   SADDR            PADDR            TADDR            LOGON_TIME
----------------------- -------------- ----------- ---------------- ------------------ -------------------- -------------- ------ ------------------------ --------------- ----------- ---------- ------
-- ---------------- ---------------- ---------------- -----------------
NGESHOP                  '4851,22847'    497193017 grid             qdtza1             (TNS V1-V3)          33793             192 21244                    axyfr1t8k6xy7    1361278919      20965 KILLED
   0000000B31C705F0 0000000B5124B188 0000000B2AB94178 20190123 07:00:54

Note:
当前SESSION id 4851是KILLED状态,并且堵塞了其它会话。当然根据会话信息我们确认了客户端是一个dblink调用。

SQL> @st dba_2pc_pending

no rows selected

SQL> SELECT /*+ ORDERED */
  2        SUBSTR (s.ksusemnm, 1, 10) || '-' || SUBSTR (s.ksusepid, 1, 10)
  3            "ORIGIN",
  4         SUBSTR (g.K2GTITID_ORA, 1, 35) "GTXID",
  5         SUBSTR (s.indx, 1, 4) || '.' || SUBSTR (s.ksuseser, 1, 5) "LSESSION",
  4         SUBSTR (g.K2GTITID_ORA, 1, 35) "GTXID",
  5         SUBSTR (s.indx, 1, 4) || '.' || SUBSTR (s.ksuseser, 1, 5) "LSESSION",
  6         s2.username,
  7         SUBSTR (
  8            DECODE (
  9               BITAND (ksuseidl, 11),
 10               1, 'ACTIVE',
 11               0, DECODE (BITAND (ksuseflg, 4096), 0, 'INACTIVE', 'CACHED'),
 12               2, 'SNIPED',
 13               3, 'SNIPED',
 14               'KILLED'),
 15            1,
 16            10)
 17            "Status",
 18         SUBSTR (s2.event, 1, 10) "WAITING"
 19    FROM x$k2gte g,
 20         x$ktcxb t,
 21         x$ksuse s,
 22         v$session s2
 23   WHERE
 24  g.K2GTDXCB = t.ktcxbxba
 25         AND g.K2GTDSES = t.ktcxbses
 26         AND s.addr = g.K2GTDSES
 27         AND s2.sid = s.indx;

ORIGIN                                    GTXID                                                                  LSESSION            USERNAME   Status           WAITING
----------------------------------------- ---------------------------------------------------------------------- ------------------- ---------- ---------------- --------------------
qdtza1-21244                              UCISA.HEBEI.MOBILE.COM.53c50444.361                                    4851.22847          NGESHOP    KILLED           SQL*Net me

# Pmon trace

*** 2019-01-23 18:10:42.356
KGX Atomic Operation Log 0xb97f5f4b8
 Mutex 0xb68781710(4851, 0) idn 565cf76d oper GET_EXCL(5)
 Library Cache uid 303 efd 9 whr 76 slp 12366
 oper=0 pt1=0xb687815d0 pt2=(nil) pt3=(nil)
 pt4=(nil) pt5=(nil) ub4=0

*** 2019-01-23 18:10:52.371
found process 0xb5124b188 pid=192 serial=42 ospid = 33793 dead
KGX cleanup...
KGX Atomic Operation Log 0x2a4ed9008
 Mutex 0xb68781710(4851, 0) idn 565cf76d oper EXCL(6)
 Library Cache uid 4851 efd 13 whr 98 slp 0
 oper=12 pt1=0xb687815d0 pt2=0x197f140b0 pt3=(nil)
 pt4=(nil) pt5=(nil) ub4=0

LibraryHandle:  Address=0xb687815d0 Hash=565cf76d LockMode=0 PinMode=0 LoadLockMode=0 Status=VALD 
  ObjectName:  Name=SELECT COUNT(*) FROM "NGESHOP"."ES_ORDER_*********" "A1" WHERE "A1"."ORDER_ID"='aaf3ea904147492fbd5be1dd14d096b8' 
    FullHashValue=243baddfd5dc6ffdb59840a3565cf76d Namespace=SQL AREA(00) Type=CURSOR(00) Identifier=1448933229 OwnerIdn=246 

*** 2019-01-23 18:13:22.492
KGX cleanup...
KGX Atomic Operation Log 0xb97f5f4b8
 Mutex 0xb68781710(4851, 0) idn 565cf76d oper GET_EXCL(5)
 Library Cache uid 303 efd 9 whr 76 slp 12407
 oper=0 pt1=0xb687815d0 pt2=(nil) pt3=(nil)
 pt4=(nil) pt5=(nil) ub4=0

*** 2019-01-23 18:13:32.504
found process 0xb5124b188 pid=192 serial=42 ospid = 33793 dead
KGX cleanup...
KGX Atomic Operation Log 0x2a4ed9008
 Mutex 0xb68781710(4851, 0) idn 565cf76d oper EXCL(6)
 Library Cache uid 4851 efd 13 whr 98 slp 0
 oper=12 pt1=0xb687815d0 pt2=0x197f140b0 pt3=(nil)
 pt4=(nil) pt5=(nil) ub4=0

LibraryHandle:  Address=0xb687815d0 Hash=565cf76d LockMode=0 PinMode=0 LoadLockMode=0 Status=VALD 
  ObjectName:  Name=SELECT COUNT(*) FROM "NGESHOP"."ES_ORDER_*********" "A1" WHERE "A1"."ORDER_ID"='aaf3ea904147492fbd5be1dd14d096b8' 
    FullHashValue=243baddfd5dc6ffdb59840a3565cf76d Namespace=SQL AREA(00) Type=CURSOR(00) Identifier=1448933229 OwnerIdn=246 

*** 2019-01-23 18:16:02.605
KGX cleanup...
KGX Atomic Operation Log 0xb97f5f4b8
 Mutex 0xb68781710(4851, 0) idn 565cf76d oper GET_EXCL(5)
 Library Cache uid 303 efd 9 whr 76 slp 12330
 oper=0 pt1=0xb687815d0 pt2=(nil) pt3=(nil)
 pt4=(nil) pt5=(nil) ub4=0

*** 2019-01-23 18:16:12.620
found process 0xb5124b188 pid=192 serial=42 ospid = 33793 dead
KGX cleanup...
KGX Atomic Operation Log 0x2a4ed9008
 Mutex 0xb68781710(4851, 0) idn 565cf76d oper EXCL(6)
 Library Cache uid 4851 efd 13 whr 98 slp 0
 oper=12 pt1=0xb687815d0 pt2=0x197f140b0 pt3=(nil)
 pt4=(nil) pt5=(nil) ub4=0

LibraryHandle:  Address=0xb687815d0 Hash=565cf76d LockMode=0 PinMode=0 LoadLockMode=0 Status=VALD 
  ObjectName:  Name=SELECT COUNT(*) FROM "NGESHOP"."ES_ORDER_*********" "A1" WHERE "A1"."ORDER_ID"='aaf3ea904147492fbd5be1dd14d096b8' 
    FullHashValue=243baddfd5dc6ffdb59840a3565cf76d Namespace=SQL AREA(00) Type=CURSOR(00) Identifier=1448933229 OwnerIdn=246 

# hanganalyze trace

-------------------------------------------------------------------------------
Chain 1:
-------------------------------------------------------------------------------
 Oracle session identified by:
 {
 instance: 1 (echandb.echandb1)
 os id: 29849
 process id: 2, oracle@kdechan1 (PMON)
 session id: 303
 session serial #: 1
 }
 is waiting for 'library cache: mutex X' with wait info:
 {
 p1: 'idn'=0x565cf76d
 p2: 'value'=0x12f300000000
 p3: 'where'=0x4c
 time in wait: 21.160256 sec
 timeout after: never
 wait id: 19282212
 blocking: 0 sessions
 wait history:
 * time between current wait and wait #1: 0.021533 sec
 1. event: 'pmon timer'
 time waited: 10.009346 sec
 wait id: 19282211 p1: 'duration'=0x3e8
 * time between wait #1 and #2: 0.012146 sec
 2. event: 'library cache: mutex X'
 time waited: 2 min 29 sec
 wait id: 19282210 p1: 'idn'=0x565cf76d
 p2: 'value'=0x12f300000000
 p3: 'where'=0x4c
 * time between wait #2 and #3: 0.021147 sec
 3. event: 'pmon timer'
 time waited: 10.010484 sec
 wait id: 19282209 p1: 'duration'=0x3e8
 }
 and is blocked by
 => Oracle session identified by:
 {
 instance: 1 (echandb.echandb1)
 os id: 33793 (DEAD)
 process id: 192, oracle@kdechan1
 session id: 4851
 session serial #: 22847
 }
 which is not in a wait:
 {
 last wait: 202 min 18 sec ago
 blocking: 4 sessions
 wait history:
 1. event: 'SQL*Net message from client'
 time waited: 0.001799 sec
 wait id: 13552700 p1: 'driver id'=0x54435000
 p2: '#bytes'=0x1
 * time between wait #1 and #2: 0.000010 sec
 2. event: 'SQL*Net message to client'
 time waited: 0.000002 sec
 wait id: 13552699 p1: 'driver id'=0x54435000
 p2: '#bytes'=0x1
 * time between wait #2 and #3: 0.000240 sec
 3. event: 'SQL*Net message from client'
 time waited: 0.002142 sec
 wait id: 13552698 p1: 'driver id'=0x54435000
 p2: '#bytes'=0x1
 }
 
Chain 1 Signature: <not in a wait><='library cache: mutex X'
Chain 1 Signature Hash: 0x6d47f496

# systemstate dump trace

PROCESS 192:
—————————————-
SO: 0xb5124b188, type: 2, owner: (nil), flag: INIT/-/-/0x00 if: 0x3 c: 0x3
proc=0xb5124b188, name=process, file=ksu.h LINE:12721, pg=0
(process) Oracle pid:192, ser:42, calls cur/top: 0xadbd83798/0xadbd83798
flags : (0x1) DEAD
flags2: (0x8000), flags3: (0x10)
intr error: -2147483620, call error: 0, sess error: 0, txn error 0
intr queue: 2147483676 2147483676 2147483676
ksudlp FALSE at location: 0
Cleanup details:
Found dead = 244 min 37 sec ago
Total Cleanup attempts = 99, Cleanup time = 235 min 34 sec, Cleanup timer = 235 min 30 sec
Last attempt (full) started 11 sec ago, Length = in progress
(post info) last post received: 0 0 27
last post received-location: ksa2.h LINE:289 ID:ksasnr
last process to post me: 0xb5125cdc0 65 6
last post sent: 0 0 26
last post sent-location: ksa2.h LINE:285 ID:ksasnd
last process posted by me: 0xb5125cdc0 65 6
(latch info) wait_event=0 bits=0x0
Process Group: DEFAULT, pseudo proc: 0xb993e85d0
skgvtime: process 33793 unix pid wrap detected 1496387522 1499482369
O/S info: user: grid, term: UNKNOWN, ospid: 33793 (DEAD)
OSD pid info: Unix process pid: 33793, image: oracle@kdechan1
Short stack dump: ORA-00072: process “Unix process pid: 33793, image: oracle@kdechan1” is not active

PROCESS 2: PMON
—————————————-
SO: 0xb591b8490, type: 2, owner: (nil), flag: INIT/-/-/0x00 if: 0x3 c: 0x3
proc=0xb591b8490, name=process, file=ksu.h LINE:12721, pg=0
(process) Oracle pid:2, ser:1, calls cur/top: 0xb59e5a9d8/0xb59e5a9d8
flags : (0xe) SYSTEM
flags2: (0x0), flags3: (0x10)
intr error: 0, call error: 0, sess error: 0, txn error 0
intr queue: empty
ksudlp FALSE at location: 0
(post info) last post received: 0 0 16
last post received-location: ksu.h LINE:14056 ID:ksupsc
last process to post me: 0xb591efaa8 209 0
last post sent: 0 0 38
last post sent-location: ksr2.h LINE:641 ID:ksrchdelete
last process posted by me: 0xb39189cc0 2 6
(latch info) wait_event=0 bits=0x0
Process Group: DEFAULT, pseudo proc: 0xb993e85d0
O/S info: user: oracle, term: UNKNOWN, ospid: 29849
OSD pid info: Unix process pid: 29849, image: oracle@kdechan1 (PMON)
Short stack dump:
ksedsts()+465<-ksdxfstk()+32<-ksdxcb()+1927<-sspuser()+112<-__sighandler()<-semtimedop()+10<-skgpwwait()+178
<-ksliwat()+2022<-kslwaitctx()+163<-ksfwaitctx()+14<-kgxWait()+1330<-kgxExclusive()+447<-kglGetMutex()+212<-kglobfr()+1010<-kgllccl()+1657
<-kglMutexRecovery()+661<-kgxCleanup()+548<-kglMutexCleanupAll()+437<-kglMutexCleanup()+13<-kksCleanSessionState()
+232<-ksudlp()+202<-kssxdl()+489<-kssdel()+644<-ksuxdl()+398<-ksuxda_del_proc()+166<-ksucln_dpc_cleanup()+259<-ksucln_dpc_dfs()
+248<-ksucln_dpc_main()+1188<-ksucln_dpc()+29<-ksucln()+1250<-ksbrdp()+1045<-opirip()+623<-opidrv()+603<-sou2o()+103
<-opimai_real()+250<-ssthrdmain()+265<-main()+201<-__libc_start_main()+230

当前看到是在清理一个分布式事务的dead 会话时失败,根据Pmon的call stack在MOS中检索出BUG。

AWR or ADDM Report Generation Hangs Waiting on ‘library cache:mutex X’ Event ( Doc ID 1960432.1 )

Bug 13542050 : USE OF KGL MUTEXES MIGHT BLOCK ON BOGUS MUTEX HOLDER.

Bug 13542050 – A mutex related hang with holder around 65534 (0xfffe) – superseded ( Doc ID 13542050.8 )
Note that this fix has been superseded by the fix in Bug:24739928

Symptoms:
Hang (Involving Shared Resource)
Mutex Contention
Waits for “cursor: pin X”
Waits for “library cache: mutex X”
Stack is likely to include kgxWait

最终也是重启instance 释放些killed session.

打赏

对不起,这篇文章暂时关闭评论。