首页 » ORACLE, ORACLE [C]系列 » Troubleshooting ORA-6544 [pevm_peruws_callback-1] [4021] waiting to lock object SYS.TAB$

Troubleshooting ORA-6544 [pevm_peruws_callback-1] [4021] waiting to lock object SYS.TAB$

最近遇到的一个案例简单记录一下, 这是一个12C R2的2节点RAC, 节点1 8点左右数据库实例连接hang, 从数据库DB alert log中发现ORA-6544 [pevm_peruws_callback-1] [4021]错误, 可见前期出现大量的library cache 等待,影响了SQL解析和连接,因情况紧急kill 实例重启后恢复正常。

db alert log

2019-08-24T06:56:55.614199+08:00
Thread 1 advanced to log sequence 118903 (LGWR switch)
  Current log# 5 seq# 118903 mem# 0: +DATADG/anbob/ONLINELOG/group_5.282.983030819
  Current log# 5 seq# 118903 mem# 1: +DATADG/anbob/ONLINELOG/group_5.283.983030821
2019-08-24T06:59:33.315946+08:00
Restarting dead background process MMON
Starting background process MMON
2019-08-24T06:59:33.349063+08:00
MMON started with pid=62, OS id=27122 
2019-08-24T07:00:29.250939+08:00
Errors in file /home/u01/app/oracle/diag/rdbms/anbob/anbob1/trace/anbob1_ora_45429.trc  (incident=311336):
ORA-6544 [pevm_peruws_callback-1] [4021] [] [] [] [] [] [] [] [] [] []
Incident details in: /home/u01/app/oracle/diag/rdbms/anbob/anbob1/incident/incdir_311336/anbob1_ora_45429_i311336.trc
2019-08-24T07:01:05.167351+08:00
Errors in file /home/u01/app/oracle/diag/rdbms/anbob/anbob1/trace/anbob1_reco_19824.trc:
ORA-00604: error occurred at recursive SQL level 2
ORA-04021: timeout occurred while waiting to lock object SYS.TAB$
2019-08-24T07:03:34.740435+08:00
Thread 1 advanced to log sequence 118904 (LGWR switch)
  Current log# 6 seq# 118904 mem# 0: +DATADG/anbob/ONLINELOG/group_6.284.983030833
  Current log# 6 seq# 118904 mem# 1: +DATADG/anbob/ONLINELOG/group_6.285.983030835
2019-08-24T07:06:27.480995+08:00
Auto-tuning: Shutting down background process GTX1
2019-08-24T07:08:02.474422+08:00
Auto-tuning: Starting background process GTX1
Starting background process GTX1
2019-08-24T07:08:02.511676+08:00
GTX1 started with pid=956, OS id=79464 
2019-08-24T07:08:07.475716+08:00
...
Starting background process GTX5
2019-08-24T07:08:22.538001+08:00
GTX5 started with pid=963, OS id=79983 
2019-08-24T07:10:49.891839+08:00
Thread 1 advanced to log sequence 118905 (LGWR switch)
  Current log# 1 seq# 118905 mem# 0: +DATADG/anbob/ONLINELOG/group_1.272.983031061
  Current log# 1 seq# 118905 mem# 1: +DATADG/anbob/ONLINELOG/group_1.271.983031063
2019-08-24T07:15:29.161080+08:00
Errors in file /home/u01/app/oracle/diag/rdbms/anbob/anbob1/trace/anbob1_ora_45433.trc  (incident=311352):
ORA-6544 [pevm_peruws_callback-1] [4021] [] [] [] [] [] [] [] [] [] []
Incident details in: /home/u01/app/oracle/diag/rdbms/anbob/anbob1/incident/incdir_311352/anbob1_ora_45433_i311352.trc
2019-08-24T07:17:34.936647+08:00
Thread 1 advanced to log sequence 118906 (LGWR switch)
  Current log# 2 seq# 118906 mem# 0: +DATADG/anbob/ONLINELOG/group_2.288.983031073
  Current log# 2 seq# 118906 mem# 1: +DATADG/anbob/ONLINELOG/group_2.289.983031073
2019-08-24T07:18:22.923117+08:00
Auto-tuning: Shutting down background process GTX5

Restarting dead background process MMON
Starting background process MMON
2019-08-24T08:10:00.387734+08:00
MMON started with pid=1010, OS id=53630 
2019-08-24T08:13:50.945906+08:00
Thread 1 advanced to log sequence 118910 (LGWR switch)
  Current log# 1 seq# 118910 mem# 0: +DATADG/anbob/ONLINELOG/group_1.272.983031061
  Current log# 1 seq# 118910 mem# 1: +DATADG/anbob/ONLINELOG/group_1.271.983031063
2019-08-24T08:15:29.135531+08:00
Errors in file /home/u01/app/oracle/diag/rdbms/anbob/anbob1/trace/anbob1_ora_45433.trc  (incident=311354):
ORA-6544 [pevm_peruws_callback-1] [4021] [] [] [] [] [] [] [] [] [] []
Incident details in: /home/u01/app/oracle/diag/rdbms/anbob/anbob1/incident/incdir_311354/anbob1_ora_45433_i311354.trc
2019-08-24T08:26:30.126868+08:00
Thread 1 advanced to log sequence 118911 (LGWR switch)
  Current log# 2 seq# 118911 mem# 0: +DATADG/anbob/ONLINELOG/group_2.288.983031073
  Current log# 2 seq# 118911 mem# 1: +DATADG/anbob/ONLINELOG/group_2.289.983031073
2019-08-24T08:30:29.183316+08:00
Errors in file /home/u01/app/oracle/diag/rdbms/anbob/anbob1/trace/anbob1_ora_45429.trc  (incident=311339):
ORA-6544 [pevm_peruws_callback-1] [4021] [] [] [] [] [] [] [] [] [] []
Incident details in: /home/u01/app/oracle/diag/rdbms/anbob/anbob1/incident/incdir_311339/anbob1_ora_45429_i311339.trc
2019-08-24T08:31:04.085619+08:00
Errors in file /home/u01/app/oracle/diag/rdbms/anbob/anbob1/trace/anbob1_reco_19824.trc:
ORA-00604: error occurred at recursive SQL level 2
ORA-04021: timeout occurred while waiting to lock object SYS.TAB$

2019-08-24T09:08:25.195240+08:00
Errors in file /home/u01/app/oracle/diag/rdbms/anbob/anbob1/trace/anbob1_cjq0_24002.trc:
ORA-04021: timeout occurred while waiting to lock object 
2019-08-24T09:08:35.507735+08:00
Auto-tuning: Shutting down background process GTX2
2019-08-24T09:10:20.978922+08:00
Suspending MMON action 'Triton Session Cleanup' for 104400 seconds
2019-08-24T09:10:26.132291+08:00
***********************************************************************
Fatal NI connect error 12170.

2019-08-24T09:12:07.305586+08:00
License high water mark = 669
2019-08-24T09:12:07.306390+08:00
USER (ospid: 69483): terminating the instance
2019-08-24T09:12:10.411258+08:00
Instance terminated by USER, pid = 69483
2019-08-24T09:13:00.117139+08:00
Starting ORACLE instance (normal) (OS id: 81644)
2019-08-24T09:13:00.131558+08:00

Note:
从7:00开始出现ORA-6544 [pevm_peruws_callback-1] [4021], 主要是在等待ora-4021错误。ORA-4021 在等待SYS.TAB$数据内部递归调用的基表. ORA-04021: timeout occurred while waiting to lock object SYS.TAB$,9:00左右数据库实例重启, 但是之间有日志切换(LGWR switch),说明业务期间还是一直有。
重启后数据库没有再报类型错误

/home/u01/app/oracle/diag/rdbms/anbob/anbob1/trace/anbob1_ora_45429.trc

*** 2019-08-24T08:30:29.184429+08:00
2019-08-24T08:30:29.184401+08:00
Incident 311339 created, dump file: /home/u01/app/oracle/diag/rdbms/anbob/anbob1/incident/incdir_311339/anbob1_ora_45429_i311339.trc
ORA-6544 [pevm_peruws_callback-1] [4021] [] [] [] [] [] [] [] [] [] []
*kjudmpkblkr: final blocker not found

*** 2019-08-24T09:00:29.138206+08:00
ERROR: KGL Lock timeout. Handle=0x32fdb1e70

LibraryObjectLock: Address=0x31bb39138 Handle=0x32fdb1e70 RequestMode=S
CanBeBrokenCount=24 Incarnation=6 ExecutionCount=0

User=0x121bc39d0 Session=0x1019cfc70 ReferenceCount=0
Flags=[0000] SavepointNum=7fc
LibraryHandle: Address=0x32fdb1e70 Hash=1400a489 LockMode=0 PinMode=0 LoadLockMode=0 Status=VALD
ObjectName: Name=SYS.TAB$
FullHashValue=dba1f47a6c1943c58ece900e1400a489 Namespace=TABLE/PROCEDURE(01) Type=TABLE(02) ContainerId=0 ContainerUid=0 Identifier=4 OwnerIdn=0
Statistics: InvalidationCount=0 ExecutionCount=0 LoadCount=102 ActiveLocks=0 TotalLockCount=68585 TotalPinCount=68567
Counters: BrokenCount=24 RevocablePointer=6 KeepDependency=0 Version=0 BucketInUse=1951 HandleInUse=1951 HandleReferenceCount=0
Concurrency: DependencyMutex=0x32fdb1f20(0, 3043, 0, 0) Mutex=0x32fdb1fc0(0, 898114, 4, 0)
Flags=PKP/PIN/TIM/KEP/BSO/[00902803] Flags2=[0000]
WaitersLists:
Lock=0x32fdb1f00[0x31bb391b8,0x331f300f0]
Pin=0x32fdb1ee0[0x32fdb1ee0,0x32fdb1ee0]
LoadLock=0x32fdb1f58[0x32fdb1f58,0x32fdb1f58]
Timestamp: Current=08-01-2018 15:10:46
HandleReference: Address=0x32fdb2058 Handle=0x317ec9c58 Flags=OWN[200]
LockInstance: id='LBdba1f47a6c1943c5' GlobalEnqueue=(nil) ReleaseCount=0
PinInstance: id='NBdba1f47a6c1943c5' GlobalEnqueue=(nil)
LibraryObject: Address=0x32fdb0d20 HeapMask=0101-0141-0749-0000 Flags=EXS/LOC[0004] Flags2=/RLD[8000002] PublicFlags=[0000]
sid: 2591 ser: 39097 audsid: 38275344 user: 0/SYS
flags: (0x41) USR/- flags2: (0x9) -/-/INC
flags_idl: (0x1) status: BSY/-/-/- kill: -/-/-/-
pid: 914 O/S info: user: grid, term: UNKNOWN, ospid: 45429
image: oracle@weejardb1
client details:
O/S info: user: tomas, term: , ospid: 1234
machine: ZHJK_SERVER4 program: oracle@weejardb1
application name: oracle@weejardb1, hash value=3583890763
current SQL:
BEGIN PRO_DATA_XXX(:1,:2,:3,:4,:5); END;
Short stack dump:
ksedsts()+346<-ksdxfstk()+71<-ksdxdocmdmultex()+5870<-ksdxdocmdmult()+55<-ksudumpses()+3457<-kqlpgCallback()+3198<-kglDumpTimeoutDiag()+254<-kgllkal()+4130<-kglLock()+1307<-kglget()+290<-kksaxs()+1141<-kksauc()+431<-kkscscid_auc_eval()+318<-kkscsCheckCriteria()+1839<-kkscsCheckCursor()+641<-kkscsSearchChildList()+1985<-kksfbc()+12102<-kkspsc0()+2130<-kksParseCursor()+123<-opiosq0()+2391<-opiall0()+4567<-opikpr()+559<-opiodr()+1229<-rpidrus()+201<-skgmstack()+65<-rpidru()+134<-rpiswu2()+627<-kprball()+1111<-kqdGetBundledCursor()+468<-kqldtar()+435<-kqlobjlod()+1619<-kqllod_new()+420<-kqlCallback()+70<-kqllod()+291<-kglobld()+1080<-kglobpn()+2375<-kglpim()+425<-kglpin()+1672<-kglgob()+545<-qcdlgbo()+591<-qcdlgob()+1005<-qcsfgob()+290<-qcsprfro()+531<-qcsprfro_tree()+380<-qcsprfro_tree()+150<-qcspafq()+246<-qcspqbDescendents()+278<-qcspqb()+272<-kkmdrv()+192<-opiSem()+1978<-opiDeferredSem()+447<-opitca()+436<-kksFullTypeCheck()+86<-rpiswu2()+627<-kksSetBindType()+1801<-kksfbc()+11080<-opiexe()+2926<-kpoal8()+2683<-opiodr()+1229<-kpoodrc()+42<-rpiswu2()+627<-kpoodr()+656<-upirtrc()+2390<-kpurcsc()+102<-kpuexec()+11361<-OCIStmtExecute()+41<-kqlUserPrivHandleLoad()+1277<-kglobld()+1080<-kglobpn()+2375<-kglpim()+425<-kglpin()+1672<-kzpGetUserPrivHandle()+702<-kzpChkIHPUserPriv()+401<-kzpLookupInheritPrivileges()+82<-kxep_check_inh_privs_()+165<-pevm_ENTER()+1219<-pfrinstr_ENTER()+64<-pfrrun_no_tool()+60<-pfrrun()+919<-plsql_run()+756<-peicnt()+288<-kkxexe()+717<-opiexe()+22561<-kpoal8()+2683<-opiodr()+1229<-ttcpip()+1257<-opitsk()+1940<-opiino()+941<-opiodr()+1229<-opidrv()+1021<-sou2o()+145<-opimai_real()+455<-ssthrdmain()+417<-main()+262<-__libc_start_main()+245
2019-08-24T09:00:29.233370+08:00
Incident 311340 created, dump file: /home/u01/app/oracle/diag/rdbms/anbob/anbob1/incident/incdir_311340/anbob1_ora_45429_i311340.trc
ORA-6544 [pevm_peruws_callback-1] [4021] [] [] [] [] [] [] [] [] [] []

anbob1_ora_45429_i311339.trc

----- Current SQL Statement for this session (sql_id=5fjyqfc1u5s07) -----
BEGIN PRO_DATA_XXX(:1,:2,:3,:4,:5); END;
[TOC00004]
----- PL/SQL Stack -----
----- PL/SQL Call Stack -----
object line object
handle number name
0x32ed612c8 1 procedure PNIUSER.PRO_DATA_xxxx
0x33690ac10 1 anonymous block
[TOC00004-END]
[TOC00003-END]
[TOC00005]
----- Call Stack Trace -----
calling call entry
location type point
-------------------- -------- --------------------
ksedst()+119 call kgdsdst()
dbkedDefDump()+1200 call ksedst()
ksedmp()+259 call dbkedDefDump()
dbgexPhaseII()+2130 call ksedmp()
dbgexProcessError() call dbgexPhaseII()
dbgePostErrorDirectVaList_int()+2073 call dbgexProcessError()
dbgePostErrorDirect()+796 call dbgePostErrorDirectVaList_int()
pevm_peruws_callback()+1115 call dbgePostErrorDirect()
kgepop()+344 call pevm_peruws_callback()
kgersel()+265 call kgepop()
kglpin()+4645 call kgersel()
kzpGetUserPrivHandle()+702 call kglpin()
kzpChkIHPUserPriv()+401 call kzpGetUserPrivHandle()
kzpLookupInheritPrivileges()+82 call kzpChkIHPUserPriv()
kxep_check_inh_privs_()+165 call kzpLookupInheritPrivileges()
pevm_ENTER()+1219 call kxep_check_inh_privs_()
pfrinstr_ENTER()+64 call pevm_ENTER()
pfrrun_no_tool()+60 call pfrinstr_ENTER()
pfrrun()+919 call pfrrun_no_tool()
plsql_run()+756 call pfrrun()
peicnt()+288 call plsql_run()
kkxexe()+717 call peicnt()
opiexe()+22561 call kkxexe()
kpoal8()+2683 call opiexe()
opiodr()+1229 call kpoal8()
ttcpip()+1257 call opiodr()
opitsk()+1940 call ttcpip()
opiino()+941 call opitsk()
opiodr()+1229 call opiino()
opidrv()+1021 call opiodr()
sou2o()+145 call opidrv()
opimai_real()+455 call sou2o()
ssthrdmain()+417 call opimai_real()
main()+262 call ssthrdmain()
__libc_start_main() call main()
_start()+41 call __libc_start_main()
[TOC00005-END]

dia0 trace file

*** 2019-08-24T08:43:06.123027+08:00
HM: Session with ID 1450 serial # 58357 (FG) on read/write instance 1 is hung
and is waiting on 'library cache load lock' for 96 seconds.
Session was previously waiting on 'SQL*Net message from client'.
Final Blocker is Session ID 3503 serial# 48131 on instance 1
which is waiting on 'library cache lock' for 101 seconds
p1: 'handle address'=0x32fdb1e70, p2: 'lock address'=0x331863638, p3: '100*mode+namespace'=0x400010002

2019-08-24 08:43:06.130 :kjzgldatachk(): In Progress sending to dest:1
kjznhmhub: memcpy check failed with rcm = -3

*** 2019-08-24T08:43:18.474014+08:00
HM: Session with ID 1832 serial # 64277 (FG) on read/write instance 1 is hung
and is waiting on 'library cache lock' for 96 seconds.
Session was previously waiting on 'library cache load lock'.
Session ID 1832 is blocking 1 session

2019-08-24 08:43:18.479 :kjzgldatachk(): In Progress sending to dest:1
kjznhmhub: memcpy check failed with rcm = -3

*** 2019-08-24T08:43:21.627341+08:00
HM: Short Stack of immediate waiter session ID 1297, OSPID 79340 of hang ID 169
Short stack dump:
ksedsts()+346<-ksdxfstk()+71<-ksdxcb()+912<-sspuser()+217<-__sighandler()<-__poll()+16<-ipcgxp_selectex()+409<
-ipclw_wait()+1045<-ksxpwait_ipclw()+3844<-ksxpwait_int()+22103<-ksxpwait()+845<-ksliwat()+10782<-kslwaitctx()
+197<-kglllal()+1253<-kglobld()+381<-kglobpn()+2375<-kglpim()+425<-kglpin()+1672<-kglgob()+545<-qcdlgbo()+591<
-qcdlgob()+1005<-qcsfgob()+290<-qcsprfro()+531<-qcsprfro_tree()+380<-qcsprfro_tree()+150<-qcspafq()+246<
-qcspqbDescendents()+278<-qcspqb()+272<-kkmdrv()+192<-opiSem()+1978<-opiDeferredSem()+447<-opitca()+436<
-kksFullTypeCheck()+86<-rpiswu2()+627<-kksLoadChild()+8003<-kxsGetRuntimeLock()+2035<-kksfbc()+15083<-kkspsc0()
+2130<-kksParseCursor()+123<-opiosq0()+2391<-kpooprx()+404<-kpoal8()+850<-opiodr()+1229<-ttcpip()+1257<-opitsk()
+1940<-opiino()+941<-opiodr()+1229<-opidrv()+1021<-sou2o()+145<-opimai_real()+455<-ssthrdmain()+417<-main()+262
<-__libc_start_main()+245

HM: current SQL: select NE_KEY from FACTMSCSER15 where ne_key in

*** 2019-08-24T08:46:36.472507+08:00
HM: Short Stack of root session ID 5177, OSPID 75577 of hang ID 177
Short stack dump: 
ksedsts()+346<-ksdxfstk()+71<-ksdxcb()+912<-sspuser()+217<-__sighandler()<-__poll()+16<-ipcgxp_selectex()+409
<-ipclw_wait()+1045<-ksxpwait_ipclw()+3844<-ksxpwait_int()+22103<-ksxpwait()+845<-ksliwat()+10782<-kslwaitctx()
+197<-kjusuc()+9058<-ksipgetctxia()+5359<-ksixpgetia()+167<-kqlmLock()+3201<-kqlmClusterLock()+209<-kgllkal()
+3424<-kglLock()+1307<-kglget()+290<-kksaxs()+1141<-kksauc()+431<-kkscscid_auc_eval()+318<-kkscsCheckCriteria()+
1839<-kkscsCheckCursor()+641<-kkscsSearchChildList()+1985<-kksfbc()+12102<-kkspsc0()+2130<-kksParseCursor()+123<
-opiosq0()+2391<-opiall0()+4567<-opikpr()+559<-opiodr()+1229<-rpidrus()+201<-skgmstack()+65<-rpidru()+134<-
rpiswu2()+627<-kprball()+1111<-kqdGetBundledCursor()+468<-kqldtar()+435<-kqlobjlod()+1619<-kqllod_new()+420<-
kqlCallback()+70<-kqllod()+291<-kglobld()+1080<-kglobpn()+2375<-kglpim()+425<-kglpin()+1672<-kglgob()+545<-
qcdlgbo()+591<-qcdlgob()+1005<-qcsfgob()+290<-qcsprfro()+531<-qcsprfro_tree()+380<-qcsprfro_tree()+150<-qcspafq
()+246<-qcspqbDescendents()+278<-qcspqb()+272<-kkmdrv()+192<-opiSem()+1978<-opiDeferredSem()+447<-opitca()+436<
-kksFullTypeCheck()+86<-rpiswu2()+627<-kksLoadChild()+8003<-kxsGetRuntimeLock()+2035<-kksfbc()+15083<-kkspsc0()+
2130<-kksParseCursor()+123<-opiosq0()+2391<-kpooprx()+404<-kpoal8()+850<-opiodr()+1229<-ttcpip()+1257<-opitsk()+
1940<-opiino()+941<-opiodr()+1229<-opidrv()+1021<-sou2o()+145<-opimai_real()+455<-ssthrdmain()+417<-main()+262<-
__libc_start_main()+245
 
HM: current SQL: select NE_KEY from Roma_Region_I

ORA-4021
Oracle 把表,view, procedure, function, index等对象的定义cache在内存中, 这个内存叫做Library cache,属于Shared Pool中的一部分。当对象正在用时,其它会话不允许修改,受到library cache lock 或 pin 的保护。
当一会话需要使用一个对象是,需要先请求该对象的Library LOCK,当有其它会话占用时,会一直等待其释放, 如果在一定时间还没有得到lock,达到KGL的等待阀值后,会停止,并抛出错误,KGL 等待的上限就是15分钟,有隐藏参数_kgl_time_to_wait_for_locks控制.

分析思路

通常libaray cache 出现问题可能ASM, AWR都无法收集数据,问题时如果时间允许做个SSD trace dump.

1, 做hanganalyze

sqlplus -prelim / as sysdba
oradebug setmypid;
oradebug unlimit;
oradebug -g all hanganalyze 3
等10秒
oradebug -g all hanganalyze 3

2, 查询等待

select /*+ ordered */ w1.sid  waiting_session,
         h1.sid  holding_session,
         w.kgllktype lock_or_pin,
         w.kgllkhdl address,
         decode(h.kgllkmod,  0, 'None', 1, 'Null', 2, 'Share', 3, 'Exclusive',
            'Unknown') mode_held,
         decode(w.kgllkreq,  0, 'None', 1, 'Null', 2, 'Share', 3, 'Exclusive',
          'Unknown') mode_requested
   from dba_kgllock w, dba_kgllock h, v$session w1, v$session h1
  where
   (((h.kgllkmod != 0) and (h.kgllkmod != 1)
      and ((h.kgllkreq = 0) or (h.kgllkreq = 1)))
    and
      (((w.kgllkmod = 0) or (w.kgllkmod= 1))
      and ((w.kgllkreq != 0) and (w.kgllkreq != 1))))
   and  w.kgllktype      =  h.kgllktype
   and  w.kgllkhdl =  h.kgllkhdl
   and  w.kgllkuse     =   w1.saddr
   and  h.kgllkuse     =   h1.saddr
 /

3, 检查主机CPU,内存资源

4, 检查当时的job, gather table stats, DDL

5, 检查当时是否有DATAPUMP (当前案例从alert 没有发现datapump job)

6, MOS 中有个相似Bug 26732119 – ORA-04021: TIMEOUT OCCURRED WHILE WAITING TO LOCK OBJECT SYS.TAB$ , 但最终因证据不足没有认定为BUG

7, 12c R2 RAC环境在library cache SHARED lock优化新特性

The short stack contains the RAC specific functions: “kqlmLock kqlmClusterLock” which points to an issue introduced by RAC S-lock optimization.
In 12.2 RAC the Library Cache takes advantage of SHARE lock optimization. S-lock optimization is enabled by default by the setting of _lm_share_lock_restype. By default, the parameter is ‘L*N*Q*’

建议禁用该特性观察
ALTER SYSTEM SET “_lm_share_lock_opt” = ‘FALSE’ SCOPE=SPFILE SID=’*’;

打赏

,

对不起,这篇文章暂时关闭评论。