首页 » ORACLE 9i-23c » Troubleshooting Active Dataguard Hangs waiting for library cache lock on DBINSTANCE namespace

Troubleshooting Active Dataguard Hangs waiting for library cache lock on DBINSTANCE namespace

Oracle 11.2.0.3 2nodes RAC on hpux环境, standby dataguard端突然产生较大GAP,日志停止应用,大量前台查询进程等待”library cache load lock”, hanganalyze trace显示’rdbms ipc message'<=’library cache lock’ 等待链, library cache lock是一个parse lock,当在ADG环境遇到library cache lock时通常发现在以下情况:primary site上做了DDL; 系统权限做了revoke. 而对于这个问题我们尝试重启日志应用进程都没有解决,无奈当时重启standby实例恢复,后分析确认在ADG环境中存在一个相关bug, 安装patch后就可以彻底解决,当然如果是primary 就是频繁的DDL,而导致standby只是争用这个patch可能无用。这里记录一下这个问题。

这类问题通常建议收集:

hanganalyze, systemstate dump, v$ash;

— hanganalye

Chain 1:
-------------------------------------------------------------------------------
    Oracle session identified by:
    {
                instance: 1 (stdanbob.anbob1)
                   os id: 22931
              process id: 135, oracle@weejar1
              session id: 2479
        session serial #: 1571
    }
    is waiting for 'library cache lock' with wait info:
    {
                      p1: 'handle address'=0xc0000015bff7f678
                      p2: 'lock address'=0xc000001334cffad8
                      p3: '100*mode+namespace'=0x1004a0002
            time in wait: 21.353805 sec
           timeout after: 14 min 38 sec
                 wait id: 2
                blocking: 0 sessions
            wait history:
              * time between current wait and wait #1: 0.143108 sec
              1.       event: 'SQL*Net message from client'
                 time waited: 0.001206 sec
                     wait id: 1               p1: 'driver id'=0x54435000
                                              p2: '#bytes'=0x1
              * time between wait #1 and #2: 0.000013 sec
              2.       event: 'SQL*Net message to client'
                 time waited: 0.000001 sec
                     wait id: 0               p1: 'driver id'=0x54435000
                                              p2: '#bytes'=0x1
    }
    and is blocked by
 => Oracle session identified by:
    {
                instance: 1 (stdanbob.anbob1)
                   os id: 12817
              process id: 32, oracle@weejar1 (LGWR)
              session id: 11265
        session serial #: 1
    }
    which is waiting for 'rdbms ipc message' with wait info:
    {
                      p1: 'timeout'=0x12c
            time in wait: 2.488548 sec
      heur. time in wait: 8.508734 sec
           timeout after: 0.511452 sec
                 wait id: 586406071
                blocking: 7 sessions
            wait history:
              * time between current wait and wait #1: 0.000089 sec
              1.       event: 'rdbms ipc message'
                 time waited: 3.010034 sec
                     wait id: 586406070       p1: 'timeout'=0x12c
              * time between wait #1 and #2: 0.000065 sec
              2.       event: 'rdbms ipc message'
                 time waited: 3.009998 sec
                     wait id: 586406069       p1: 'timeout'=0x12c
              * time between wait #2 and #3: 0.000025 sec
              3.       event: 'rdbms ipc message'
                 time waited: 0.687558 sec
                     wait id: 586406068       p1: 'timeout'=0x44
    }

Chain 1 Signature: 'rdbms ipc message'<='library cache lock'

# SSD trace

Blockers
~~~~~~~~

        Above is a list of all the processes. If they are waiting for a resource
        then it will be given in square brackets. Below is a summary of the
        waited upon resources, together with the holder of that resource.
        Notes:
        ~~~~~
         o A process id of '???' implies that the holder was not found in the
           systemstate. (The holder may have released the resource before we
           dumped the state object tree of the blocking process).
         o Lines with 'Enqueue conversion' below can be ignored *unless* 
           other sessions are waiting on that resource too. For more, see 
           http://gbr30026.uk.oracle.com:81/Public/TOOLS/Ass.html#enqcnv)
         o You might see a process blocked on a mutex wait event but also 
           reported as holding the same mutex. You will need to check the 
           processstate dump as we might have been waiting at the start of the 
           process dump but have acquired it before the dump completed.

                    Resource Holder State
                 Mutex 13e4d    10: Blocker
                 Mutex 13e4d    39: Blocker
LOCK: Handle=c0000015bff7f678    32: waiting for 'rdbms ipc message'
      LOAD: c0000013bcdaf710    ??? Blocker
      LOAD: c0000013bcdab630   361: 361: is waiting for 32:
      LOAD: c0000013bcd70ac8   667: 667: is waiting for 32:
      LOAD: c0000013e2cfa070   918: 918: is waiting for 32:
      LOAD: c0000013beff2170   943: 943: is waiting for 32:
      LOAD: c0000013e2d60cd8   954: 954: is waiting for 32:
      LOAD: c0000013e2d60ef8   950: 950: is waiting for 32:
      LOAD: c0000013e2cfa290   704: 704: is waiting for 32:
      LOAD: c0000013e2e5e690   910: 910: is waiting for 32:
      LOAD: c0000013bde32768   650: 650: is waiting for 32:
      LOAD: c0000012fb399e90   988: 988: is waiting for 32:
      LOAD: c0000012fb6ef678    ??? Blocker
      LOAD: c0000013bcdf7880  1008: 1008: is waiting for 32:
      LOAD: c0000013bcdf7660  1014: 1014: is waiting for 32:

PID to SID Mapping
~~~~~~~~~~~~~~~~~~
Pid 10 maps to Sid(s): 3521
Pid 39 maps to Sid(s): 13729 13730
Pid 32 maps to Sid(s): 11265
Pid 361 maps to Sid(s): 14434
Pid 667 maps to Sid(s): 9522 9525
Pid 918 maps to Sid(s): 7770 7746
Pid 943 maps to Sid(s): 16554
Pid 954 maps to Sid(s): 20422
Pid 950 maps to Sid(s): 19019
Pid 704 maps to Sid(s): 27 1
Pid 910 maps to Sid(s): 4934 4953
Pid 650 maps to Sid(s): 3531 3532
Pid 988 maps to Sid(s): 9884
Pid 1008 maps to Sid(s): 16909
Pid 1014 maps to Sid(s): 19037

Warning: The following processes have multiple session state objects and
may not be properly represented above :
    42:   52:  273:                                                            

Object Names
~~~~~~~~~~~~
Mutex 13e4d                                                   
LOCK: Handle=c0000015bff7f678   CURSOR(00):SYS.anbob          
LOAD: c0000013bcdaf710                                        
LOAD: c0000013bcdab630                                        
LOAD: c0000013bcd70ac8                                        
LOAD: c0000013e2cfa070                                        
LOAD: c0000013beff2170                                        
LOAD: c0000013e2d60cd8                                        
LOAD: c0000013e2d60ef8                                        
LOAD: c0000013e2cfa290                                        
LOAD: c0000013e2e5e690                                        
LOAD: c0000013bde32768                                        
LOAD: c0000012fb399e90                                        
LOAD: c0000012fb6ef678                                        
LOAD: c0000013bcdf7880                                        
LOAD: c0000013bcdf7660                                        

Summary of Wait Events Seen (count>10)
~~~~~~~~~~~~~~~~~~~~~~~~~~~
  No wait events seen more than 10 times

  ----------------------------------------
  SO: 0xc0000016536b1cc8, type: 2, owner: 0x0000000000000000, flag: INIT/-/-/0x00 if: 0x3 c: 0x3
   proc=0xc0000016536b1cc8, name=process, file=ksu.h LINE:12616 ID:, pg=0
  (process) Oracle pid:32, ser:1, calls cur/top: 0xc0000016164719f8/0xc0000016164719f8
            flags : (0x6) SYSTEM
            flags2: (0x0),  flags3: (0x10)
            intr error: 0, call error: 0, sess error: 0, txn error 0
            intr queue: empty
    ksudlp FALSE at location: 0
  (post info) last post received: 0 0 35
              last post received-location: ksr2.h LINE:627 ID:ksrpublish
              last process to post me: c0000016236326c0 1 6
              last post sent: 0 0 36
              last post sent-location: ksr2.h LINE:631 ID:ksrmdone
              last process posted by me: c0000016236326c0 1 6
    (latch info) wait_event=0 bits=0
    Process Group: DEFAULT, pseudo proc: 0xc000001653f68b68
    O/S info: user: oracle, term: UNKNOWN, ospid: 12817
    OSD pid info: Unix process pid: 12817, image: oracle@kdyya1 (LGWR)
    Short stack dump:
ksedsts()+544<-ksdxfstk()+48<-ksdxcb()+3216<-sspuser()+688<-<-_pw_wait()+48<-pw_wait()+112<-sskgpwwait()+432<-skgpwwait()+320<-ksliwat()+3328<-kslwaitctx()+304<-kslwait()+192<-ksarcv()+640<-ks
babs()+752<-ksbrdp()+2736<-opirip()+1296<-opidrv()+1152<-sou2o()+256<-opimai_real()+352<-ssthrdmain()+576<-main()+336<-main_opd_entry()+80
    ----------------------------------------
	
    ----------------------------------------
    SO: 0xc0000016552ca578, type: 4, owner: 0xc0000016536b1cc8, flag: INIT/-/-/0x00 if: 0x3 c: 0x3
     proc=0xc0000016536b1cc8, name=session, file=ksu.h LINE:12624 ID:, pg=0
    (session) sid: 11265 ser: 1 trans: 0x0000000000000000, creator: 0xc0000016536b1cc8
              flags: (0x51) USR/- flags_idl: (0x1) BSY/-/-/-/-/-
              flags2: (0x409) -/-/INC
              DID: , short-term DID:
              txn branch: 0x0000000000000000
              oct: 0, prv: 0, sql: 0x0000000000000000, psql: 0x0000000000000000, user: 0/SYS
    ksuxds FALSE at location: 0
    service name: SYS$BACKGROUND
    Current Wait Stack:
     0: waiting for 'rdbms ipc message'
        timeout=0xe7, =0x0, =0x0
        wait_id=586406182 seq_num=64592 snap_id=1
        wait times: snap=0.027546 sec, exc=0.027546 sec, total=0.027546 sec
        wait times: max=2.310000 sec, heur=0.027546 sec
        wait counts: calls=1 os=1
        in_wait=1 iflags=0x5a8
    There are 1 sessions blocked by this session.
    Dumping one waiter:
      inst: 1, sid: 20067, ser: 55
      wait event: 'library cache lock'
        p1: 'handle address'=0xc0000015bff7f678
        p2: 'lock address'=0xc0000013dad14ef8
        p3: '100*mode+namespace'=0x1004a0002
      row_wait_obj#: 4294967295, block#: 0, row#: 0, file# 0
      min_blocked_time: 12 secs, waiter_cache_ver: 16092
    Wait State:
      fixed_waits=0 flags=0x22 boundary=0x0000000000000000/-1
    Session Wait History:

  
    ----------------------------------------
    SO: 0xc0000016164719f8, type: 3, owner: 0xc0000016536b1cc8, flag: INIT/-/-/0x00 if: 0x3 c: 0x3
     proc=0xc0000016536b1cc8, name=call, file=ksu.h LINE:12620 ID:, pg=0
    (call) sess: cur c0000016552ca578, rec 0, usr c0000016552ca578; flg:20 fl2:1; depth:0
    svpt(xcb:0x0000000000000000 sptn:0x2 uba: 0x00000000.0000.00)
    ksudlc FALSE at location: 0
      ----------------------------------------
      SO: 0xc0000015cff543b8, type: 78, owner: 0xc0000016164719f8, flag: INIT/-/-/0x00 if: 0x3 c: 0x3
       proc=0xc0000016536b1cc8, name=LIBRARY OBJECT LOCK, file=kgl.h LINE:8547 ID:, pg=0

      LibraryObjectLock:  Address=c0000015cff543b8 Handle=c0000015bff7f678 Mode=X CanBeBrokenCount=1 Incarnation=1 ExecutionCount=0

        User=c0000016552ca578 Session=c0000016552ca578 ReferenceCount=1 Flags=CNB/[0001] SavepointNum=2
      LibraryHandle:  Address=c0000015bff7f678 Hash=d68d5950 LockMode=X PinMode=0 LoadLockMode=0 Status=0
        ObjectName:  Name=SYS.anbob
          FullHashValue=93b41a89a2321a9b51ff1c32d68d5950 Namespace=DBINSTANCE(74) Type=CURSOR(00) Identifier=1 OwnerIdn=0
        Statistics:  InvalidationCount=0 ExecutionCount=0 LoadCount=0 ActiveLocks=1 TotalLockCount=3424094 TotalPinCount=0
        Counters:  BrokenCount=1 RevocablePointer=1 KeepDependency=0 BucketInUse=7020 HandleInUse=7020 HandleReferenceCount=0
        Concurrency:  DependencyMutex=c0000015bff7f728(0, 0, 0, 0) Mutex=c0000015bff7f7a8(0, 7048705, 1949, 0)
        Flags=RON/PIN/KEP/BSO/[00810003]
        WaitersLists:
          Lock=c0000015bff7f708[c0000013dad14f68,c0000013dad14f68]
          Pin=c0000015bff7f6e8[c0000015bff7f6e8,c0000015bff7f6e8]
          LoadLock=c0000015bff7f760[c0000015bff7f760,c0000015bff7f760]
        Timestamp:
        HandleReference:  Address=c0000015bff7f818 Handle=0000000000000000 Flags=[00]

Note:
在当前实例Lgwr 持有DBINSTANCE namespace library cache lock并且是 X mode。LGWR本身是空闲状态event为rdbms ipc message
,等信号调用, 但是它会堵塞其它会话,在”library cache lock” for the DBINSTANCE namespace lock。

在11.2.0.3版本仅当 bug 11664426 的修复到位时才会发生这种情况,更多参考MOS  Bug 17292239  Active Dataguard Hangs waiting for library cache lock on DBINSTANCE. Tracking bug to deliver backports to remove fix for 11664426。

触发场景通常是, 当Primary数据库上存在大量 DDL 时,Standby Active Dataguard 数据库可能会遇到严重的library cache lock争用,并可能导致死锁,从而导致数据库挂起。

因为11.2.0.3 中安装bug 11664426.才引起了该问题,所在Oracle紧急出了Bug 17292239补丁删除了11664426的修复,出了新的Bug 16717701 ,如果修复了错误 16717701,ADG 解析锁获取超时可能会失败,并出现 ORA-600 [krdrsb_end_qscn_2] 或 ORA-4021存在并且实例可能会崩溃。引出新的问题Bug:17018214 ,所以Bug 16717701修复再次叫停,被Bug:17018214取代。

修复不易呀 11664426》17292239》16717701》17018214

从SSD中可以看到LGWR holding “DBINSTANCE” namespace library cache lock in X-mode, MOS给出Cancel the media recovery and restart临时解决方法在我们这个例子是不可行的,但是有测试cancel后,flush 所有实例的shared pool再次应用日志有时可以解决。 彻底的解决方法是在standby db安装patch 17018214.

注意安装16717701 patch后所有版本默认fix是Disable的,需要使用event 16717701使修复生效,level 是一个参数值,在readme中也有介绍该值的计算方法, 如果在没有安装oneoff patch 启用event 16717701不报错,但是同样不会有任何作用。

For backports/one-off patches, the fix must be enabled by setting event 16717701 at level 104887600:

alter system set EVENT='16717701 trace name context forever, level 104887600' scope=spfile;

The 104887600 value of the event encodes two things:
1. Timeout – this is the amount of time that LGWR will wait for an X lock before signaling a timeout error and retrying.
2. Sleep duration – this is the amount of time that LGWR will sleep for after having timed out.

Both of the above values are denoted in MILLISECONDS. The event value encodes the sleep duration in its 12high-order bits and the timeout in the 20 lower order bits. The value can be calculated using the following formula:

value = (S * 1048576) + T

where S = sleep duration in milliseconds
T = timeout in milliseconds

当然这个值可以进行微调,根据oracle的建议是开始配置30 second timeout 和 100 ms sleep duration,也就是说LGWR会等待30秒获取X-lock, 如时失败会在sleeping 100秒秒后重试,使用上面的计算方法,event的level值计算就是:

value = (100 * 1048576) + 30000 = 104887600

在 RAC ADG 环境中这个event必须在所有使用使用相同的值,而且不支持滚动修改。

目前来看安装16717701再配置event启用已经没有必要,安装17018214即可。这个bug在11.2.0.4 和 12c已修复,同时引入了隐藏参数控制 “_adg_parselock_timeout“和”_adg_parselock_timeout_sleep “.

SQL> @pd parselock
Show all parameters and session values from x$ksppi/x$ksppcv...

       NUM N_HEX NAME                               VALUE        DESCRIPTION
---------- ----- ---------------------------------- ------------ ------------------------------------------------------------------
      2345   929 _adg_parselock_timeout             0            timeout for parselock get on ADG in centiseconds
      2347   92B _adg_parselock_timeout_sleep       100          sleep duration after a parselock timeout on ADG in milliseconds

The value of 550 centiseconds has been typically enough (550*0.01=5.5 seconds): “_adg_parselock_timeout”=550;
it can be adjusted to the recommended value of 30 seconds; “_adg_parselock_timeout”= 3000.

There should be a statistic “ADG parselock X get attempts” If it gets set too small that value would likely increase a lot due to keep timing out and retrying.

 

— 2024-1-17 update —

注意上面的案列前台进程被LGWR 堵塞,LGWR持有dbinstance X mode的library cache lock,  还有一种情况是LGWR请求library cache lock 被前台进程会话堵塞,现象就是LGWR 挂起DG的scn无法增加,产生同步延迟。 可以做hanganalyze和systemstate dump。

PROCESS 59: LGWR
----------------------------------------
SO: 0x1200008528, type: 2, owner: (nil), flag: INIT/-/-/0x00 if: 0x3 c: 0x3
proc=0x1200008528, name=process, file=ksu.h LINE:12721, pg=1
(process) Oracle pid:59, ser:1, calls cur/top: 0x11d36ee730/0x11d36ee730
OSD pid info: Unix process pid: 293835, image: xxxxxxxxxxxxx (LGWR)

SO: 0x1200548dd8, type: 4, owner: 0x1200008528, flag: INIT/-/-/0x00 if: 0x3 c: 0x3
proc=0x1200008528, name=session, file=ksu.h LINE:12729, pg=0
(session) sid: 827 ser: 1 trans: (nil), creator: 0x1200008528
service name: SYS$BACKGROUND
Current Wait Stack:
0: waiting for 'library cache lock'
handle address=0x18bfd0cb80, lock address=0x191ff3f030, 100*mode+namespace=0x1004a0003
wait_id=991 seq_num=1014 snap_id=1
wait times: snap=1.829312 sec, exc=1.829312 sec, total=1.829312 sec
wait times: max=5.500000 sec, heur=1.829312 sec
wait counts: calls=2 os=2
in_wait=1 iflags=0x15a2
There is at least one session blocking this session.
Dumping 1 direct blocker(s):
inst: 1, sid: 996, ser: 5

LibraryObjectLock: Address=0x191ff3f030 Handle=0x18bfd0cb80 RequestMode=X CanBeBrokenCount=1 Incarnation=1 ExecutionCount=0

User=0x1200548dd8 Session=0x1200548dd8 ReferenceCount=0 Flags=[0000] SavepointNum=2
LibraryHandle: Address=0x18bfd0cb80 Hash=548dac5f LockMode=S PinMode=0 LoadLockMode=0 Status=0
ObjectName: Name=SYS.xxxxxxx
FullHashValue=2499dcf083f8e14276cd0eaf548dac5f Namespace=DBINSTANCE(74) Type=CURSOR(00) Identifier=1 OwnerIdn=0

PROCESS 107:
----------------------------------------
SO: 0x120000e978, type: 2, owner: (nil), flag: INIT/-/-/0x00 if: 0x3 c: 0x3
proc=0x120000e978, name=process, file=ksu.h LINE:12721, pg=1
(process) Oracle pid:107, ser:3, calls cur/top: 0x11d36f2680/0x11d36f20c0
OSD pid info: Unix process pid: 318247, image: oracle@xxxxxxxxxxxxxxx (TNS V1-V3)
Short stack dump:
ksedsts()+465<-ksdxfstk()+32<-ksdxcb()+1927<-sspuser()+112<-__sighandler()<-kgh_heap_sizes()+267<-kkoDumpHeapSize()+92<-kkoqbc()+32778<-apakkoqb()+171<-apaqbdDescendents()+452<-apaqbdList()+76<-apaqbdDescendents()+772<-apaqbdList()+76<-apaqbdDescendents()+267<-apaqbdList()+76<-apaqbdDescendents()+772<-apaqbdList()+76<-apaqbdDescendents()+267<-apaqbdList()+76<-apaqbdDescendents()+772<-apaqbdList()+76<-apaqbdDescendents()+267<-apaqbdList()+76<-apaqbdDescendents()+772<-apaqbdList()+76<-apaqbdDescendents()+267<-apaqbdList()+76<-apaqbdDescendents()+267<-apaqbdList()+76<-apaqbd()+14<-kkoggvc()+628<-kkogvcd()+582<-kkqljppuvs()+2612<-kkqljppur()+1766<-kkqfppDrvDescendents()+1123<-kkqfppDrv()+184<-kkqfppPsh()+480<-kkqfppPsh()+5020<-kkqfppDrv1()+384<-kkqfppDrvDescendents()+1206<-kkqfppDrv()+184<-kkqfppPsh()+480<-kkqfppPsh()+5020<-kkqfppDrv1()+384<-kkqfppDrvDescendents()+1206<-kkqfppDrv()+184<-kkqfppPsh()+480<-kkqfppPsh()+5020<-kkqfppDrv1()+384<-kkqfppDrvDescendents()+1206<-kkqfppDrv()+184<-kkqfppPsh()+480<-kkqfppDrv1()+384<-kkqfppDrvDescendents()+1206<-kkqfppDrv()+184<-opitca()+1163<-kksFullTypeCheck()+69<-rpiswu2()+1776<-kksLoadChild()+11468<-kxsGetRuntimeLock()+2209<-kksfbc()+15108<-kkspsc0()+1171<-kksParseCursor()+116<-opiosq0()+2019<-kpooprx()+274<-kpoal8()+842<-opiodr()+915<-ttcpip()+2183<-opitsk()+1705<-opiino()+969<-opiodr()+915<-opidrv()+570<-sou2o()+103<-opimai_real()+133<-ssthrdmain()+265<-main()+201<-__libc_start_main()+253

SO: 0x1200746ff8, type: 4, owner: 0x120000e978, flag: INIT/-/-/0x00 if: 0x3 c: 0x3
proc=0x120000e978, name=session, file=ksu.h LINE:12729, pg=0
(session) sid: 996 ser: 5 trans: (nil), creator: 0x120000e978
Current Wait Stack:
Not in wait; last wait ended 7 min 22 sec ago
There are 1 sessions blocked by this session.
Dumping one waiter:
inst: 1, sid: 827, ser: 1
wait event: 'library cache lock'
p1: 'handle address'=0x18bfd0cb80
p2: 'lock address'=0x191ff3f030
p3: '100*mode+namespace'=0x1004a0003

The user process can have a different stack as it depends on the parsing issue; e.g. Short stack dump:
ksedsts()+465<-ksdxfstk()+32<-ksdxcb()+1927<-sspuser()+112<-__sighandler()<-qksqbCheckNameConflict()+51<-qksqbCreateName()+1459<-qksqbGenerateName()+1109<-kkqvmTrMrg()+2678<-kkqvmTrMrg()+15502<-kkqvmTrMrg()+10338<-kk

当时是因为前端进程一直在sql的parsing 阶段未成功, 取前端进程call stack, 根据parse特性尝试禁用解析器部分特性。

打赏

,

对不起,这篇文章暂时关闭评论。