首页 » ORACLE » Troubleshooting Active Dataguard Hangs waiting for library cache lock on DBINSTANCE namespace

Troubleshooting Active Dataguard Hangs waiting for library cache lock on DBINSTANCE namespace

Oracle 11.2.0.3 2nodes RAC on hpux环境, active dataguard端突然产生较大GAP,日志停止应用,大量前台查询进程等待”library cache load lock”, hanganalye 显示’rdbms ipc message'<=’library cache lock’ 等待链, library cache lock是一个parse lock,要知道在ADG 环境中也只有一个进程会在X-mode 持有lock,那就是LGWR,因为是只读库所以其它查询会话都是以S-mode parse查询。当在ADG环境看到library cache lock时通常发现在以下情况:primary site上做了DDL; 系统权限做了revoke. 而对于这个问题我们尝试重启日志应用进程都没有解决,无奈当时重启standby实例恢复,后分析确认在ADG环境中存在一个相关bug, 安装patch后就可以彻底解决,当然如果是primary 就是频繁的DDL,而导致standby只是争用这个patch可能无用。这里记录一下这个问题。

这类问题通常建议收集:

hanganalyze, systemstate dump, v$ash;

— hanganalye

Chain 1:
-------------------------------------------------------------------------------
    Oracle session identified by:
    {
                instance: 1 (stdanbob.anbob1)
                   os id: 22931
              process id: 135, oracle@weejar1
              session id: 2479
        session serial #: 1571
    }
    is waiting for 'library cache lock' with wait info:
    {
                      p1: 'handle address'=0xc0000015bff7f678
                      p2: 'lock address'=0xc000001334cffad8
                      p3: '100*mode+namespace'=0x1004a0002
            time in wait: 21.353805 sec
           timeout after: 14 min 38 sec
                 wait id: 2
                blocking: 0 sessions
            wait history:
              * time between current wait and wait #1: 0.143108 sec
              1.       event: 'SQL*Net message from client'
                 time waited: 0.001206 sec
                     wait id: 1               p1: 'driver id'=0x54435000
                                              p2: '#bytes'=0x1
              * time between wait #1 and #2: 0.000013 sec
              2.       event: 'SQL*Net message to client'
                 time waited: 0.000001 sec
                     wait id: 0               p1: 'driver id'=0x54435000
                                              p2: '#bytes'=0x1
    }
    and is blocked by
 => Oracle session identified by:
    {
                instance: 1 (stdanbob.anbob1)
                   os id: 12817
              process id: 32, oracle@weejar1 (LGWR)
              session id: 11265
        session serial #: 1
    }
    which is waiting for 'rdbms ipc message' with wait info:
    {
                      p1: 'timeout'=0x12c
            time in wait: 2.488548 sec
      heur. time in wait: 8.508734 sec
           timeout after: 0.511452 sec
                 wait id: 586406071
                blocking: 7 sessions
            wait history:
              * time between current wait and wait #1: 0.000089 sec
              1.       event: 'rdbms ipc message'
                 time waited: 3.010034 sec
                     wait id: 586406070       p1: 'timeout'=0x12c
              * time between wait #1 and #2: 0.000065 sec
              2.       event: 'rdbms ipc message'
                 time waited: 3.009998 sec
                     wait id: 586406069       p1: 'timeout'=0x12c
              * time between wait #2 and #3: 0.000025 sec
              3.       event: 'rdbms ipc message'
                 time waited: 0.687558 sec
                     wait id: 586406068       p1: 'timeout'=0x44
    }

Chain 1 Signature: 'rdbms ipc message'<='library cache lock'

# SSD trace

Blockers
~~~~~~~~

        Above is a list of all the processes. If they are waiting for a resource
        then it will be given in square brackets. Below is a summary of the
        waited upon resources, together with the holder of that resource.
        Notes:
        ~~~~~
         o A process id of '???' implies that the holder was not found in the
           systemstate. (The holder may have released the resource before we
           dumped the state object tree of the blocking process).
         o Lines with 'Enqueue conversion' below can be ignored *unless* 
           other sessions are waiting on that resource too. For more, see 
           http://gbr30026.uk.oracle.com:81/Public/TOOLS/Ass.html#enqcnv)
         o You might see a process blocked on a mutex wait event but also 
           reported as holding the same mutex. You will need to check the 
           processstate dump as we might have been waiting at the start of the 
           process dump but have acquired it before the dump completed.

                    Resource Holder State
                 Mutex 13e4d    10: Blocker
                 Mutex 13e4d    39: Blocker
LOCK: Handle=c0000015bff7f678    32: waiting for 'rdbms ipc message'
      LOAD: c0000013bcdaf710    ??? Blocker
      LOAD: c0000013bcdab630   361: 361: is waiting for 32:
      LOAD: c0000013bcd70ac8   667: 667: is waiting for 32:
      LOAD: c0000013e2cfa070   918: 918: is waiting for 32:
      LOAD: c0000013beff2170   943: 943: is waiting for 32:
      LOAD: c0000013e2d60cd8   954: 954: is waiting for 32:
      LOAD: c0000013e2d60ef8   950: 950: is waiting for 32:
      LOAD: c0000013e2cfa290   704: 704: is waiting for 32:
      LOAD: c0000013e2e5e690   910: 910: is waiting for 32:
      LOAD: c0000013bde32768   650: 650: is waiting for 32:
      LOAD: c0000012fb399e90   988: 988: is waiting for 32:
      LOAD: c0000012fb6ef678    ??? Blocker
      LOAD: c0000013bcdf7880  1008: 1008: is waiting for 32:
      LOAD: c0000013bcdf7660  1014: 1014: is waiting for 32:

PID to SID Mapping
~~~~~~~~~~~~~~~~~~
Pid 10 maps to Sid(s): 3521
Pid 39 maps to Sid(s): 13729 13730
Pid 32 maps to Sid(s): 11265
Pid 361 maps to Sid(s): 14434
Pid 667 maps to Sid(s): 9522 9525
Pid 918 maps to Sid(s): 7770 7746
Pid 943 maps to Sid(s): 16554
Pid 954 maps to Sid(s): 20422
Pid 950 maps to Sid(s): 19019
Pid 704 maps to Sid(s): 27 1
Pid 910 maps to Sid(s): 4934 4953
Pid 650 maps to Sid(s): 3531 3532
Pid 988 maps to Sid(s): 9884
Pid 1008 maps to Sid(s): 16909
Pid 1014 maps to Sid(s): 19037

Warning: The following processes have multiple session state objects and
may not be properly represented above :
    42:   52:  273:                                                            

Object Names
~~~~~~~~~~~~
Mutex 13e4d                                                   
LOCK: Handle=c0000015bff7f678   CURSOR(00):SYS.anbob          
LOAD: c0000013bcdaf710                                        
LOAD: c0000013bcdab630                                        
LOAD: c0000013bcd70ac8                                        
LOAD: c0000013e2cfa070                                        
LOAD: c0000013beff2170                                        
LOAD: c0000013e2d60cd8                                        
LOAD: c0000013e2d60ef8                                        
LOAD: c0000013e2cfa290                                        
LOAD: c0000013e2e5e690                                        
LOAD: c0000013bde32768                                        
LOAD: c0000012fb399e90                                        
LOAD: c0000012fb6ef678                                        
LOAD: c0000013bcdf7880                                        
LOAD: c0000013bcdf7660                                        

Summary of Wait Events Seen (count>10)
~~~~~~~~~~~~~~~~~~~~~~~~~~~
  No wait events seen more than 10 times

  ----------------------------------------
  SO: 0xc0000016536b1cc8, type: 2, owner: 0x0000000000000000, flag: INIT/-/-/0x00 if: 0x3 c: 0x3
   proc=0xc0000016536b1cc8, name=process, file=ksu.h LINE:12616 ID:, pg=0
  (process) Oracle pid:32, ser:1, calls cur/top: 0xc0000016164719f8/0xc0000016164719f8
            flags : (0x6) SYSTEM
            flags2: (0x0),  flags3: (0x10)
            intr error: 0, call error: 0, sess error: 0, txn error 0
            intr queue: empty
    ksudlp FALSE at location: 0
  (post info) last post received: 0 0 35
              last post received-location: ksr2.h LINE:627 ID:ksrpublish
              last process to post me: c0000016236326c0 1 6
              last post sent: 0 0 36
              last post sent-location: ksr2.h LINE:631 ID:ksrmdone
              last process posted by me: c0000016236326c0 1 6
    (latch info) wait_event=0 bits=0
    Process Group: DEFAULT, pseudo proc: 0xc000001653f68b68
    O/S info: user: oracle, term: UNKNOWN, ospid: 12817
    OSD pid info: Unix process pid: 12817, image: oracle@kdyya1 (LGWR)
    Short stack dump:
ksedsts()+544<-ksdxfstk()+48<-ksdxcb()+3216<-sspuser()+688<-<-_pw_wait()+48<-pw_wait()+112<-sskgpwwait()+432<-skgpwwait()+320<-ksliwat()+3328<-kslwaitctx()+304<-kslwait()+192<-ksarcv()+640<-ks
babs()+752<-ksbrdp()+2736<-opirip()+1296<-opidrv()+1152<-sou2o()+256<-opimai_real()+352<-ssthrdmain()+576<-main()+336<-main_opd_entry()+80
    ----------------------------------------
	
    ----------------------------------------
    SO: 0xc0000016552ca578, type: 4, owner: 0xc0000016536b1cc8, flag: INIT/-/-/0x00 if: 0x3 c: 0x3
     proc=0xc0000016536b1cc8, name=session, file=ksu.h LINE:12624 ID:, pg=0
    (session) sid: 11265 ser: 1 trans: 0x0000000000000000, creator: 0xc0000016536b1cc8
              flags: (0x51) USR/- flags_idl: (0x1) BSY/-/-/-/-/-
              flags2: (0x409) -/-/INC
              DID: , short-term DID:
              txn branch: 0x0000000000000000
              oct: 0, prv: 0, sql: 0x0000000000000000, psql: 0x0000000000000000, user: 0/SYS
    ksuxds FALSE at location: 0
    service name: SYS$BACKGROUND
    Current Wait Stack:
     0: waiting for 'rdbms ipc message'
        timeout=0xe7, =0x0, =0x0
        wait_id=586406182 seq_num=64592 snap_id=1
        wait times: snap=0.027546 sec, exc=0.027546 sec, total=0.027546 sec
        wait times: max=2.310000 sec, heur=0.027546 sec
        wait counts: calls=1 os=1
        in_wait=1 iflags=0x5a8
    There are 1 sessions blocked by this session.
    Dumping one waiter:
      inst: 1, sid: 20067, ser: 55
      wait event: 'library cache lock'
        p1: 'handle address'=0xc0000015bff7f678
        p2: 'lock address'=0xc0000013dad14ef8
        p3: '100*mode+namespace'=0x1004a0002
      row_wait_obj#: 4294967295, block#: 0, row#: 0, file# 0
      min_blocked_time: 12 secs, waiter_cache_ver: 16092
    Wait State:
      fixed_waits=0 flags=0x22 boundary=0x0000000000000000/-1
    Session Wait History:

  
    ----------------------------------------
    SO: 0xc0000016164719f8, type: 3, owner: 0xc0000016536b1cc8, flag: INIT/-/-/0x00 if: 0x3 c: 0x3
     proc=0xc0000016536b1cc8, name=call, file=ksu.h LINE:12620 ID:, pg=0
    (call) sess: cur c0000016552ca578, rec 0, usr c0000016552ca578; flg:20 fl2:1; depth:0
    svpt(xcb:0x0000000000000000 sptn:0x2 uba: 0x00000000.0000.00)
    ksudlc FALSE at location: 0
      ----------------------------------------
      SO: 0xc0000015cff543b8, type: 78, owner: 0xc0000016164719f8, flag: INIT/-/-/0x00 if: 0x3 c: 0x3
       proc=0xc0000016536b1cc8, name=LIBRARY OBJECT LOCK, file=kgl.h LINE:8547 ID:, pg=0

      LibraryObjectLock:  Address=c0000015cff543b8 Handle=c0000015bff7f678 Mode=X CanBeBrokenCount=1 Incarnation=1 ExecutionCount=0

        User=c0000016552ca578 Session=c0000016552ca578 ReferenceCount=1 Flags=CNB/[0001] SavepointNum=2
      LibraryHandle:  Address=c0000015bff7f678 Hash=d68d5950 LockMode=X PinMode=0 LoadLockMode=0 Status=0
        ObjectName:  Name=SYS.anbob
          FullHashValue=93b41a89a2321a9b51ff1c32d68d5950 Namespace=DBINSTANCE(74) Type=CURSOR(00) Identifier=1 OwnerIdn=0
        Statistics:  InvalidationCount=0 ExecutionCount=0 LoadCount=0 ActiveLocks=1 TotalLockCount=3424094 TotalPinCount=0
        Counters:  BrokenCount=1 RevocablePointer=1 KeepDependency=0 BucketInUse=7020 HandleInUse=7020 HandleReferenceCount=0
        Concurrency:  DependencyMutex=c0000015bff7f728(0, 0, 0, 0) Mutex=c0000015bff7f7a8(0, 7048705, 1949, 0)
        Flags=RON/PIN/KEP/BSO/[00810003]
        WaitersLists:
          Lock=c0000015bff7f708[c0000013dad14f68,c0000013dad14f68]
          Pin=c0000015bff7f6e8[c0000015bff7f6e8,c0000015bff7f6e8]
          LoadLock=c0000015bff7f760[c0000015bff7f760,c0000015bff7f760]
        Timestamp:
        HandleReference:  Address=c0000015bff7f818 Handle=0000000000000000 Flags=[00]

以上面这个信息不能在MOS中匹配到Bug 16717701 ,在它的超集bug 17018214也有记录,从SSD中可以看到LGWR holding “DBINSTANCE” namespace library cache lock in X-mode, 像MOS中提示的Cancel the media recovery and restart临时解决方法在我们这个例子是不可行的,但是有测试cancel后,flush 所有实例的shared pool再次应用日志有时可以解决。 彻底的解决方法是在standby db安装patch 17018214,注意安装patch后所有版本默认fix是Disable的,需要使用event 16717701使修复生效,level 是一个参数值,在readme中也有介绍该值的计算方法, 如果在没有安装oneoff patch 启用event 16717701不报错,但是同样不会有任何作用。

For backports/one-off patches, the fix must be enabled by setting event 16717701 at level 104887600:

alter system set EVENT='16717701 trace name context forever, level 104887600' scope=spfile;

The 104887600 value of the event encodes two things:
1. Timeout – this is the amount of time that LGWR will wait for an X lock before signaling a timeout error and retrying.
2. Sleep duration – this is the amount of time that LGWR will sleep for after having timed out.

Both of the above values are denoted in MILLISECONDS. The event value encodes the sleep duration in its 12high-order bits and the timeout in the 20 lower order bits. The value can be calculated using the following formula:

value = (S * 1048576) + T

where S = sleep duration in milliseconds
T = timeout in milliseconds

当然这个值可以进行微调,根据oracle的建议是开始配置30 second timeout 和 100 ms sleep duration,也就是说LGWR会等待30秒获取X-lock, 如时失败会在sleeping 100秒秒后重试,使用上面的计算方法,event的level值计算就是:

value = (100 * 1048576) + 30000 = 104887600

在 RAC ADG 环境中这个event必须在所有使用使用相同的值,而且不支持滚动修改。

这个bug在11.2.0.4 和 12c已修复,同时引入了隐藏参数控制 “_adg_parselock_timeout”和_adg_parselock_timeout_sleep .

SQL> @pd parselock
Show all parameters and session values from x$ksppi/x$ksppcv...

       NUM N_HEX NAME                               VALUE        DESCRIPTION
---------- ----- ---------------------------------- ------------ ------------------------------------------------------------------
      2345   929 _adg_parselock_timeout             0            timeout for parselock get on ADG in centiseconds
      2347   92B _adg_parselock_timeout_sleep       100          sleep duration after a parselock timeout on ADG in milliseconds

The value of 550 centiseconds has been typically enough (550*0.01=5.5 seconds): “_adg_parselock_timeout”=550;
it can be adjusted to the recommended value of 30 seconds; “_adg_parselock_timeout”= 3000.

 

打赏

目前这篇文章还没有评论(Rss)

我要评论