首页 » ORACLE » library cache: mutex X等待事件, blocker session on cpu

library cache: mutex X等待事件, blocker session on cpu

library cache: mutex X等待事件是当一个会话以 exclusive mode持有library cache mutex时,另一个会话当前无论何时申请此mutex时必须等待其释放。很多时候对 library cache做不同的操作时都需要申请一个mutex, 所以最重要的是确认mutex的位置”location”,  该位置有助于分析该类等待事件的原因。

对应参数:

P1 = “idn” = Unique Mutex Identifier
P2 = Mutex “value” = in order to find the mutex value, read note 1298015.1
P3 = “where” = location in code (internal identifier) where mutex is being waited for

mutex的问题排查较为复杂,通常一个现象有多个原因,并且mutex问题又会引起其它级连问题。常见原因:

  • 频繁的hard parse
  • High sql version count,需要检查一个很长的chain of versions
  • 引为某些原因导致的Invalidations and reloads
  • shared pool配置过小
  • cursor_sharing=similar和session_cached_cursors配置不当
  • BUGs

该类问题时通常需要引集的信息:

• AWR Report
• ASH Report
• Hanganalyze trace
• Systemstate dump (contain call stack level)

系统级可用的相关视图:GV$MUTEX_SLEEP 和GV$MUTEX_SLEEP_HISTORY

分析该事件的相关SQL:

-- which library cache objects are the target of the operation.
select * 
from 
   (select 
      case when (kglhdadr = kglhdpar) 
      then 'Parent' 
      else 'Child '||kglobt09 end cursor,  
      kglhdadr ADDRESS, 
      substr(kglnaobj,1,20) NAME, 
      kglnahsh HASH_VALUE, 
      kglobtyd TYPE,  
      kglobt23 LOCKED_TOTAL, 
      kglobt24 PINNED_TOTAL,
      kglhdexc EXECUTIONS, 
      kglhdnsp NAMESPACE  
   from 
      x$kglob 
     -- where kglobtyd != 'CURSOR' 
   order by 
      kglobt24 desc) 
where 
   rownum <= 10; 

-- to find blocker session 
SELECT count (*), to_number(substr(to_char(rawtohex(p2raw)), 1, 8),'XXXXXXXX') blocking_sid 
FROM v$session 
WHERE event = 'library cache: mutex X' 
group by to_number(substr(to_char(rawtohex(p2raw)),1, 8), 'XXXXXXXX'); 

-- to find mutex from ash 
select event, p1, count(1) 
from v$active_session_history 
where sample_time > (sysdate - 20/1440)
and   event = 'library cache: mutex X' 
group by 
   event, p1 
order by 3;  

最近遇到过一个案例数据库几个session 在执行相同的insert values sql时library cache: mutex X争用严重,其它SQL和session正常,SQL文本中使用了46个绑定变量值。

SQL> select child_number,count(*)  from v$sql where sql_id='b92jmb1qngsyd' group by child_number;

CHILD_NUMBER   COUNT(*)
------------ ----------
           0         47
           1         47
           2         47
           3         47
           4         47
           5         46
           6         46
           7         47
           8         48
           9         46
          10         47
          11         47
          12         47
          13         47
          14         47
          15         48
          16         47
          17         47
 ...
          83         47
          84         47
          85         47
          86         47
          87         47
          88         47
          89         47
          90         47
          91         47
          92         47
          93         46
          94         47
          95         47
          96         47
          97         47
          98         47
          99         47

100 rows selected.

sql> @no_shared

SQL_ID        NONSHARED_REASON                COUNT(*)
------------- ----------------------------- ----------
b92jmb1qngsyd BIND_MISMATCH                       4730
b92jmb1qngsyd PURGED_CURSOR                          2
b92jmb1qngsyd HASH_MATCH_FAILED                      1
b92jmb1qngsyd BIND_LENGTH_UPGRADEABLE              952

select /*+rule*/m.position,m.bind_name , m.max_length,count(*) child_cursor_count
     from v$sql s, v$sql_bind_metadata m
     where s.sql_id =  'b92jmb1qngsyd'
     and s.child_address = m.address group by m.position,m.bind_name , m.max_length
     order by 1, 2;

-- 确认存在部分变量varchar长度区间问题

SQL> select mutex_type,location_id,location,sleeps from x$mutex_sleep where location_id=85;

MUTEX_TYPE                       LOCATION_ID LOCATION                                     SLEEPS
-------------------------------- ----------- ---------------------------------------- ----------
Library Cache                             85 kgllkdl1  85                               15001254

当时有做SSD 但级别不够(10), 确认了blocker 是no in wait, 并且是on cpu, 执行的是相同的insert sql, 该sql是应用同步数据使用执行频率较高。当时也有尝试flush shared pool无效果,后尝试切换另一节点临时解决该现象。

因为当时也有做errorstack

  ----- Abridged Call Stack Trace -----
146297  ksedsts()+544<-kjzdssdmp()+400<-kjzdpcrshnfy()+512<-kstdmp()+416<-dbkedDefDump()+6032<-ksedmp()+64<-ksdxfdmp()+1360<
-ksdxcb()+3216<-sspuser()+688<-<kernel><-kglrdtin()+1121<-kglrdti()+80<-kksAllocCursorStat()+1152<-kksLoadChild()+14400<-kkslod()+112<-kglobld()+1872
146298  <-kglobpn()+1744<-kglpim()+832<-kglpin()+2496<-kxsGetRuntimeLock()+1840
146299  
----- End of Abridged Call Stack Trace -----

以call stack和event为关键字,在MOS中发现一篇Session Spin on Kglrdtin Holding ‘Library Cache: Mutex X’ (文档 ID 2219897.1)
和当前的版本现象较为相似因当前的平台hpux无相关one off patch,且无升级11.2.0.4计划记划,这里只记录一下该问题。

打赏

对不起,这篇文章暂时关闭评论。