最近遇到一套9I数据库遇到了性能问题, 现象是数据库主机CPU使用率很高应用响应缓慢,Cpu Idel几乎为0, 从v$session_wait查看数据库当前的活动会话在等待’null event’和’latch free’.
anbob% uname -a SunOS anbob 5.10 Generic_139555-08 sun4u sparc SUNW,SPARC-Enterprise anbob% sqlplus "/ as sysdba" SQL*Plus: Release 9.2.0.1.0 - Production on Fri Nov 11 11:35:34 2016 Copyright (c) 1982, 2002, Oracle Corporation. All rights reserved. Connected to: Oracle9i Enterprise Edition Release 9.2.0.1.0 - 64bit Production With the Partitioning, OLAP and Oracle Data Mining options JServer Release 9.2.0.1.0 - Production SQL> select * from v$version; BANNER ---------------------------------------------------------------- Oracle9i Enterprise Edition Release 9.2.0.1.0 - 64bit Production PL/SQL Release 9.2.0.1.0 - Production CORE 9.2.0.1.0 Production TNS for Solaris: Version 9.2.0.1.0 - Production NLSRTL Version 9.2.0.1.0 - Production SQL> select event,count(*) from v$session_wait group by event; EVENT COUNT(*) ---------------------------------------------------------------- ---------- SQL*Net message from client 66 db file scattered read 2 latch free 21 null event 83 pmon timer 1 rdbms ipc message 5 smon timer 1 wakeup time manager 1
Note:
对于null event通常是数据库暂时未最到等待事件的命名,原因可能是当时的CPU负载高,数据库HANG,bug, 本案例应该是因为latch导致的CPU耗尽,所以解决latch free先。
latch是一种访问共享资源时的内部内存结构,通常latch的持有非常的短暂,又因为latch spin非常的消耗CPU,sleeping阶段不会消耗CPU time但同时会显示latch free的wait event.
通常对于latch free需要确认相关的具体的latch和相关的SQL及涉及的对象。如果感兴趣推荐阅读去搜索。 对于latch free的三个参数如下:
P1 = Latch address
P2 = Latch number
P3 = Tries
SQL> select name from v$latchname where latch#=97;
NAME
----------------------------------------------------------------
cache buffers chains
SQL> select sql_hash_value,count(*) from v$session_wait a,v$session b where a.sid=b.sid
and event='latch free' group by sql_hash_value;
SQL_HASH_VALUE COUNT(*)
--------------- ---------------
...
1993732470 1
2081869776 1
2349207484 2
2380084478 2
2695258969 1
2907909916 1
3165338618 1
3442647599 1
3655775635 1
3756755013 3
3790812144 1
3936368692 1
3980632519 2
4005557655 1
4161039002 1
4249511952 1
4274466022 2
32 rows selected.
SQL> select p1,count(*) from v$session_wait a where event='latch free' group by p1;
P1 COUNT(*)
--------------- ---------------
16162378384 1
16162683952 9
16162801168 13
16162947472 23
select to_char(16162947472,'xxxxxxxxxxxxxxx') from dual;
TO_CHAR(16162947
----------------
3c3630190
SQL> select * from v$latch_children where addr=upper(lpad('3c3630190',16,0));
ADDR LATCH# CHILD# LEVEL# NAME GETS MISSES SLEEPS IMMEDIATE_GETS IMMEDIATE_MISSES
---------------- -------- ------- ---------- ---------------------- --------- --------- -------- --------------- ----------------
00000003C3630190 97 2949 1 cache buffers chains 294507 499 373 8804 3
-- wait 1s
SQL> /
ADDR LATCH# CHILD# LEVEL# NAME GETS MISSES SLEEPS IMMEDIATE_GETS IMMEDIATE_MISSES
---------------- -------- ------- ---------- ---------------------- --------- --------- -------- --------------- ----------------
00000003C3630190 97 2949 1 cache buffers chains 300406 503 373 8925 3
SQL> /
ADDR LATCH# CHILD# LEVEL# NAME GETS MISSES SLEEPS IMMEDIATE_GETS IMMEDIATE_MISSES
---------------- -------- ------- ---------- ---------------------- --------- --------- -------- --------------- ----------------
00000003C3630190 97 2949 1 cache buffers chains 300966 504 373 8939 3
SQL> /
ADDR LATCH# CHILD# LEVEL# NAME GETS MISSES SLEEPS IMMEDIATE_GETS IMMEDIATE_MISSES
---------------- -------- ------- ---------- ---------------------- --------- --------- -------- --------------- ----------------
00000003C3630190 97 2949 1 cache buffers chains 301324 504 373 8948 3
SQL>
select hladdr, file#, dbablk, decode(state,1,'cur ',3,'CR',state) ST, tch
from x$bh where hladdr in
(select addr from (select addr from v$latch_children where addr=upper(lpad('3c3630190',16,0))
order by sleeps, misses,immediate_misses desc )where rownum <2) order by tch;
HLADDR FILE# DBABLK ST TCH
---------------- ---------- ---------- ---------------------------------------- ----------
...
...
00000003C3630190 16 436368 cur 156
00000003C3630190 17 491740 cur 157
00000003C3630190 11 257281 cur 160
00000003C3630190 16 706704 cur 161
00000003C3630190 12 468301 cur 162
00000003C3630190 12 759117 cur 162
00000003C3630190 12 1017696 cur 169
00000003C3630190 11 204033 cur 170
00000003C3630190 17 475356 cur 170
00000003C3630190 12 226637 cur 174
00000003C3630190 18 248104 cur 178
00000003C3630190 12 128333 cur 179
00000003C3630190 16 1169552 cur 185
00000003C3630190 16 727184 cur 190
00000003C3630190 12 173389 cur 194
00000003C3630190 12 857952 cur 197
00000003C3630190 12 578893 cur 214
00000003C3630190 12 1029984 cur 226
00000003C3630190 18 5909 cur 228
00000003C3630190 16 1255568 cur 410
00000003C3630190 18 157992 cur 422
00000003C3630190 19 119156 cur 468
00000003C3630190 17 241884 cur 490
00000003C3630190 12 1156960 cur 508
00000003C3630190 16 907408 cur 600
00000003C3630190 12 1009504 cur 1051 <<<
125 rows selected.
SQL> select owner, segment_name,partition_name, segment_type, file_id,block_id from dba_extents
where file_id = 12 and 1009504 between block_id and block_id+blocks-1;
OWNER SEGMENT_NAME PARTITION_NAME SEGMENT_TYPE FILE_ID BLOCK_ID
------------- -------------------------- ---------------- ------------------ ---------- ----------
LTEOMS IDX_WFWI_STATE INDEX 12 1008521
SQL> select /*+ RULE */ substr(a.event,1,25) ,substr(b.program,1,39) program,
b.username,substr(b.osuser||'@'||b.machine||'@'||b.process||'@'||c.spid,1,31) client,
to_char(b.logon_time,'mm-dd hh24:mi') logon_time
from v$session_wait a,v$session b, v$process c
where a.sid=b.sid and b.paddr=c.addr
and a.event ='latch free';
PROGRAM USERNAME CLIENT LOGON_TIME
--------------------------- --------------- --------------------------- -----------
JDBC Thin Client LTEOMS root@hneoms@@2172 11-11 10:16
JDBC Thin Client LTEOMS root@hneoms@@2394 11-11 10:27
JDBC Thin Client LTEOMS root@hneoms@@2222 11-11 10:18
JDBC Thin Client LTEOMS root@hneoms@@1998 11-11 10:09
JDBC Thin Client LTEOMS root@hneoms@@1974 11-11 10:07
JDBC Thin Client LTEOMS root@hneoms@@1915 11-11 10:05
6 rows selected.
select hash_value,sql_text from v$sql where hash_value in(
1045805320,
1081719454,
1542141426,
1663684681,
1759938066,
1826030977,
1906825844,
1993732470,
2081869776,
2349207484,
2380084478,
2695258969,
2907909916,
3165338618,
3442647599,
3655775635,
3756755013,
3790812144,
3936368692,
3980632519,
4005557655,
4161039002,
4249511952,
4274466022
);
SELECT COUNT(*) AMOUNT FROM (SELECT DISTINCT ADVISE_NUM adviseNum,REPLY_NUM replyNum,HASTEN_NUM hastenNum,MAIN_ID mainId,TITLE title,SHEET_ID sheetId,STATUS status,SEND_TIME sendTime,SEND_DEPT_ID sendDeptId,SEND_USER_ID sendUserId,LINK_ID linkId,TASK_ACCEPT_LIMIT taskAcceptLimit,TASK_COMPLETE_LIMIT
taskCompleteLimit,TASK_TYPE taskType,START_TIME startTime,END_TIME endTime,ACTIVITY_DEF_ID activityDefId,ACTIVITY_INST_NAME activityInstName,ACTIVITY_INST_ID activityInstId,PROCESS_INST_ID processInstId,PROCESS_INST_NAME processInstName,PROCESS_DEF_ID processDefId,PROCESS_DEF_NAME processDefName,WO
...
...
24 rows selected.
Note:
根据p2的值确认了是cache buffers chains latch,通常出现在hot block的SQL中,并且只有gets有明显的增长,通常从x$bh的tch列可以确认对象为一索引对象,并且有确认了相的SQL基本相同只是字面量值不同,这个sql较长这里之截取了其中一部分,因为v$sql在9i中不存在sqlfull_text, 可以写PLSQL 去v$sqltext中拼全SQL语句, 继续可以从SQL的执行计划与hot block的对象交集进行判断优化。因为9i中没有dbms_xplan package查看执行计划,这里我调用了TanelPoder的脚本。因为该SQL的执行计划有约1000行,我们可以先看与hot block 索引相关的执行计划。
SQL> @xm 3756755013 0
Ch Pr Op Objcect Optimizer Optim rows Optim bytes Optimizer
ld ed ID Operation Name Cost from step from step Mode
-- -- ---- ------------------------------------------------------- ------------------------------ --------- ---------- ----------- ----------
0 0 SELECT STATEMENT 13222 CHOOSE
1 SORT AGGREGATE 1
2 VIEW 13222 64
3 SORT UNIQUE 13222 64 44736
4 VIEW 13268 64 44736
5 SORT UNIQUE 13268 64 46142
6 UNION-ALL
7 CONCATENATION
8 NESTED LOOPS 44 1 723
AF 9 HASH JOIN 43 1 697
10 NESTED LOOPS 40 1 321
11 INLIST ITERATOR
F 12 TABLE ACCESS BY INDEX ROWID WFWIPARTICIPANT 6 314 26376
A 13 INDEX RANGE SCAN WF_IDX_PART_TYPE 2 12546
F 14 TABLE ACCESS BY INDEX ROWID WFWORKITEM 40 1799 426363
A 15 INDEX UNIQUE SCAN PK_WFWORKITEM 815 1
16 TABLE ACCESS FULL WS_CIRCUITSHEET_TASK 2 409 153784
F 17 TABLE ACCESS BY INDEX ROWID WS_CIRCUITSHEET_MAIN 1 1 26
A 18 INDEX UNIQUE SCAN WS_PK_CIRCUITSHEET_MAIN 102
19 NESTED LOOPS 44 1 723
AF 20 HASH JOIN 43 1 697
21 TABLE ACCESS BY INDEX ROWID WFWORKITEM 40 1799 426363
22 NESTED LOOPS 40 1 321
23 INLIST ITERATOR
F 24 TABLE ACCESS BY INDEX ROWID WFWIPARTICIPANT 6 314 26376
A 25 INDEX RANGE SCAN WF_IDX_PART_TYPE 2 12546
26 BITMAP CONVERSION TO ROWIDS
27 BITMAP AND
28 BITMAP CONVERSION FROM ROWIDS
A 29 INDEX RANGE SCAN PK_WFWORKITEM 1
30 BITMAP CONVERSION FROM ROWIDS
A 31 INDEX RANGE SCAN IDX_WFWI_STATE 1 1 <<<<<<<<<
32 TABLE ACCESS FULL WS_CIRCUITSHEET_TASK 2 409 153784
F 33 TABLE ACCESS BY INDEX ROWID WS_CIRCUITSHEET_MAIN 1 1 26
A 34 INDEX UNIQUE SCAN WS_PK_CIRCUITSHEET_MAIN 102
35 CONCATENATION
36 NESTED LOOPS 141 1 786
AF 37 HASH JOIN 140 1 760
38 NESTED LOOPS 40 1 321
39 INLIST ITERATOR
F 40 TABLE ACCESS BY INDEX ROWID WFWIPARTICIPANT 6 314 26376
A 41 INDEX RANGE SCAN WF_IDX_PART_TYPE 2 12546
F 42 TABLE ACCESS BY INDEX ROWID WFWORKITEM 40 10 2370
A 43 INDEX UNIQUE SCAN PK_WFWORKITEM 815 1
44 TABLE ACCESS FULL WS_CUTMANAGESHEET_TASK 86 71389 31339771
F 45 TABLE ACCESS BY INDEX ROWID WS_CUTMANAGESHEET_MAIN 1 1 26
A 46 INDEX UNIQUE SCAN WS_PK_CUTMANAGESHEET_MAIN 100
47 NESTED LOOPS 141 1 786
AF 48 HASH JOIN 140 1 760
49 TABLE ACCESS BY INDEX ROWID WFWORKITEM 40 10 2370
50 NESTED LOOPS 40 1 321
51 INLIST ITERATOR
F 52 TABLE ACCESS BY INDEX ROWID WFWIPARTICIPANT 6 314 26376
A 53 INDEX RANGE SCAN WF_IDX_PART_TYPE 2 12546
54 BITMAP CONVERSION TO ROWIDS
55 BITMAP AND
56 BITMAP CONVERSION FROM ROWIDS
A 57 INDEX RANGE SCAN PK_WFWORKITEM 1
58 BITMAP CONVERSION FROM ROWIDS
A 59 INDEX RANGE SCAN IDX_WFWI_STATE 1 1
60 TABLE ACCESS FULL WS_CUTMANAGESHEET_TASK 86 71389 31339771
F 61 TABLE ACCESS BY INDEX ROWID WS_CUTMANAGESHEET_MAIN 1 1 26
A 62 INDEX UNIQUE SCAN WS_PK_CUTMANAGESHEET_MAIN 100
...
...
A 837 INDEX RANGE SCAN IX_HNRESONE_T_WORKID 1 1
F 838 TABLE ACCESS BY INDEX ROWID WS_HNRESONE_MAIN 1 53 1378
A 839 INDEX UNIQUE SCAN WS_PK_HNRESONE_MAIN 100
Ch Op
ld ID Predicate Information (identified by operation id):
-- ------ ----------------------------------------------------------------------------------------------------
0 9 - access("WORKITEM_ID"="WORKITEMID")
9 - filter("CURRENTSTATE"=4 OR "CURRENTSTATE"=10 OR "PROLONG_READ_FLAG">0 OR "PHASE_BACK_READ_FLAG">0)
12 - filter("PARTIINTYPE"='EXE' OR "PARTIINTYPE"='GET')
13 - access("PARTICIPANTTYPE"='role' AND ("PARTICIPANT"='8a9580bc22fe4d430123037a68e208fd' OR
"PARTICIPANT"='8a9580bc29781f21012981a824492b13'))
14 - filter("CURRENTSTATE"="WORKITEMSTATE")
15 - access("WORKITEMID"="WORKITEMID")
17 - filter("STATUS"=2)
18 - access("MAIN_ID"="MAIN_ID")
20 - access("WORKITEM_ID"="WORKITEMID")
20 - filter("CURRENTSTATE"=4 OR "CURRENTSTATE"=10 OR "PROLONG_READ_FLAG">0 OR "PHASE_BACK_READ_FLAG">0)
24 - filter(("PARTIINTYPE"='EXE' OR "PARTIINTYPE"='GET') AND (LNNVL("PARTICIPANTTYPE"='role') OR
LNNVL("PARTICIPANT"='8a9580bc22fe4d430123037a68e208fd') AND
LNNVL("PARTICIPANT"='8a9580bc29781f21012981a824492b13')))
25 - access(("PARTICIPANTTYPE"='person' AND "PARTICIPANT"='wangqm' OR "PARTICIPANTTYPE"='organization'
AND "PARTICIPANT"='1238'))
29 - access("WORKITEMID"="WORKITEMID")
31 - access("CURRENTSTATE"="WORKITEMSTATE")
33 - filter("STATUS"=2)
34 - access("MAIN_ID"="MAIN_ID")
37 - access("WORKITEM_ID"="WORKITEMID")
37 - filter("CURRENTSTATE"=4 OR "CURRENTSTATE"=10 OR "PROLONG_READ_FLAG">0 OR "PHASE_BACK_READ_FLAG">0)
40 - filter("PARTIINTYPE"='EXE' OR "PARTIINTYPE"='GET')
41 - access("PARTICIPANTTYPE"='role' AND ("PARTICIPANT"='8a9580bc22fe4d430123037a68e208fd' OR
"PARTICIPANT"='8a9580bc29781f21012981a824492b13'))
42 - filter("CURRENTSTATE"="WORKITEMSTATE")
43 - access("WORKITEMID"="WORKITEMID")
45 - filter("STATUS"=2)
46 - access("MAIN_ID"="MAIN_ID")
48 - access("WORKITEM_ID"="WORKITEMID")
48 - filter("CURRENTSTATE"=4 OR "CURRENTSTATE"=10 OR "PROLONG_READ_FLAG">0 OR "PHASE_BACK_READ_FLAG">0)
52 - filter(("PARTIINTYPE"='EXE' OR "PARTIINTYPE"='GET') AND (LNNVL("PARTICIPANTTYPE"='role') OR
LNNVL("PARTICIPANT"='8a9580bc22fe4d430123037a68e208fd') AND
LNNVL("PARTICIPANT"='8a9580bc29781f21012981a824492b13')))
53 - access(("PARTICIPANTTYPE"='person' AND "PARTICIPANT"='wangqm' OR "PARTICIPANTTYPE"='organization'
AND "PARTICIPANT"='1238'))
57 - access("WORKITEMID"="WORKITEMID")
59 - access("CURRENTSTATE"="WORKITEMSTATE")
61 - filter("STATUS"=2)
62 - access("MAIN_ID"="MAIN_ID")
...
Note:
在使用到索引的位置使用的是nested-loop join, 而且多处出现,这样该语句自身就增加了buffer在访问该索引时,结果就是导致更高的latch CBC竞争,另外该索引是在BITMAP CONVERSION中使用,通常是要么是创建了不科学的索引,如在selectivity 较低的列创建了索引,导致使用了错误的执行计划,如改变两索引为组合索引解决,或通过_b_tree_bitmap_plans参数考虑在SQL,SESSION,SYSTEM level禁用普通tree索引转换bitmap的特性。