记录一次内存不足使用SWAP 严重时的数据库表现。 oracle 11203 on linux 64
cpu core 8,物理内存:8g,问题当时物理内存使用完,swap 已使用4G,有配置hugepage,且 vmstat 查看swap in out 严重。负载60左右
Check oracle alert
2013-10-19 08:51:10.384000 +08:00 Exception [type: SIGSEGV, Address not mapped to object] [ADDR:0x10] [PC:0x938E8C5, dbgtrReserveSpace()+99] [flags: 0x0, count: 1] Errors in file /oracle/diag/rdbms/ora11g/ora11g/trace/ora11g_ora_13864.trc (incident=60801): ORA-07445: exception encountered: core dump [dbgtrReserveSpace()+99] [SIGSEGV] [ADDR:0x10] [PC:0x938E8C5] [Address not mapped to object] [] Incident details in: /oracle/diag/rdbms/ora11g/ora11g/incident/incdir_60801/ora11g_ora_13864_i60801.trc Use ADRCI or Support Workbench to package the incident. See Note 411.1 at My Oracle Support for error and packaging details. 2013-10-19 08:52:14.986000 +08:00 Sweep [inc][60837]: completed Sweep [inc][60836]: completed Sweep [inc][60835]: completed Sweep [inc][60834]: completed Sweep [inc][60833]: completed ... Sweep [inc][60839]: completed Sweep [inc][60838]: completed 2013-10-19 08:53:24.174000 +08:00 Errors in file /oracle/diag/rdbms/ora11g/ora11g/trace/ora11g_ora_14250.trc (incident=47873): ORA-00600: internal error code, arguments: [17147], [0x15A2005D0], [], [], [], [], [], [], [], [], [], [] Incident details in: /oracle/diag/rdbms/ora11g/ora11g/incident/incdir_47873/ora11g_ora_14250_i47873.trc ... 2013-10-19 09:10:01.732000 +08:00 WARNING: Heavy swapping observed on system in last 5 mins. 2013-10-19 09:10:13.915000 +08:00 pct of memory swapped in [8.73%] pct of memory swapped out [4.84%]. Please make sure there is no memory pressure and the SGA and PGA are configured correctly. Look at DBRM trace file for more details.
Incident details
vi /oracle/diag/rdbms/ora11g/ora11g/incident/incdir_60801/ora11g_ora_13864_i60801.trc
Dump continued from file: /oracle/diag/rdbms/ora11g/ora11g/trace/ora11g_ora_13864.trc
ORA-07445: exception encountered: core dump [dbgtrReserveSpace()+99] [SIGSEGV] [ADDR:0x10] [PC:0x938E8C5] [Address not mapped to object] []
========= Dump for incident 60801 (ORA 7445 [dbgtrReserveSpace()+99]) ========
----- Beginning of Customized Incident Dump(s) -----
Exception [type: SIGSEGV, Address not mapped to object] [ADDR:0x10] [PC:0x938E8C5, dbgtrReserveSpace()+99] [flags: 0x0, count: 1]
Registers:
%rax: 0x00000000770000e4 %rbx: 0x0000000000000068 %rcx: 0x0000000000000068
%rdx: 0x00000001658f10c0 %rdi: 0x00002b25a44ed710 %rsi: 0x00007fff08087ae8
%rsp: 0x00007fff08087930 %rbp: 0x00007fff080879b0 %r8: 0x0000000000000068
%r9: 0x0000000000000001 %r10: 0x0000000000000068 %r11: 0x0000000000000000
%r12: 0x0000000000000000 %r13: 0x00007fff08087ae8 %r14: 0x00000001658f10c0
%r15: 0x0000000000000001 %rip: 0x000000000938e8c5 %efl: 0x0000000000010206
dbgtrReserveSpace()+83 (0x938e8b5) jnz 0x938f18c
dbgtrReserveSpace()+89 (0x938e8bb) mov (%r14),%r12
dbgtrReserveSpace()+92 (0x938e8be) or $64,%eax
dbgtrReserveSpace()+95 (0x938e8c1) mov %eax,0x8(%r14)
> dbgtrReserveSpace()+99 (0x938e8c5) mov 0x10(%r12),%r8
dbgtrReserveSpace()+104 (0x938e8ca) mov 0x18(%r12),%r9
dbgtrReserveSpace()+109 (0x938e8cf) mov %r9,%rax
dbgtrReserveSpace()+112 (0x938e8d2) sub %r8,%rax
dbgtrReserveSpace()+115 (0x938e8d5) movzwl %ax,%eax
----- Call Stack Trace -----
skdstdst()+36 call kgdsdst()
ksedst1()+98 call skdstdst()
ksedst()+34 call ksedst1()
dbkedDefDump()+2741 call ksedst()
ksedmp()+36 call dbkedDefDump()
ssexhd()+2458 call ksedmp()
__sighandler() call ssexhd()
dbgtrReserveSpace() signal __sighandler()
+99
dbgtTrcData_int()+3 call dbgtrReserveSpace()
18
kstipg()+740 call dbgtTrcData_int()
dbktbDefaultBucketA call kstipg()
lloc()+87
ksucrp()+1470 call dbktbDefaultBucketA
lloc()
opiino()+1477 call ksucrp()
opiodr()+916 call opiino()
opidrv()+565 call opiodr()
sou2o()+98 call opidrv()
opimai_real()+128 call sou2o()
ssthrdmain()+252 call opimai_real()
main()+196
Analyze listener.ora log
[oracle@dbserver89 trace]$ fgrep "19-OCT-2013 08:" listener.log_bak|fgrep "establish"|awk '{print $1 "" $2}'| awk -F: '{print $1 ":" $2}' |sort|uniq -c
...
114 19-OCT-201308:40
83 19-OCT-201308:41
57 19-OCT-201308:42
13 19-OCT-201308:43
84 19-OCT-201308:44
93 19-OCT-201308:45
95 19-OCT-201308:46
69 19-OCT-201308:47
71 19-OCT-201308:48
65 19-OCT-201308:49
52 19-OCT-201308:50
127 19-OCT-201308:51
39 19-OCT-201308:52
99 19-OCT-201308:53
39 19-OCT-201308:54
338 19-OCT-201308:55
785 19-OCT-201308:56
1229 19-OCT-201308:57
1122 19-OCT-201308:58
1190 19-OCT-201308:59
...
[root@dbserver89 sa]# sar -f sa19 -s 08:00:00 -e 10:00:00 Linux 2.6.18-92.el5 (dbserver89) 10/19/2013 08:00:01 AM CPU %user %nice %system %iowait %steal %idle 08:10:01 AM all 0.13 0.00 0.09 0.11 0.00 99.67 08:20:02 AM all 0.17 0.00 0.11 0.15 0.00 99.58 08:30:01 AM all 0.12 0.00 0.09 0.25 0.00 99.54 08:40:01 AM all 0.25 0.00 0.13 0.54 0.00 99.09 08:50:01 AM all 0.54 0.00 0.17 0.84 0.00 98.45 09:00:01 AM all 0.84 0.00 0.40 0.66 0.00 98.11 09:10:53 AM all 0.80 0.00 8.02 42.91 0.00 48.27 09:20:05 AM all 0.50 0.00 8.24 32.31 0.00 58.95 09:30:06 AM all 0.69 0.00 5.29 27.21 0.00 66.81 09:40:02 AM all 0.59 0.00 7.75 27.87 0.00 63.79 09:50:01 AM all 0.33 0.00 0.31 1.40 0.00 97.96 Average: all 0.45 0.00 2.78 12.28 0.00 84.49
Logon Storm
A Logon storm is a sudden spike in the number of client connection requests. Logon storms can occur due to a variety of factors. They could be malicious like a DoS attack. Or they could occur due to administrative actions – such as a middle tier coming online.
The process of starting a database session has inherent CPU costs associated with it – from the creation of a new OS process to attaching to the SGA. Every system has a limit above which the spawning of new database sessions can starve existing sessions of CPU resources. This could result in many unexpected consequences including application timeouts, increased response times, un-responsiveness and other cascading effects.
同时oracle 11g 默认是启用了 db audit 审计。
如果查询那个时间段的事件可以用下面的方法
SELECT snap_id,
sample_time,
session_type,
session_id,sql_opname,
sql_id,
event,
p1,
p2,
blocking_session,
blocking_hangchain_info,
current_obj#,
machine
FROM DBA_HIST_ACTIVE_SESS_HISTORY
WHERE sample_time BETWEEN TO_DATE ('2013-10-19 08:00', 'yyyy-mm-dd hh24:mi')
AND TO_DATE ('2013-10-19 10:00', 'yyyy-mm-dd hh24:mi')
ORDER BY 1,
会发现很多buffer busy wait 对象是aud$. 同时伴有resmgr:cpu quantum