首页 » ORACLE 9i-23c » Troubleshooting Instance crash after ora-7445[opiaba],ora-600 [kgh_heap_sizes:ds], ora-600 [17147]

Troubleshooting Instance crash after ora-7445[opiaba],ora-600 [kgh_heap_sizes:ds], ora-600 [17147]

最近有一套数据库 crash ,环境2nodes RAC 11.2.0.3 on AIX , alert日志中记录了在crash前出现了ora-7445[opiaba],ora-600 [kgh_heap_sizes:ds], ora-600 [17147], 下面记录当时的一些日志。

# DB alert log

Wed Sep 16 23:44:23 2015
Exception [type: SIGSEGV, Address not mapped to object] [ADDR:0xF00000E161B7CD2] [PC:0x10662AFD4, opiaba()+788] [flags: 0x0, count: 1]
Errors in file /u01/app/oracle/diag/rdbms/anbob/anbob1/trace/anbob1_ora_21759124.trc  (incident=191931):
ORA-07445: 出现异常错误: 核心转储 [opiaba()+788] [SIGSEGV] [ADDR:0xF00000E161B7CD2] [PC:0x10662AFD4] [Address not mapped to object] []
Incident details in: /u01/app/oracle/diag/rdbms/anbob/anbob1/incident/incdir_191931/anbob1_ora_21759124_i191931.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Wed Sep 16 23:46:37 2015
Errors in file /u01/app/oracle/diag/rdbms/anbob/anbob1/trace/anbob1_ora_21759124.trc  (incident=191932):
ORA-00600: 内部错误代码, 参数: [kgh_heap_sizes:ds], [0x700000E82B30018], [], [], [], [], [], [], [], [], [], []
ORA-07445: 出现异常错误: 核心转储 [opiaba()+788] [SIGSEGV] [ADDR:0xF00000E161B7CD2] [PC:0x10662AFD4] [Address not mapped to object] []
Incident details in: /u01/app/oracle/diag/rdbms/anbob/anbob1/incident/incdir_191932/anbob1_ora_21759124_i191932.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Wed Sep 16 23:47:14 2015
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Wed Sep 16 23:47:14 2015
Dumping diagnostic data in directory=[cdmp_20150916234714], requested by (instance=1, osid=21759124), summary=[incident=191932].
Wed Sep 16 23:56:58 2015
Errors in file /u01/app/oracle/diag/rdbms/anbob/anbob1/trace/anbob1_ora_14549208.trc  (incident=203370):
ORA-04030: out of process memory when trying to allocate 103504 bytes (QERHJ hash-joi,kllcqc:kllcqslt)
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Wed Sep 16 23:57:02 2015
Sweep [inc][203370]: completed
Sweep [inc][191932]: completed
Sweep [inc][191931]: completed
Wed Sep 16 23:58:03 2015
Process 0x700000fd0ac9dc8 appears to be hung while dumping
Current time = 2507987599, process death time = 2507922725 interval = 60000
Attempting to kill process 0x700000fd0ac9dc8 with OS pid = 21759124
OSD kill succeeded for process 700000fd0ac9dc8
ERROR: Unable to normalize symbol name for the following short stack (at offset 521):
dbgexProcessError()+180<-dbgeExecuteForError()+72<-dbgePostErrorKGE()+2048<-dbkePostKGE_kgsf()+68<-kgeade()+364<-kgeriv_int()+116<-kgeriv()+36<-kgeric1()+60<-kgherror()+324<-kghfrmrg()+1964<-kghfre()+368<-kghfrh_internal()+552<-kghfrh()+16<-kksFreeHeap()+272<-kksLockDelete()+760<-kksLockOperation()+140<-kgllkdl()+3136<-kgllkds()+104<-kssxdl()+1872<-kssdel()+724<-kssdch()+5732<-ksuxds()+940<-kssxdl()+472<-kssdel()+724<-kssdch()+5732<-ksudlp()+792<-kssxdl()+472<-kssdel()+724<-ksuxdl()+456<-ksuxda_del_proc()+188<-IPRA.$ksucln_dpc_cleanup()+272<-IPRA.$ksucln_dpc_dfs()+268<-ksucln_dpc_main()+892<-ksucln_dpc()+36<-ksucln()+1340<-ksbrdp()+2044<-opirip()+1624<-opidrv()+608<-sou2o()+136<-opimai_real()+188<-ssthrdmain()+268<-main()+204<-__start()+112
Errors in file /u01/app/oracle/diag/rdbms/anbob/anbob1/trace/anbob1_pmon_16253104.trc  (incident=190123):
ORA-00600: internal error code, arguments: [17147], [0x700000E82B30000], [], [], [], [], [], [], [], [], [], []
Incident details in: /u01/app/oracle/diag/rdbms/anbob/anbob1/incident/incdir_190123/anbob1_pmon_16253104_i190123.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Wed Sep 16 23:58:12 2015
Dumping diagnostic data in directory=[cdmp_20150916235812], requested by (instance=1, osid=16253104 (PMON)), summary=[incident=190123].
Errors in file /u01/app/oracle/diag/rdbms/anbob/anbob1/trace/anbob1_pmon_16253104.trc:
ORA-00600: internal error code, arguments: [17147], [0x700000E82B30000], [], [], [], [], [], [], [], [], [], []
PMON (ospid: 16253104): terminating the instance due to error 472
Wed Sep 16 23:58:12 2015
opiodr aborting process unknown ospid (15336054) as a result of ORA-1092
Wed Sep 16 23:58:14 2015
ORA-1092 : opitsk aborting process
Wed Sep 16 23:58:18 2015
Instance terminated by PMON, pid = 16253104

# anbob1_pmon_16253104.trc

*** TRACE FILE RECREATED AFTER BEING REMOVED ***

Process 0x700000fd0ac9dc8 appears to be hung while dumping
Current time = 2507987599, process death time = 2507922725 interval = 60000
Attempting to kill process 0x700000fd0ac9dc8 with OS pid = 21759124
OSD kill succeeded for process 700000fd0ac9dc8

*** 2015-09-16 23:58:06.766
KGX cleanup...
KGX Atomic Operation Log 70000101e762520
 Mutex 700000ea6aeec70(1300, 0) idn 49899eb8 oper EXCL
 Cursor Parent uid 1300 efd 11 whr 17 slp 0
 oper=OPERATION_DEFAULT pt1=700000ea6aef098 pt2=700000ea6aeebf8 pt3=0
 pt4=0 u41=0 stt=0
Incident 190123 created, dump file: /u01/app/oracle/diag/rdbms/anbob/anbob1/incident/incdir_190123/anbob1_pmon_16253104_i190123.trc
ORA-00600: internal error code, arguments: [17147], [0x700000E82B30000], [], [], [], [], [], [], [], [], [], []

PMON: fatal error while deleting s.o. 700001011075e88 in this tree:
----------------------------------------
SO: 0x700000fd0ac9dc8, type: 2, owner: 0x0, flag: INIT/-/-/0x00 if: 0x3 c: 0x3
 proc=0x700000fd0ac9dc8, name=process, file=ksu.h LINE:12616 ID:, pg=0
(process) Oracle pid:228, ser:25, calls cur/top: 0x700000f0aca7218/0x700000f0ac85808
          flags : (0x1) DEAD
          flags2: (0x8000),  flags3: (0x10) 
          intr error: 0, call error: 0, sess error: 0, txn error 0
          intr queue: empty
  ksudlp FALSE at location: 0
  Cleanup details:
    Found dead = 5 sec ago
    Total Cleanup attempts = 1, Cleanup time = 0 sec, Cleanup timer = 0.000 sec
    Last attempt (full) started 5 sec ago, Length = in progress
  (post info) last post received: 2147483647 0 2
              last post received-location: ksl2.h LINE:2362 ID:kslpsr
              last process to post me: 700000fe0aa3210 1 6
              last post sent: 0 0 26
              last post sent-location: ksa2.h LINE:285 ID:ksasnd
              last process posted by me: 700000fb8ab0d80 1 6
  (latch info) wait_event=0 bits=0
  Process Group: DEFAULT, pseudo proc: 0x700000fc0b81778
  O/S info: user: grid, term: UNKNOWN, ospid: 21759124 (DEAD)
  OSD pid info: Unix process pid: 21759124, image: oracle@4adata1
  ----------------------------------------
  SO: 0x700001020e39a58, type: 4, owner: 0x700000fd0ac9dc8, flag: INIT/-/-/0x00 if: 0x3 c: 0x3
   proc=0x700000fd0ac9dc8, name=session, file=ksu.h LINE:12624 ID:, pg=0
  (session) sid: 1300 ser: 54478 trans: 0x0, creator: 0x700000fd0ac9dc8
            flags: (0x100045) USR/- flags_idl: (0x21) BSY/-/-/DEL/-/-
            flags2: (0x40008) -/-/-
            DID: , short-term DID: 
            txn branch: 0x0
            oct: 47, prv: 0, sql: 0x70000100eea0d10, psql: 0x70000100b3773b8, user: 118/HBYDRMS
  ksuxds FALSE at location: 0
  service name: SYS$USERS
  client details:
    O/S info: user: Administrator, term: unknown, ospid: 1234
    machine: WIN-1BBB4BL8USU program: JDBC Thin Client
  Current Wait Stack:
    Not in wait; last wait ended 167 min 17 sec ago 
  Wait State:
    fixed_waits=0 flags=0x21 boundary=0x0/-1
  Session Wait History:
      elapsed time of 167 min 17 sec since last wait
   0: waited for 'latch: shared pool'
      address=0x700000000109838, number=0x133, tries=0x0
      wait_id=2824 seq_num=2825 snap_id=1
      wait times: snap=0.000022 sec, exc=0.000022 sec, total=0.000022 sec
      wait times: max=infinite
      wait counts: calls=0 os=0
      occurred after 129 min 40 sec of elapsed time
   1: waited for 'latch: shared pool'
      address=0x700000000109838, number=0x133, tries=0x0
      wait_id=2823 seq_num=2824 snap_id=1
      wait times: snap=23.825311 sec, exc=23.825311 sec, total=23.825311 sec
      wait times: max=infinite
      wait counts: calls=0 os=0
      occurred after 0.015511 sec of elapsed time
	  ...
	  ...
	  ...
ORA-00600: internal error code, arguments: [17147], [0x700000E82B30000], [], [], [], [], [], [], [], [], [], []
kjzduptcctx: Notifying DIAG for crash event
----- Abridged Call Stack Trace -----
ksedsts()+360<-kjzdssdmp()+240<-kjzduptcctx()+228<-kjzdicrshnfy()+100<-ksuitm()+5124<-ksbrdp()+4508<-opirip()+1624<-opidrv()+608<-sou2o()+136<-opimai_real()+188<-ssthrdmain()+268<-main()+204<-__start()+112 
----- End of Abridged Call Stack Trace -----

*** 2015-09-16 23:58:12.650
PMON (ospid: 16253104): terminating the instance due to error 472
ksuitm: waiting up to [5] seconds before killing DIAG(16843208)

# anbob1_ora_21759124_i191931.trc

Dump file /u01/app/oracle/diag/rdbms/anbob/anbob1/incident/incdir_191931/anbob1_ora_21759124_i191931.trc
Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
With the Partitioning, Real Application Clusters, Automatic Storage Management, OLAP,
Data Mining and Real Application Testing options
ORACLE_HOME = /u01/app/oracle/product/11.2.0/dbhome_1
System name: AIX
Node name: 4adata1
Release: 1
Version: 6
Machine: 00F8D36E4C00
Instance name: anbob1
Redo thread mounted by this instance: 1
Oracle process number: 228
Unix process pid: 21759124, image: oracle@4adata1


Dump continued from file: /u01/app/oracle/diag/rdbms/anbob/anbob1/trace/anbob1_ora_21759124.trc
ORA-07445: 出现异常错误: 核心转储 [opiaba()+788] [SIGSEGV] [ADDR:0xF00000E161B7CD2] [PC:0x10662AFD4] [Address not mapped to object] []


*** 2015-09-16 23:44:23.229
dbkedDefDump(): Starting a non-incident diagnostic dump (flags=0x3, level=3, mask=0x0)
----- Current SQL Statement for this session (sql_id=by0ga2x4sm7ps) -----
begin execute immediate 'TRUNCATE TABLE sm_organization'; insert into sm_organization(groupcode,groupname,parentgroupcode) values(:1 ,:2 ,:3 );
.............
; insert into sm_organization(groupcode,groupname,parentgroupcode) values(:412987 ,:412988 ,:412989 ); insert into sm_organization(groupcode,groupname,parentgroupcode) values(:412990 ,:412991 ,:412992 ); insert into sm_organization(groupcode,groupname,parentgroupcode) values(:412993 ,:412994 ,:412995 ); insert into sm_organization(groupcode,groupname,parentgroupcode) values(:412996 ,:412997 ,:412998 ); insert into sm_organization(groupcode,groupname,parentgroupcode) values(:412999 ,:413000 ,:413001 ); insert into sm_organization(groupcode,groupname,parentgroupcode) values(:413002 ,:413003 ,:413004 ); insert into sm_organization(groupcode,groupname,parentgroupcode) values(:413005 ,:413006 ,:413007 ); insert into sm_organization(groupcode,groupname,parentgroupcode) values(:413008 ,:413009 ,:413010 ); insert into sm_organization(groupcode,groupname,parentgroupcode) values(:413011 ,:413012 ,:413013 ); insert into sm_organization(groupcode,groupname,parentgroupcode) values(:413014 ,:413015 ,:413016 ); insert into sm_organization(groupcode,groupname,parentgroupcode) values(:413017 ,:413018 ,:413019 ); insert into sm_organization(groupcode,groupname,parentgroupcode) values(:413020 ,:413021 ,:413022 ); insert into sm_organization(groupcode,groupname,parentgroupcode) values(:413023 ,:413024 ,:413025 ); insert into sm_organization(groupcode,groupname,parentgroupcode) values(:413026 ,:413027 ,:413028 ); insert into sm_organization(groupcode,groupname,parentgroupcode) values(:413029 ,:413030 ,:413031 ); insert into sm_organization(groupcode,groupname,parentgroupcode) values(:413032 ,:413033 ,:413034 ); insert into sm_organization(groupcode,groupname,parentgroupcode) values(:413035 ,:413036 ,:413037 ); insert into sm_organization(groupcode,groupname,parentgroupcode) values(:413038 ,:413039 ,:413040 ); insert into sm_organization(groupcode,groupname,parentgroupcode) values(:413041 ,:413042 ,:413043 ); insert into sm_organization(groupcode,groupname,parentgroupcode) values(:413044 ,:413045 ,:413046 ); insert into sm_organization(groupcode,groupname,parentgroupcode) values(:413047 ,:413048 ,:413049 ); insert into sm_organization(groupcode,groupname,parentgroupcode) values(:413050 ,:413051 ,:413052 ); insert into sm_organization(groupcode,groupname,parentgroupcode) values(:413053 ,:413054 ,:413055 ); insert into sm_organization(groupcode,groupname,parentgroupcode) values(:413056 ,:413057 ,:413058 ); insert into sm_organization(groupcode,groupname,parentgroupcode) values(:413059 ,:413060 ,:413061 ); insert into sm_organization(groupcode,groupname,parentgroupcode) values(:413062 ,:413063 ,:413064 ); insert into sm_organization(groupcode,groupname,parentgroupcode) values(:413065 ,:413066 ,:413067 ); insert into sm_organization(groupcode,groupname,parentgroupcode) values(:413068 ,:413069 ,:413070 ); insert into sm_organization(groupcode,groupname,parentgroupcode) values(:413071 ,:413072 ,:413073 ); insert into sm_organization(groupcode,groupname,parentgroupcode) values(:413074 ,:413075 ,:413076 ); insert into sm_organization(groupcode,groupname,parentgroupcode) values(:413077 ,:413078 ,:413079 ); insert into sm_organization(groupcode,groupname,parentgroupcode) values(:413080 ,:413081 ,:413082 ); end; 

----- Call Stack Trace -----
skdstdst <- ksedst1 <- ksedst <- dbkedDefDump <- ksedmp
<- ssexhd <- 48bc <- opiprs <- kksParseChildCursor <- rpiswu2
<- kksLoadChild <- kxsGetRuntimeLock <- kksfbc <- kkspsc0 <- kksParseCursor
<- opiosq0 <- kpooprx <- kpoal8 <- opiodr <- ttcpip
<- opitsk <- opiino <- opiodr <- opidrv <- sou2o
<- opimai_real <- ssthrdmain <- main <- start 

Note:
触发ora-7445的PL/SQL是一条拼接的匿名块清空表后再多次insert的调用,绑定变量个数超过了41万, 在MOS中找到了篇note非常符合, instance crashed after ORA-7445 [opiaba] and ORA-600 [17147] (文档 ID 1466343.1),原因是当同一个sql/ plsql文本中使用的绑定变量个数大于65535时触发的bug 12578873, 导致数据库instance crash。
符合note中记录的在call stack 中含部分或全部下列函数调用:
opiaba opiprs rpiswu2 kksLoadChild kxsGetRuntimeLock kksfbc

Solution:
解决方法是可以打上补丁12578873, 该补丁只是避免类似SQL时宕机, 同时还是要建议修改SQL 使用少于65535的绑定变量个数, 本案例建议使用loop 修改, 另外注意大量的绑定变量使用在SQL解析时需要更久的DB TIME。

打赏

对不起,这篇文章暂时关闭评论。