
Troubleshooting an Oracle 19c RAC DB crash after ORA-600 [kjblpgorm:!antilock] and a startup failure with ORA-600 [kfmdPriRegRclient04]

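Recently one node of a customer's three-node Oracle 19c RAC crashed unexpectedly with ORA-600 [kjblpgorm:!antilock]. When the instance was started again it reported ORA-600 [kfmdPriRegRclient04], and during that startup the previously surviving nodes hung and restarted. Oracle base releases carry quite a few bugs. I was asked to analyze the problem and resolved it with a temporary workaround; this is a brief record of the issue.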

-- version 19.3

ORA-00600: internal error code, arguments: [kjblpgorm:!antilock]

db alert log

2025-06-06T13:05:24.742963+08:00
Errors in file /oracle/app/oracle/diag/rdbms/anbob/anbob3/trace/anbob3_lms3_3782790_3782800.trc  (incident=656189):
ORA-00600: internal error code, arguments: [kjblpgorm:!antilock], [680864], [115], [0], [11], [1269857], [5], [], [], [], [], []
Incident details in: /oracle/app/oracle/diag/rdbms/anbob/anbob3/incident/incdir_656189/anbob3_lms3_3782790_3782800_i656189.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
2025-06-06T13:05:26.709825+08:00
Errors in file /oracle/app/oracle/diag/rdbms/anbob/anbob3/trace/anbob3_lms3_3782790_3782800.trc:
ORA-00600: internal error code, arguments: [kjblpgorm:!antilock], [680864], [115], [0], [11], [1269857], [5], [], [], [], [], []
2025-06-06T13:05:26.848797+08:00
Dumping diagnostic data in directory=[cdmp_20250606130526], requested by (instance=3, osid=3782800 (LMS3)), summary=[incident=656189].
2025-06-06T13:05:28.612222+08:00
opidrv aborting process LMS3 ospid (3782790_3782800) as a result of ORA-600
2025-06-06T13:05:28.653226+08:00
PMON (ospid: ): terminating the instance due to ORA error 

Note:
The LMS background process hit the ORA-600 error, and the instance crashed.

Format: ORA-600 [kjblpgorm:!antilock] [a] [b] [c] [d] [e]
a = id1
b = id2
c = pkey pdb
d = tablespace number
e = object #
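
As a rough illustration of the format above, the following sketch pulls the bracketed arguments out of an alert log line and labels the first five positions. The function name and the dictionary keys are made up for this example (they are not an Oracle API), and the sixth argument seen in this incident is kept unnamed because the format note does not describe it.

```python
import re

# Names follow the "Format:" note above; they are labels chosen for this sketch.
ARG_NAMES = ["id1", "id2", "pkey_pdb", "tablespace_number", "object_number"]


def parse_antilock_args(line):
    """Decode the arguments of an ORA-00600 [kjblpgorm:!antilock] line.

    Returns None when the line is not this particular internal error.
    """
    if "[kjblpgorm:!antilock]" not in line:
        return None
    # Capture every bracketed argument, including the empty trailing ones.
    args = re.findall(r"\[([^\]]*)\]", line)
    # args[0] is the error tag itself; the positional arguments follow it.
    values = args[1:]
    decoded = {}
    for name, value in zip(ARG_NAMES, values):
        decoded[name] = int(value) if value else None
    # Positions beyond the five documented ones are kept without a name.
    decoded["extra"] = [int(v) for v in values[len(ARG_NAMES):] if v]
    return decoded


line = ("ORA-00600: internal error code, arguments: [kjblpgorm:!antilock], "
        "[680864], [115], [0], [11], [1269857], [5], [], [], [], [], []")
print(parse_antilock_args(line))
```

For the line from this incident, the decoded values are id1 680864, id2 115, pkey 0, tablespace number 11 and object number 1269857.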

kjblpgorm:!antilock, read as (kjbl)pgorm antilock – kernel lock management, global cache service lock table (a tentative reading of the function prefix)

| NB | Prob | Bug | Fixed | Description |
|  | II | 29531836 | 19.6, 20.1 | RAC INSTANCE PRODUCES ORA-00600 [kjblpgorm:!antilock] |
|  | III | 36354638 | 19.25, 23.4 | LMS Hit ORA-00600: internal error code, arguments: [kjblpgorm:!antilock] |
|  | IIII | 35843249 | 19.22, 23.4 | [RAC] LMS Hit ORA-600[kjblpgorm:!antilock] |
|  | III | 35151872 | 19.22 | HA Mode: Hit ORA-600 [kjblpgorm:!antilock] |
|  | III | 32783456 | 19.17 | ORA-600[kjblpgorm:!antilock] Instance Crash |
|  | III | 29646315 | 19.20, 20.1 | ASM, DB LMS HIT ORA-600[KJBLPGORM:!ANTILOCK] |
|  | IIII | 29464779 | 12.1.0.2.200714, 12.2.0.1.DBRU:200114, 18.18, 18.8, 19.4, 20.1 | LMS: ORA-600 [kjblpgorm:!antilock] crashing the instance, ORA-600 [3020], ORA-752 during media recover |
|  | III | 29372069 | 12.2.0.1.DBRU:200114, 18.11, 18.18, 19.8, 20.1 | Instance Crash With ORA-600[kjblpgorm:!antilock] |
|  | III | 29038730 | 19.12, 20.1 | Hitting the ORA-600[kjblpgorm:!antilock] followed by instance crash |
|  | IIII | 35045932 | 19.21 | [RAC] Instance crash after ORA-600 [kjblpgorm:!antilock] |
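
To see which of the listed bugs a given 19c release is still exposed to, the 19.x entries of the "Fixed" column can be compared against the installed RU. The sketch below is only a convenience under simple assumptions: it uses the 19.x fix version from each row, ignores one-off patches that may already be installed, and the function names are invented for this example.

```python
# Bug number -> the 19.x release listed in the "Fixed" column above.
FIXED_IN_19C = {
    29531836: "19.6",
    36354638: "19.25",
    35843249: "19.22",
    35151872: "19.22",
    32783456: "19.17",
    29646315: "19.20",
    29464779: "19.4",
    29372069: "19.8",
    29038730: "19.12",
    35045932: "19.21",
}


def ru_tuple(version):
    """Turn '19.22' into (19, 22) so versions compare numerically, not as strings."""
    return tuple(int(part) for part in version.split("."))


def unfixed_bugs(current_ru):
    """Return the bug numbers whose listed 19c fix release is later than current_ru."""
    current = ru_tuple(current_ru)
    return sorted(
        bug for bug, fixed in FIXED_IN_19C.items() if ru_tuple(fixed) > current
    )


print(unfixed_bugs("19.3"))   # the base release used in this case
print(unfixed_bugs("19.22"))
```

Comparing tuples rather than strings matters here: as plain strings, "19.6" would sort after "19.22". On the 19.3 base release from this case, none of the listed fixes are included.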

Quite a few bugs are related to this error, for example:
Bug 35045932 – Instance crash after ORA-600 [kjblpgorm:!antilock]
Bug 29464779 – LNX64-20.1-ASM,DB LMS HIT ORA-600[KJBLPGORM:!ANTILOCK] THEN CRASH
Bug 35843249  [RAC] LMS Hit ORA-600[kjblpgorm:!antilock]
All of them are caused by read-mostly, a DRM feature introduced in 11g.

When read-mostly is enabled (the default), the pkey check is missing in the case where the anti-lock is not an LE. This can cause the wrong anti-lock to be closed after an object is reused.

What is read-mostly locking?

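DRM (Dynamic Resource Remastering) gained affinity locks and object-level DRM in 10gR2, and 11g added read-mostly and reader bypass. The read-mostly locking mechanism is driven by each object's global operation history and is meant to reduce the messaging and CPU cost of read access. Oracle's cache layer records the number of S locks and X locks on each object. If a node opens a large number of S locks and very few X locks, and few blocks are transferred, the object is considered read-mostly on that node. Once read-mostly takes effect, sharing of the object stops and its blocks are no longer transferred over the interconnect (unless a block is modified).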

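When an object is defined as read-mostly, the master node grants it an S affinity lock on all nodes. Every node is therefore granted read access to the object's blocks "in advance", which reduces the volume of S lock messages passed between nodes.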

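Oracle uses a special lock called an anti-lock to control X locks on read-mostly objects. When an X lock is requested, all nodes are notified by broadcast to open an anti-lock, and every access to that block (S lock or X lock) falls back to standard cache fusion locking, even though the object itself is still read-mostly. The broadcast completes before the X lock is granted, and it is sent only when no anti-lock is already open on the block. The anti-lock is cleared when read-mostly ends or when the dirty block is written to disk, and the X lock is downgraded.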

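Opening an X lock on a read-mostly object is a very expensive operation: before granting the X lock, the master node has to broadcast the anti-lock to all nodes. The anti-lock cannot be removed until the X lock is closed. In addition, a node creates anti-locks when it joins the cluster. An anti-lock is simply an LE flagged with KCLL_F_ANTI, and a read-mostly lock cannot be granted while an anti-lock exists.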

More: 【深入解析】DRM和read-mostly locking (In-depth analysis: DRM and read-mostly locking)

Solution

Apart from upgrading, the options are to disable GCS read-mostly locking or simply to disable DRM altogether.

-- Dynamic workaround

 alter system set "_lm_drm_disable"=4;
 oradebug setmypid
 oradebug lkdebug -m reconfig disrm

-- Static workaround

 alter system set "_gc_read_mostly_locking"=false scope=spfile sid='*' ;
 alter system set "_gc_persistent_read_mostly"=false scope=spfile sid='*' ;

ORA-00600: internal error code, arguments: [kfmdPriRegRclient04]

db alert log

Errors in file /oracle/app/oracle/diag/rdbms/anbob/anbob3/trace/anbob3_fenc_758375.trc:
ORA-00600: internal error code, arguments: [kfmdPriRegRclient04], [], [], [], [], [], [], [], [], [], [], []
Errors in file /oracle/app/oracle/diag/rdbms/anbob/anbob3/trace/anbob3_fenc_758375.trc (incident=853023):
ORA-854 [] [] [] [] [] [] [] [] [] [] [] []
Incident details in: /oracle/app/oracle/diag/rdbms/anbob/anbob3/incident/incdir_853023/anbob3_fenc_758375_i853023.trc
2025-06-06T21:23:00.975935+08:00
Dumping diagnostic data in directory=[cdmp_20250606212300], requested by (instance=3, osid=758375 (FENC)), summary=[incident=853022].
2025-06-06T21:23:02.073135+08:00
USER (ospid: ): terminating the instance due to ORA error

fenc trace file

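The FENC background process comes into play when a DB instance crashes: once CSS detects the crash, it uses this process to fence off the DB's I/O requests to the ASM layer.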

                                                   000000000 ? 000000082 ?
kgerinv_internal()+  call     kgeadse()            7F39A6A829A0 ? 7F39A6960048
44                                                 000000258 ? 012F67394
                                                   7F3900000000 7FFC00000000
kgerinv()+40         call     kgerinv_internal()   7F39A6A829A0 ? 7F39A6960048 ?
                                                   000000258 ? 012F67394 ?
                                                   7F3900000000 ? 7FFC00000000 ?
kgeasnmierr()+146    call     kgerinv()            7F39A6A829A0 ? 7F39A6960048 ?
                                                   000000258 ? 012F67394 ?
                                                   7F3900000000 ? 7FFC00000000 ?
kfmdPriRegRclient()  call     kgeasnmierr()        7F39A6A829A0 ? 7F39A6960048 ?
+1421                                              000000258 ? 012F67394 ?
                                                   14019C5608 000000000
kfmdProcessRclient(  call     kfmdPriRegRclient()  7F39A6A829A0 ? 7F39A6960048 ?
)+276                                              000000258 ? 6264637A78
                                                   14019C5608 ? 000000000 ?
kfnbListenNodeRecon  call     kfmdProcessRclient(  7F39A6A829A0 ? 000000004
f()+1224                      )                    000000258 ? 6264637A78 ?
                                                   14019C5608 ? 000000000 ?

 

kfmdPriRegRclient04 (kfmd)PriRegRclient04 – kernel automatic storage management node monitor interface implementation layer for diskgroup registration

Related bugs

Bug 32656231  @ Slow Instance Startup and ORA-00600 [kfmdPriRegRclient04] on FENC Process
Bug 35469192  Instance Crashes With ORA-600 [kfmdpriregrclient04] During Reconfiguration

During instance start, it takes too long to complete FIXWRITE step and instance is
 killed and restarted when using Real Application Clusters (RAC)

 

  • Stack is likely to include kgeasnmierr
  • Stack is likely to include kfmdPriRegRclient
  • Stack is likely to include kfmdProcessRclient
  • Stack is likely to include kfnbListenNodeRecon
  • Stack is likely to include ksbrdp

This matches our case fairly well; the fix is to apply a newer RU.
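
The comparison against the "Stack is likely to include" list can be done mechanically. The sketch below assumes the call stack layout shown in the FENC trace above, where the first column holds the function name (sometimes wrapped onto the next line) and columns are separated by at least two spaces. The helper names are made up for this example, and the sample is a shortened copy of the trace in this post.

```python
import re

# Function names from the "Stack is likely to include" list above.
SIGNATURE = ["kgeasnmierr", "kfmdPriRegRclient", "kfmdProcessRclient",
             "kfnbListenNodeRecon", "ksbrdp"]


def first_column_text(trace):
    """Join the first-column fragments of a call stack dump into one string."""
    pieces = []
    for line in trace.splitlines():
        # Lines that only carry argument values start with whitespace; skip them.
        if not line or line[0].isspace():
            continue
        # The first column ends at the first run of two or more spaces.
        pieces.append(re.split(r"\s{2,}", line, maxsplit=1)[0])
    # Concatenating without a separator rejoins names wrapped across lines.
    return "".join(pieces)


def match_signature(trace):
    """Report, per signature function, whether it appears in the stack."""
    text = first_column_text(trace)
    return {name: name in text for name in SIGNATURE}


trace = """\
kgeasnmierr()+146    call     kgerinv()            7F39A6A829A0 ? 7F39A6960048 ?
                                                   000000258 ? 012F67394 ?
kfmdPriRegRclient()  call     kgeasnmierr()        7F39A6A829A0 ? 7F39A6960048 ?
+1421                                              000000258 ? 012F67394 ?
kfmdProcessRclient(  call     kfmdPriRegRclient()  7F39A6A829A0 ? 7F39A6960048 ?
)+276                                              000000258 ? 6264637A78
kfnbListenNodeRecon  call     kfmdProcessRclient(  7F39A6A829A0 ? 000000004
f()+1224                      )                    000000258 ? 6264637A78 ?
"""

for name, found in match_signature(trace).items():
    print(name, found)
```

On this excerpt the first four functions are found. ksbrdp is not, simply because the portion of the trace quoted in this post stops before that frame.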

 

-- OVER --
