Troubleshooting Oracle 19c RAC ORA-29770 with LMD hang, LMHB terminating the instance

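A while ago, one node of an Oracle 19c RAC cluster restarted unexpectedly. The logs showed that the LMD process had hung and missed its heartbeat for more than 70 seconds, so the LMHB process terminated the instance. Operating system resources were mostly idle, and the LMHB trace confirmed that LMD was in the middle of a free-memory operation.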

DB alert log

2023-02-22T15:43:36.739754+08:00
Thread 2 advanced to log sequence 6653 (LGWR switch), current SCN: 19860977446380
Current log# 18 seq# 6653 mem# 0: +DATA/anbob/ONLINELOG/group_18.958.1121127477
2023-02-22T15:43:37.381412+08:00
ARC1 (PID:382736): Archived Log entry 15101 added for T-2.S-6652 ID 0xa5aa2106 LAD:1
2023-02-22T15:53:13.704691+08:00
LMD1 (ospid: 382285) has not called a wait for 81 secs.
2023-02-22T15:53:17.140819+08:00
Errors in file /u01/app/oracle/diag/rdbms/anbob/anbob2/trace/anbob2_lmhb_382315.trc (incident=205480) (PDBNAME=CDB$ROOT):
ORA-29770: global enqueue process LMD1 (OSID 382285) is hung for more than 70 seconds
Incident details in: /u01/app/oracle/diag/rdbms/anbob/anbob2/incident/incdir_205480/anbob2_lmhb_382315_i205480.trc
2023-02-22T15:53:21.087390+08:00
LOCK_DBGRP: GCR_SYSTEST debug event locked group GR+DB_anbob by memno 1
LMHB (ospid: 382315): terminating the instance due to ORA error 29770
Cause - 'ERROR: Some process(s) is not making progress.
LMHB (ospid: 382315) is terminating the instance.
Please check LMHB trace file for more details.
Please also check the CPU load, I/O load and other system properties for anomalous behavior
ERROR: Some process('
2023-02-22T15:53:22.273416+08:00
ORA-1092 : opitsk aborting process
2023-02-22T15:53:23.638753+08:00
License high water mark = 4184
2023-02-22T15:53:24.546319+08:00
Dumping diagnostic data in directory=[cdmp_20230222155321], requested by (instance=2, osid=382315 (LMHB)), summary=[abnormal instance termination].
2023-02-22T15:53:27.517019+08:00
Instance terminated by LMHB, pid = 382315
2023-02-22T15:53:28.731204+08:00
Warning: 2 processes are still attacheded to shmid 1998852:
(size: 81920 bytes, creator pid: 381966, last attach/detach pid: 382133)
2023-02-22T15:53:29.639831+08:00
USER(prelim) (ospid: 260076): terminating the instance
2023-02-22T15:53:29.643403+08:00
Instance terminated by USER(prelim), pid = 260076
2023-02-22T15:53:32.764711+08:00
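
When several instances or several days of alert logs have to be checked for the same symptom, the two key lines above (the "has not called a wait" warning and the ORA-29770 error) can be picked out with a small script. The following is only a minimal sketch: the regular expressions are derived from the exact wording of the lines in this alert log, and the function name `find_hang_events` is hypothetical, not part of any Oracle tool.

```python
import re

# Patterns modeled on the alert log lines shown above (an assumption:
# other versions may word these messages differently).
TS_RE = re.compile(r"^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d+[+-]\d{2}:\d{2}$")
NO_WAIT_RE = re.compile(
    r"^(\w+) \(ospid: (\d+)\) has not called a wait for (\d+) secs\."
)
HUNG_RE = re.compile(
    r"^ORA-29770: global enqueue process (\w+) \(OSID (\d+)\) "
    r"is hung for more than (\d+) seconds"
)


def find_hang_events(lines):
    """Return (timestamp, kind, process, ospid, seconds) for each hang line."""
    events = []
    current_ts = None
    for raw in lines:
        line = raw.strip()
        if TS_RE.match(line):
            # Remember the most recent timestamp line; it applies to
            # the message lines that follow it.
            current_ts = line
            continue
        for kind, pattern in (("no_wait", NO_WAIT_RE), ("ora_29770", HUNG_RE)):
            m = pattern.match(line)
            if m:
                events.append(
                    (current_ts, kind, m.group(1), int(m.group(2)), int(m.group(3)))
                )
                break
    return events


sample = """\
2023-02-22T15:53:13.704691+08:00
LMD1 (ospid: 382285) has not called a wait for 81 secs.
2023-02-22T15:53:17.140819+08:00
ORA-29770: global enqueue process LMD1 (OSID 382285) is hung for more than 70 seconds
"""

events = find_hang_events(sample.splitlines())
for event in events:
    print(event)
```

To scan a real file, pass an open file object instead of `sample.splitlines()`; the function only iterates over lines.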

OS top

zzz ***Wed Feb 22 15:52:21 CST 2023
top  up 124 days, 23:12,  0 users,  load average: 4.11, 4.27, 4.46
Tasks: 3206 total,   2 running, 3204 sleeping,   0 stopped,   0 zombie
%Cpu(s):  0.9 us,  0.5 sy,  0.0 ni, 98.6 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem : 79094534+total, 24925899+free, 42708208+used, 11460425+buff/cache
KiB Swap: 33554428 total, 33554428 free,        0 used. 34671769+avail Mem 

   PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
382285 oracle    20   0  0.254t  46128  34092 R 100.0  0.0 814:58.30 ora_lmd1_+
382315 oracle    20   0  0.246t  31544  25232 S  59.4  0.0 157:18.50 ora_lmhb_+
257900 grid      20   0  112532   5364    692 S  11.3  0.0   0:00.23 pidstat
257904 grid      20   0  112532   5364    692 S  11.3  0.0   0:00.23 pidstat
394798 oracle    20   0  0.254t  57216  47648 S   6.6  0.0   5:23.63 oracle_39+
257929 grid      20   0  160804   5428   1536 R   5.7  0.0   0:00.14 top

lmhb trace

voluntary_ctxt_switches:        1146306377
nonvoluntary_ctxt_switches:        680777
Short stack dump: 
voluntary_ctxt_switches: 1146306377
nonvoluntary_ctxt_switches: 680777
Short stack dump: 
ksedsts()+426<-ksdxfstk()+58<-ksdxcb()+872<-sspuser()+223<-__sighandler()<-kjr_freeable_chunk_free()+2925
<-kjrchc()+9283<-kjmdmain_helper()+6258<-kjmdm()+74<-ksbrdp()+1167<-opirip()+541
<-opidrv()+581<-sou2o()+165<-opimai_real()+173<-ssthrdmain()+417<-main()+256<-__libc_start_main()+245
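
In a short stack like this one, the frames above `__sighandler()` belong to the stack-dump machinery itself, and the first frame below it is where the process was interrupted. A minimal sketch of splitting the stack into frames and finding that frame, assuming the `name()+offset<-name()+offset` layout shown above (the helper names are hypothetical):

```python
import re

# One frame looks like "kjrchc()+9283"; the offset is optional
# (for example "__sighandler()" has none).
FRAME_RE = re.compile(r"([A-Za-z_][A-Za-z0-9_$]*)\(\)(?:\+(\d+))?")


def parse_short_stack(text):
    """Return a list of (function_name, offset_or_None), top frame first."""
    return [
        (m.group(1), int(m.group(2)) if m.group(2) else None)
        for m in FRAME_RE.finditer(text)
    ]


def interrupted_frame(frames):
    """Return the first frame below __sighandler, or None if not present."""
    names = [name for name, _ in frames]
    if "__sighandler" in names:
        idx = names.index("__sighandler")
        if idx + 1 < len(frames):
            return frames[idx + 1]
    return None


stack = """\
ksedsts()+426<-ksdxfstk()+58<-ksdxcb()+872<-sspuser()+223<-__sighandler()<-kjr_freeable_chunk_free()+2925
<-kjrchc()+9283<-kjmdmain_helper()+6258<-kjmdm()+74<-ksbrdp()+1167<-opirip()+541
<-opidrv()+581<-sou2o()+165<-opimai_real()+173<-ssthrdmain()+417<-main()+256<-__libc_start_main()+245
"""

frames = parse_short_stack(stack)
print(interrupted_frame(frames))
```

For the stack above this points at `kjr_freeable_chunk_free`, which is the function name to search for when looking for a matching bug.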

The stack frames, decoded with Frits Hoogland's function-name lookup:

ksdxcb()+872 kernel service debug internal errors ksdx callback for sosd layer signal handler
sspuser()+223 operating system dependent system process management handle SIGUSR2 for Oracle
__sighandler() (?) [partial hit for: ]
kjr_freeable_chunk_free()+2925 Kernel lock management Resource table [partial hit for: kjr ] free   memory
kjrchc()+9283 kernel lock management resource table [partial hit for: kjr ]
kjmdmain_helper()+6258 kernel lock management RAC multiple LMS [partial hit for: kjm ]
kjmdm()+74 kernel lock management RAC multiple LMS [partial hit for: kjm ]
ksbrdp()+1167 kernel service background processes run a detached background process

The error and the call stack match Bug 32076305: ORA-29770 LMD has no heartbeats - LMD Stack is in kjr_freeable_chunk_free

Solution
Apply the one-off patch, if one is available for your version
or
Upgrade to the 19.14 RU or later
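
To check a list of instances against the 19.14 threshold, the comparison has to be numeric, because a plain string comparison would rank "19.9" above "19.14". A minimal sketch, assuming the input is a dotted version string such as the one reported by the `VERSION_FULL` column of `V$INSTANCE`; the function name `at_or_above_fixed_ru` is hypothetical, and this check cannot tell whether a one-off patch has been applied on a lower RU.

```python
def parse_version(version):
    """Turn '19.13.0.0.0' into (19, 13, 0, 0, 0)."""
    return tuple(int(part) for part in version.strip().split("."))


def at_or_above_fixed_ru(version_full, fixed_in=(19, 14)):
    """Return True/False for 19c versions, None for other major releases.

    The 19.14 threshold comes from the solution stated above; other
    major releases are reported as unknown (None) rather than guessed.
    """
    parts = parse_version(version_full)
    if parts[0] != fixed_in[0]:
        return None
    return parts[:2] >= fixed_in


for v in ("19.9.0.0.0", "19.13.0.0.0", "19.14.0.0.0", "21.3.0.0.0"):
    print(v, at_or_above_fixed_ru(v))
```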
