Troubleshooting: DB connection failed with ORA-12541 due to adump trace files driving inode usage to 100%

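Right after finishing an ORA-12154 case, along came ORA-12541 "No listener", again a connect-time problem. The front end reported ORA-12541 when connecting to the database through the public IP. A check showed that the listener had no public IP endpoint, although the listener process itself was healthy and the services and the VIP endpoint were still there. This post is a brief record of the problem.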

Environment: 11.2.0.4 RAC on Linux; public IP ends in .100, VIP ends in .101

Analysis approach
1. Check the listener
2. Check the IP and NIC status
3. Check the CRS status

Check the listener

$ lsnrctl status

LSNRCTL for Linux: Version 11.2.0.4.0 - Production on 05-MAY-2022 14:26:06

Copyright (c) 1991, 2013, Oracle.  All rights reserved.

Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=IPC)(KEY=LISTENER)))
STATUS of the LISTENER
------------------------
Alias                     LISTENER
Version                   TNSLSNR for Linux: Version 11.2.0.4.0 - Production
Start Date                08-SEP-2018 05:38:08
Uptime                    341 days 3 hr. 52 min. 12 sec  
Trace Level               off
Security                  ON: Local OS Authentication
SNMP                      OFF
Listener Parameter File   /oracle/app/11.2.0.4/grid/network/admin/listener.ora
Listener Log File         /oracle/app/grid/diag/tnslsnr/anbob2/listener/alert/log.xml
Listening Endpoints Summary...
  (DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(KEY=LISTENER)))
  (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=10.*.*.101)(PORT=1521)))
Services Summary...
Service "+ASM" has 1 instance(s).
  Instance "+ASM2", status READY, has 1 handler(s) for this service...
Service "anbob" has 1 instance(s).
  Instance "anbob2", status READY, has 1 handler(s) for this service...
The command completed successfully
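
The check above can be scripted. The following is a minimal sketch, not part of the original diagnosis: it reads `lsnrctl status` output on stdin and reports whether a TCP endpoint exists for a given address. The function name `listener_has_endpoint` and the 192.0.2.x addresses are hypothetical stand-ins for the masked IPs.

```shell
# Exit 0 if the `lsnrctl status` output on stdin contains a TCP listening
# endpoint for the given host/IP, exit 1 otherwise.
# The host is compared as a literal string, so dots are not wildcards.
listener_has_endpoint() {
    awk -v h="(HOST=$1)" '
        tolower($0) ~ /protocol=tcp/ && index($0, h) { found = 1 }
        END { exit !found }
    '
}
```

Typical use would be `lsnrctl status | listener_has_endpoint 192.0.2.100 || echo "public IP endpoint missing"`. In the output above, the same test would succeed for the VIP (.101) and fail for the public IP (.100).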

Check the NIC

$ ip addr
...
21: bond1: <BROADCAST,MULTICAST,MASTER,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP 
    link/ether 34:6a:c2:b9:4e:07 brd ff:ff:ff:ff:ff:ff
    inet 10.*.*.100/26 brd 10.*.*.127 scope global bond1 <<<<<
    inet 10.*.*.101/26 brd 10.*.*.127 scope global secondary bond1:1
    inet 10.*.*.120/26 brd 10.*.*.127 scope global secondary bond1:3
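
The point of this step is that the public IP is still plumbed on the NIC even though the listener no longer has an endpoint for it. A small sketch of that check, assuming `ip addr` output is piped in; `ip_is_configured` is a hypothetical helper name and the address is a placeholder.

```shell
# Exit 0 if the given IPv4 address appears as an "inet" address in the
# `ip addr` output on stdin (exact match, prefix length ignored).
ip_is_configured() {
    awk -v a="$1" '
        {
            for (i = 1; i < NF; i++)
                if ($i == "inet") {
                    split($(i + 1), p, "/")   # strip the /prefix part
                    if (p[1] == a) found = 1
                }
        }
        END { exit !found }
    '
}
```

For example, `ip addr show bond1 | ip_is_configured 192.0.2.100 && echo "IP is up on the NIC"`. An address that is present here but absent from the listener endpoints points away from the network and toward the clusterware stack.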

Check CRS

grid@anbob2:/home/grid>crsctl stat res -t
CRS-4535: Cannot communicate with Cluster Ready Services
CRS-4000: Command Status failed, or completed with errors.


-- Check the lower-stack (init) resources: crsd is OFFLINE
grid@anbob2:/home/grid>crsctl stat res -t -init
--------------------------------------------------------------------------------
NAME           TARGET  STATE        SERVER                   STATE_DETAILS       
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.asm
      1        ONLINE  ONLINE       anbob2                 Started             
ora.cluster_interconnect.haip
      1        ONLINE  ONLINE       anbob2                                     
ora.crf
      1        ONLINE  ONLINE       anbob2                                     
ora.crsd
      1        ONLINE  OFFLINE                    <<<<<                               
ora.cssd
      1        ONLINE  ONLINE       anbob2                                     
ora.cssdmonitor
      1        ONLINE  ONLINE       anbob2                                     
ora.ctssd
      1        ONLINE  ONLINE       anbob2                 OBSERVER            
ora.diskmon
      1        OFFLINE OFFLINE                                                   
ora.evmd
      1        ONLINE  ONLINE       anbob2                                     
ora.gipcd
      1        ONLINE  ONLINE       anbob2                                     
ora.gpnpd
      1        ONLINE  ONLINE       anbob2                                     
ora.mdnsd
      1        ONLINE  ONLINE       anbob2         

Note:
The CRSD resource is OFFLINE.
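
Spotting the one bad row in the table is easy to miss by eye. A minimal sketch that filters `crsctl stat res -t -init` output for resources whose TARGET is ONLINE but whose STATE is OFFLINE; the function name `offline_init_resources` is an assumption, and the parsing relies on the tabular layout shown above.

```shell
# Print the names of resources with TARGET=ONLINE and STATE=OFFLINE,
# reading `crsctl stat res -t -init` output from stdin.
offline_init_resources() {
    awk '
        /^ora\./ { name = $1; next }    # a resource name line
        # an instance line: <n> <TARGET> <STATE> ...
        $1 ~ /^[0-9]+$/ && $2 == "ONLINE" && $3 == "OFFLINE" { print name }
    '
}
```

Used as `crsctl stat res -t -init | offline_init_resources`. A resource such as ora.diskmon, whose target is itself OFFLINE, is deliberately not reported.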

Check the CRSD log

2022-05-02 13:23:15.828: 
[ohasd(32935)]CRS-10000:CLSU-00100: Operating System function: mkdir failed with error data: 28
CLSU-00101: Operating System error message: No space left on device
CLSU-00103: error location: authprep6
CLSU-00104: additional error information: failed to make dir /oracle/app/11.2.0.4/grid/auth/ohasd/anbob2/A8421321

2022-05-02 13:23:29.018: 
[crsd(42907)]CRS-1013:The OCR location in an ASM disk group is inaccessible. Details in /oracle/app/11.2.0.4/grid/log/anbob2/crsd/crsd.log.
2022-05-02 13:23:29.047: 
[crsd(42907)]CRS-0804:Cluster Ready Service aborted due to Oracle Cluster Registry error [PROC-26: Error while accessing the physical storage
ORA-09925: Unable to create audit trail file
Linux-x86_64 Error: 28: No space left on device
Additional information: 9925
]. Details at (:CRSD00111:) in /oracle/app/11.2.0.4/grid/log/anbob2/crsd/crsd.log.
2022-05-02 13:23:29.157: 
...
2022-05-02 13:25:20.344: 
[ohasd(32935)]CRS-2765:Resource 'ora.crsd' has failed on server 'anbob2'.
2022-05-02 13:25:20.344: 
[ohasd(32935)]CRS-2771:Maximum restart attempts reached for resource 'ora.crsd'; will not restart.

Note:
CRSD crashed because the OS ran out of disk resources.
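
Error 28 (ENOSPC, "No space left on device") is returned both when a file system runs out of blocks and when it runs out of inodes, so the message alone does not say which. A quick sketch for counting such lines in a log; `count_enospc` is a hypothetical helper, and the log path in the usage note is an assumption based on the 11.2 layout seen in the messages above.

```shell
# Count lines on stdin that report OS error 28 / ENOSPC.
# Prints the count; grep itself exits non-zero when the count is 0.
count_enospc() {
    grep -c -E 'No space left on device|Error: 28|error data: 28'
}
```

For example, `count_enospc < /oracle/app/11.2.0.4/grid/log/anbob2/alertanbob2.log`. A non-zero count is the cue to look at both `df` and `df -i`.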

Check the file system

$ df
$ df -i
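
To flag file systems that are close to inode exhaustion without reading the whole table, something like the following could be used. It is a sketch that assumes the six-column layout of GNU `df -i` (IUse% in column 5, mount point in column 6) and mount points without spaces; `inodes_over` is a hypothetical name.

```shell
# Print "<mount point> <IUse%>" for every file system in the `df -i`
# output on stdin whose inode usage is at or above the given percentage
# (default 90). Rows without a numeric IUse% (shown as "-") are skipped.
inodes_over() {
    awk -v limit="${1:-90}" '
        NR > 1 {
            pct = $5
            sub(/%/, "", pct)
            if (pct ~ /^[0-9]+$/ && pct + 0 >= limit) print $6, $5
        }
    '
}
```

Used as `df -iP | inodes_over 95`, where `-P` keeps each file system on a single line.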

Note:
The file system holding the ORACLE_BASE directory shows 100% inode usage. A short introduction to inodes follows.

Linux uses inodes (index nodes) to keep track of every file on a file system, whether images, videos, emails, spam, website content, or backups. Each file system has a limit on the number of inodes it can hold; on ext3/ext4, for example, this is fixed when the file system is created, based on its size and the mkfs options.

Find the directory with the most inode usage

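# Count entries (roughly, inodes) under each top-level directory;
# -xdev keeps find from crossing into other mounted file systems.
for i in /*; do echo "$i"; find "$i" -xdev 2>/dev/null | wc -l; done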
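
The one-liner prints names and counts on separate lines, which is awkward to sort. A slightly more structured sketch that can be pointed at any directory and repeated level by level until the offending directory is found; `inode_usage_by_dir` is a hypothetical name.

```shell
# For each immediate subdirectory of the given directory (default: .),
# print "<entry count><TAB><directory>", largest first.
inode_usage_by_dir() {
    top=${1:-.}
    top=${top%/}                      # "/" becomes "", so "$top"/*/ is /*/
    for d in "$top"/*/; do
        [ -d "$d" ] || continue       # no subdirectories: glob stays literal
        n=$(find "$d" -xdev 2>/dev/null | wc -l)
        printf '%d\t%s\n' "$n" "${d%/}"
    done | sort -rn
}
```

For example, `inode_usage_by_dir "$ORACLE_BASE"` and then again on the top entry. Given the ORA-09925 audit-trail error in the log and the adump directory named in the title, the audit destination (commonly `$ORACLE_BASE/admin/<db>/adump`, an assumption about this system) is a likely place to end up; a command such as `find <adump dir> -name '*.aud' -mtime +30 | wc -l` would show how many old audit files are candidates for cleanup.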

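Why does crsd.bin affect the public IP?
The problem can be reproduced in a test environment: after crsd.bin is killed, the public IP endpoint disappears from the listener immediately, and once crsd.bin restarts automatically, the public IP is registered with the listener again. All of this is done by oraagent and needs no manual action; when the crsd and oraagent processes restart automatically, the missing endpoint should be registered within about 1 minute. The impact is not limited to the local listener; the SCAN listeners are affected as well. The listener's dynamic endpoints are registered by the oraagent process, and when crsd.bin dies, oraagent dies with it. Because the VIP endpoint is in use by the DB instance, it is kept, while the public IP endpoint disappears.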

Listener dynamic endpoints are registered by the oraagent process. When crsd.bin dies, oraagent process will also die. Then the listener dynamic endpoints will not be available until oraagent process is restarted and registers those dynamic endpoints again. This is expected behavior.

However, the listener will not drop a dynamic endpoint that is in use by an instance, even after the oraagent process has terminated. This can be seen in the lsnrctl status output above: the host VIP endpoint remains after crsd.bin crashes.
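
In this incident ohasd had already given up restarting ora.crsd (CRS-2771), so the endpoints would not come back on their own even after inodes were freed. A hedged sketch of the manual step, using `crsctl stat res ora.crsd -init` and `crsctl start res ora.crsd -init`; the wrapper name and the `CRSCTL` override variable are assumptions, and the command would normally be run as a privileged user from the Grid home.

```shell
# Path or name of the crsctl binary; override for non-default locations.
CRSCTL=${CRSCTL:-crsctl}

# Start ora.crsd through ohasd unless it is already ONLINE.
restart_crsd_if_offline() {
    if "$CRSCTL" stat res ora.crsd -init | grep -q '^STATE=ONLINE'; then
        echo "ora.crsd already ONLINE"
    else
        "$CRSCTL" start res ora.crsd -init
    fi
}
```

Once crsd and its oraagent are back, `lsnrctl status` should show the public IP endpoint again within about a minute, as described above.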

Read more…

1. Starting with 11.2, the Grid agent dynamically registers endpoints (VIP and public IP) with the listener. The agent gets the public IP from /etc/hosts (if there is no DNS). If the agent fails to get the public IP, for example because of incorrect permissions on /etc/nsswitch.conf, the listener endpoint is not created.

2. The IP address was changed on the server, and lsnrctl status shows that the service name cannot be registered with the listener.

When the init parameter local_listener refers to a tnsnames.ora alias, the alias is read from tnsnames.ora only when the database starts, and the result is stored in v$listener_network (X$KMMNV).

Changes in tnsnames.ora are not reflected in v$listener_network automatically, so PMON/LREG keeps using the old value for dynamic service registration. To pick up the changes in tnsnames.ora, set the init parameter local_listener again to the same alias.
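
As an illustration of re-setting the parameter, the sketch below only prints the SQL so it can be reviewed before being piped to SQL*Plus. The helper name `reregister_sql` and the alias `LISTENER_DEMO` are hypothetical; the alias must be the one already used by local_listener in the environment concerned.

```shell
# Emit SQL that re-sets local_listener to the given tnsnames.ora alias
# (forcing the alias to be resolved again) and then asks the instance to
# register with the listener immediately.
reregister_sql() {
    cat <<EOF
ALTER SYSTEM SET local_listener='$1';
ALTER SYSTEM REGISTER;
EOF
}
```

For example, `reregister_sql LISTENER_DEMO | sqlplus -s / as sysdba`, followed by a look at v$listener_network to confirm the new address.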

Running "lsnrctl reload" against a listener affects only dynamic database services, instances, service handlers, and listening endpoints.
Static ones, such as those in the ADDRESS section of the listener.ora file, are not changed.
