首页 » ORACLE » Troubleshooting ora-12519 or ora-12516 ,listener service ‘blocked’, and wait event ‘latch: ges resource hash

Troubleshooting ora-12519 or ora-12516 ,listener service ‘blocked’, and wait event ‘latch: ges resource hash

[oracle@kdexa1db01 (orarpt1)~]$ tnsping anbob2
TNS Ping Utility for Linux: Version 11.2.0.3.0 - Production on 05-JUN-2015 10:39:14
Copyright (c) 1997, 2011, Oracle.  All rights reserved.
Used parameter files:

Used TNSNAMES adapter to resolve the alias
Attempting to contact (DESCRIPTION = (ADDRESS = (PROTOCOL = TCP)(HOST = 133.96.60.16)(PORT = 1521)) (CONNECT_DATA = (SID = anbobc2)))
OK (10 msec)
[oracle@kdexa1db01 (orarpt1)~]$ sqlplus weejar@anbob2

SQL*Plus: Release 11.2.0.3.0 Production on Fri Jun 5 10:40:36 2015

Copyright (c) 1982, 2011, Oracle.  All rights reserved.

Enter password: 
ERROR:
ORA-12519: TNS:no appropriate service handler found

TIP:
通常当出现ora-12519 or 12516时都是因为数据库进程数超过了数据库参数processes 或 sessions 时, 并且通常在db alert 中出现ora-20 or ora-18 的错误信息,如果当时查看监听服务状态使用lsnrctl service 会发现service 当时是”blocked”状态, listener 进程是有pmon 进程在不断的获得DB 的最新负载及连接数和db 的processes parameter, pmon 的更新前不是每秒,通常在5s-10分钟之间,负载变化大时会自己加快更新频率。 listener自己知道会话增加的数量,但是当会话断开后listener不会立即知道, 也就是当listener 上的service 在”blocked”状态后,知道下次pmon 进程通知现在当前连接小于processes parameter 值时(当然sessions 也是限制,包括SQL解析前系统的递归会话),新的连接请求才能创建成功。 今天遇到了一个案例,这里分享一下我处理思路。 希望对你有用。

先观察问题当时的数据库资源使用情况

     
SQL> select * from v$resource_limit;

RESOURCE_NAME                  CURRENT_UTILIZATION MAX_UTILIZATION INITIAL_AL LIMIT_VALU
------------------------------ ------------------- --------------- ---------- ----------
processes                                     1057            1247       4000       4000  <<<
sessions                                      1093            1249       4405       4405  <<<
enqueue_locks                                21693           53190      53190      53190
enqueue_resources                              780            1039      20424  UNLIMITED
ges_procs                                     1056            1245       4001       4001
ges_ress                                         0               0      82503  UNLIMITED
ges_locks                                        0               0     128193  UNLIMITED
ges_cache_ress                                7417           95371          0  UNLIMITED
ges_reg_msgs                                  1223            2996       8750  UNLIMITED
ges_big_msgs                                   245             442       1934  UNLIMITED
ges_rsv_msgs                                     0               0       1000       1000
gcs_resources                              1400382         2253096    2421479    2421479
gcs_shadows                                 726267         1520777    2421479    2421479
dml_locks                                      126            2469      19380  UNLIMITED
temporary_table_locks                            8             352  UNLIMITED  UNLIMITED
transactions                                    16            2472       4845  UNLIMITED
branches                                        37             136       4845  UNLIMITED
cmtcallbk                                        0               3       4845  UNLIMITED
sort_segment_locks                             106             125  UNLIMITED  UNLIMITED
max_rollback_segments                          170             185       4845      65535
max_shared_servers                               0               0  UNLIMITED  UNLIMITED
parallel_max_servers                            26              42         40       3600

TIP:
当时的数据库进程和会话数实际并未达到processes parameter 的50%,使用count v$session也可以确认。那我们可以在DB本机绕过LISTENER,尝试创建一个新的连接是否成功?

[oracle:/home/oracle]# sqlplus weejar/weejar_dba123

SQL*Plus: Release 10.2.0.5.0 - Production on Fri Jun 5 10:45:01 2015

Copyright (c) 1982, 2010, Oracle.  All Rights Reserved.

Connected to:
Oracle Database 10g Enterprise Edition Release 10.2.0.5.0 - 64bit Production
With the Partitioning, Real Application Clusters, OLAP, Data Mining
and Real Application Testing options

SQL> exit
Disconnected from Oracle Database 10g Enterprise Edition Release 10.2.0.5.0 - 64bit Production
With the Partitioning, Real Application Clusters, OLAP, Data Mining
and Real Application Testing options

[oracle:/home/oracle]# sqlplus weejar/weejar_dba123@133.96.60.16:1521/anbobc.com

SQL*Plus: Release 10.2.0.5.0 - Production on Fri Jun 5 10:45:20 2015

Copyright (c) 1982, 2010, Oracle.  All Rights Reserved.

ERROR:
ORA-12516: TNS:listener could not find available handler with matching protocol
stack

NOTE:
通过上面尝试可以确认,数据库实际并发达到最大连接数限制, 连接时ora-125xx错误应该是 listener上的限制,下面查看监听状态。

[oracle:/home/oracle]# ps -ef|grep lsnr
    grid  8519706        1   0   May 15      - 17:08 /oracle/app/11.2.0.3/grid/bin/tnslsnr LISTENER -inherit
  oracle 26411174 34931032   0 10:45:30  pts/2  0:00 grep lsnr
    grid 12058930        1   0   May 15      -  0:32 /oracle/app/11.2.0.3/grid/bin/tnslsnr LISTENER_SCAN1 -inherit
    
[oracle:/home/oracle]#  /oracle/app/11.2.0.3/grid/bin/lsnrctl status

LSNRCTL for IBM/AIX RISC System/6000: Version 11.2.0.3.0 - Production on 05-JUN-2015 10:45:45

Copyright (c) 1991, 2011, Oracle.  All rights reserved.

Connecting to (ADDRESS=(PROTOCOL=tcp)(HOST=)(PORT=1521))
STATUS of the LISTENER
------------------------
Alias                     LISTENER
Version                   TNSLSNR for IBM/AIX RISC System/6000: Version 11.2.0.3.0 - Production
Start Date                15-MAY-2015 02:26:10
Uptime                    21 days 8 hr. 19 min. 36 sec
Trace Level               off
Security                  ON: Local OS Authentication
SNMP                      ON
Listener Parameter File   /oracle/app/11.2.0.3/grid/network/admin/listener.ora
Listener Log File         /oracle/app/grid/diag/tnslsnr/anbob2/listener/alert/log.xml
Listening Endpoints Summary...
  (DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(KEY=LISTENER)))
  (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=133.96.60.16)(PORT=1521)))
  (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=133.96.60.116)(PORT=1521)))
Services Summary...
Service "anbobc.com" has 1 instance(s).
  Instance "anbobc2", status READY, has 1 handler(s) for this service...
Service "anbobc_XPT.com" has 1 instance(s).
  Instance "anbobc2", status READY, has 1 handler(s) for this service...
The command completed successfully

[oracle:/home/oracle]# /oracle/app/11.2.0.3/grid/bin/lsnrctl  service

LSNRCTL for IBM/AIX RISC System/6000: Version 11.2.0.3.0 - Production on 05-JUN-2015 10:45:55

Copyright (c) 1991, 2011, Oracle.  All rights reserved.

Connecting to (ADDRESS=(PROTOCOL=tcp)(HOST=)(PORT=1521))
Services Summary...
Service "anbobc.com" has 1 instance(s).
  Instance "anbobc2", status READY, has 1 handler(s) for this service...
    Handler(s):
      "DEDICATED" established:190741 refused:0 state:blocked    <<<<<<<<
         LOCAL SERVER
Service "anbobc_XPT.com" has 1 instance(s).
  Instance "anbobc2", status READY, has 1 handler(s) for this service...
    Handler(s):
      "DEDICATED" established:190741 refused:0 state:blocked
         LOCAL SERVER
The command completed successfully

TIP:
这个库是10.2.0.5 的DB, 使用的是11.2.0.3的GI, 监听在grid 的用户下。 当时的服务在监听上,只是service 的状态已经是”blocked”, 当然有可能是主机资源问题导致无法创建新连接,在早前的版本中还有可能是listener.log文件的过大,这两个原因都已排除。

+-topas_nmon--l=LongTerm-CPU-----Host=anbob2---------Refresh=2 secs---10:51.45---------------
| Memory ------------------------------------------------------------------------------------
|          Physical  PageSpace |        pages/sec  In     Out | FileSystemCache              
|% Used       58.3%      0.3%  | to Paging Space   0.0    0.0 | (numperm)  9.7%              
|% Free       41.7%     99.7%  | to File System    0.0    1.5 | Process   42.9%              
|GB Used      70.0GB     0.2GB | Page Scans        0.0        | System     5.7%              
|GB Free      50.0GB    63.8GB | Page Cycles       0.0        | Free      41.7%              
|Total(GB)   120.0GB    64.0GB | Page Steals       0.0        |           ------             
|                              | Page Faults    8280.1        | Total    100.0%              
|------------------------------------------------------------ | numclient  9.7%              
|Min/Maxperm     3574MB(  3%)  23827MB( 20%) <--% of RAM      | maxclient 10.0%              
|Min/Maxfree     960   1088       Total Virtual  184.0GB      | User      49.6%              
|Min/Maxpgahead    2      8    Accessed Virtual   57.9GB 31.4%| Pinned     8.6%              
| 20.0                                                       | lruable pages   304991

下面查看CRS资源状态。

anbob2:/home/grid> crsctl stat res -t
--------------------------------------------------------------------------------
NAME           TARGET  STATE        SERVER                   STATE_DETAILS       
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.LISTENER.lsnr
               ONLINE  ONLINE       anbob1                                       
               ONLINE  ONLINE       anbob2                                                                           
ora.asm
               OFFLINE OFFLINE      anbob1                                       
               OFFLINE OFFLINE      anbob2                                       
ora.gsd
               OFFLINE OFFLINE      anbob1                                       
               OFFLINE OFFLINE      anbob2                                       
ora.net1.network
               ONLINE  ONLINE       anbob1                                       
               ONLINE  ONLINE       anbob2                                       
ora.ons
               ONLINE  ONLINE       anbob1                                       
               ONLINE  ONLINE       anbob2                                       
ora.registry.acfs
               ONLINE  OFFLINE      anbob1                                       
               ONLINE  OFFLINE      anbob2                                       
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.LISTENER_SCAN1.lsnr
      1        ONLINE  ONLINE       anbob2                                       
ora.cvu
      1        ONLINE  ONLINE       anbob2                                       
ora.anbob1.vip
      1        ONLINE  ONLINE       anbob1                                       
ora.anbob2.vip
      1        ONLINE  ONLINE       anbob2                                       
ora.oc4j
      1        ONLINE  ONLINE       anbob2                                       
ora.scan1.vip
      1        ONLINE  ONLINE       anbob2                 

TIP:
现在时间紧迫,前先需要恢复连接, 此时的节点1还好是正常,新的连接被转移到了节点2, 现在可以猜测有可能是listener.ora 上的连接数信息不正确, 尝试重启listener

 
su - grid
srvctl stop listener -n anbob2
srvctl start listener -n anbob2

su - oracle
alter system register;

NOTE:
再次查看listener的状态,发现服务没有注册上,数据库登录尝试手动注册也未成功,包括使用tnsnames.ora + local_listener的方式尝试也未成功(default port:1521 not need),动态注册不成功,我们放弃这种方式,使用静态注册,注册11g的监听和以前版本的区别,这里我使用了netmgr图形化操作,操作细节未记录。静态注册成功后,再次重启监听,下面再尝试连接。

[oracle@anbob2:/home/oracle]#/oracle/app/11.2.0.3/grid/bin/lsnrctl status

LSNRCTL for IBM/AIX RISC System/6000: Version 11.2.0.3.0 - Production on 05-JUN-2015 17:21:10

Copyright (c) 1991, 2011, Oracle.  All rights reserved.

Connecting to (ADDRESS=(PROTOCOL=tcp)(HOST=)(PORT=1521))
STATUS of the LISTENER
------------------------
Alias                     LISTENER
Version                   TNSLSNR for IBM/AIX RISC System/6000: Version 11.2.0.3.0 - Production
Start Date                05-JUN-2015 12:07:11
Uptime                    0 days 5 hr. 13 min. 59 sec
Trace Level               off
Security                  ON: Local OS Authentication
SNMP                      ON
Listener Parameter File   /oracle/app/11.2.0.3/grid/network/admin/listener.ora
Listener Log File         /oracle/app/grid/diag/tnslsnr/anbob2/listener/alert/log.xml
Listening Endpoints Summary...
  (DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(KEY=LISTENER)))
  (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=133.96.60.16)(PORT=1521)))
  (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=133.96.60.116)(PORT=1521)))
Services Summary...
Service "anbobc.com" has 2 instance(s).
  Instance "anbobc2", status UNKNOWN, has 1 handler(s) for this service...

# sqlplus weejar/weejar_dba123@133.96.60.16:1521/anbobc.com

SQL*Plus: Release 10.2.0.5.0 - Production on Fri Jun 5 13:28:42 2015

Copyright (c) 1982, 2010, Oracle.  All Rights Reserved.

Connected to:
Oracle Database 10g Enterprise Edition Release 10.2.0.5.0 - 64bit Production
With the Partitioning, Real Application Clusters, OLAP, Data Mining
and Real Application Testing options

SQL> 

NOTE:
注意到这里新的连接已经成功。松了口气, 下面开始分析当时的原因,为什么监听连接数会不正确?为什么动态注册service不能注册到Listener上?这些从我任务从一开始我们就有说过是有pmon完成的。下面排查是不是pmon Background process 出了问题。

# DB alert no ora-18 and ora-20

Fri Jun 05 10:45:46 BEIST 2015
PMON failed to acquire latch, see PMON dump
Fri Jun 05 10:54:38 BEIST 2015

# pmon trace
*** 2015-06-05 10:45:46.237
PMON unable to acquire latch  700000ad1a9ee98 Child ges resource hash list level=1 child#=1254
        Location from where latch is held: kjrmas1: lookup master node:
        Context saved from call: 504403196568791072
        state=busy, wlstate=free
    waiters [orapid (seconds since: put on list, posted, alive check)]:
     1064 (0, 1433476592, 0)
     1051 (0, 1433476592, 0)
     1058 (0, 1433476592, 0)
     1048 (0, 1433476592, 0)
     1050 (0, 1433476592, 0)
     1055 (0, 1433476592, 0)
     1031 (0, 1433476592, 0)
     waiter count=7
    gotten 20376414 times wait, failed first 10108322 sleeps 15878606   <<<<<<<<<<<<<<<<<<
    gotten 11598 times nowait, failed: 59605
    recovery area:
Dump of memory from 0x0700000AD021B280 to 0x0700000AD021B280
  possible holder pid = 1066 ospid=46072128                 <<<<<<<<<<<<<<<<<<<<<
----------------------------------------
SO: 700000ad571b5e8, type: 2, owner: 0, flag: INIT/-/-/0x00
  (process) Oracle pid=1066, calls cur/top: 700000a7756c988/700000a333d74a8, flag: (0) -
            int error: 0, call error: 0, sess error: 0, txn error 0
  (post info) last post received: 0 0 55
              last post received-location: kjata: wake up enqueue owner
              last process to post me: 700000ad86f6730 1 6
              last post sent: 0 0 108
              last post sent-location: kjmpost: post lmd
              last process posted by me: 700000ad86f6730 1 6
  (latch info) wait_event=0 bits=2
    holding    (efd=16) 700000ad1a9ee98 Child ges resource hash list level=1 child#=1254
        Location from where latch is held: kjrmas1: lookup master node:
        Context saved from call: 504403196568791072
        state=busy, wlstate=free
        waiters [orapid (seconds since: put on list, posted, alive check)]:
         1064 (0, 1433476592, 0)
         1051 (0, 1433476592, 0)
         1058 (0, 1433476592, 0)
         1048 (0, 1433476592, 0)
         1050 (0, 1433476592, 0)
         1055 (0, 1433476592, 0)
         1031 (0, 1433476592, 0)
         waiter count=7
    Process Group: DEFAULT, pseudo proc: 700000ad47e8d68
    O/S info: user: oracle, term: UNKNOWN, ospid: 46072128
    OSD pid info: Unix process pid: 46072128, image: oracle@anbob2 (P015)
Short stack dump:
ksdxfstk+002c<-ksdxcb+0500<-sspuser+0074<-000048BC<-kjskchcv+00f0<-kjuocl+0750<-kjusuc+05d0<-ksipgetctx+04fc<-ksqcmi+1b0c<-ksqgtlctx+1214<-ksqgelctx+0358

TIP:
可以看到问题时段,果然pmon进程有些busy, 取不到latch ges resource hash list, blocker是spid=46072128,pid = 1066进程

[oracle@anbob2:/home/oracle]#ps -ef|grep 46072128
  oracle 19071202 45220332   0 15:07:58  pts/1  0:00 grep 46072128
  oracle 46072128        1  10   Jun 03      - 311:14 ora_p015_anbobc2
  
SQL> @uopid 1066

USERNAME                SID                 AUDSID OSUSER           MACHINE            PROGRAM              
       PID SPID         SQL_HASH_VALUE STATUS
----------------------- -------------- ----------- ---------------- ------------------ --------------------
 ---------- ------------ -------------- --------
REPORT                   '3609,36916'    262269818 report           kybb1              (P015)               
      1066 46072128         1746131492 ACTIVE

SQL> @usid 3609

USERNAME                SID                 AUDSID OSUSER           MACHINE            PROGRAM              SPID           
  OPID CPID         SQL_ID         HASH_VALUE   LASTCALL STATUS   SADDR            PADDR            TADDR            LOGON_TIME
----------------------- -------------- ----------- ---------------- ------------------ -------------------- -------------- 
------ ------------ ------------- ----------- ---------- -------- ---------------- ---------------- ---------------- -----------------
REPORT                   '3609,36916'    262269818 report           kybb1              (P015)               46072128      
   1066 46072128     8av2z2xn17qj4  1746131492     218808 ACTIVE   0700000AD4AA4250 0700000AD571B5E8 0700000A99E44CB8 20150603 02:24:40

SQL> @xi 8av2z2xn17qj4 %
eXplain the execution plan for sqlid 8av2z2xn17qj4 child %...

PLAN_TABLE_OUTPUT
---------------------------------------------------------------------------------------------
SQL_ID  8av2z2xn17qj4, child number 0
-------------------------------------
       create table xxxx_315 nologging as    select /*+use_hash(a b) parallel(a 8)
no_index(a)*/ 20150602 cycle,315 region,subsid usernum,acctid,nvl(productgroup,'PrdGrpGlobal')
productgroup,nvl(productid,'gl.g.nml') productid   xxxxx --省略

Plan hash value: 2866634249

------------------------------------------------------------------------------------------------------------
| Id  | Operation                | Name                | E-Rows | Pstart| Pstop |  OMem |  1Mem | Used-Mem |
------------------------------------------------------------------------------------------------------------
|   0 | CREATE TABLE STATEMENT   |                     |        |       |       |       |       |          |
|   1 |  PX COORDINATOR          |                     |        |       |       |       |       |          |
|   2 |   PX SEND QC (RANDOM)    | :TQ10002            |    154M|       |       |       |       |          |
|   3 |    LOAD AS SELECT        |                     |        |       |       |   256K|   256K|          |
|*  4 |     HASH JOIN OUTER      |                     |    154M|       |       |  2047M|   113M|          |
|   5 |      PX RECEIVE          |                     |    154M|       |       |       |       |          |
|   6 |       PX SEND HASH       | :TQ10000            |    154M|       |       |       |       |          |
|   7 |        PX BLOCK ITERATOR |                     |    154M|     6 |     6 |       |       |          |
|*  8 |         TABLE ACCESS FULL| xxxx_315            |    154M|     6 |     6 |       |       |          |
|   9 |      PX RECEIVE          |                     |      1 |       |       |       |       |          |
|  10 |       PX SEND HASH       | :TQ10001            |      1 |       |       |       |       |          |
|  11 |        PX BLOCK ITERATOR |                     |      1 |    34 |    34 |       |       |          |
|* 12 |         TABLE ACCESS FULL| SUBSCRIBER          |      1 |    34 |    34 |       |       |          |
------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   4 - access("XXXX"."REGION"="SUBSCRIBER"."REGION" AND
              "XXXX"."SUBSID"="SUBSCRIBER"."OID")
   8 - access(:Z>=:Z AND :Z<=:Z)        filter(("REGION"=315 AND "WRTOFFSN"=0))   12 - access(:Z>=:Z AND :Z<=:Z)        filter(("CYCLE"=201505 AND "REGION"=315))            SQL> @px
Show current Parallel Execution sessions in RAC cluster...

QC_SID         QCINST_ID USERNAME   SQL_ID                 DEGREE REQ_DEGREE     SLAVES   INST_CNT   MIN_INST   MAX_INST
------------- ---------- ---------- ------------------ ---------- ---------- ---------- ---------- ---------- ----------
3864,47259             2 REPORT     8av2z2xn17qj4               8          8         16          1          2          2
3864,47259             2 REPORT     19gg34w9r4r5m               8          8          8          1          2          2
3560,39391             2 SYS        6643dd2jtv7jh               2          2          2          2          1          2

SQL> @usid 3864

USERNAME                SID                 AUDSID OSUSER           MACHINE            PROGRAM              SPID             OPID CPID       
  SQL_ID         HASH_VALUE   LASTCALL STATUS   SADDR            PADDR            TADDR            LOGON_TIME
----------------------- -------------- ----------- ---------------- ------------------ -------------------- -------------- ------ ------------ 
------------- ----------- ---------- -------- ---------------- ---------------- ---------------- -----------------
REPORT                   '3864,47259'    262269818 report           kybb1              (TNS V1-V3)          27721980          544 27156        
8av2z2xn17qj4  1746131492     220783 ACTIVE   0700000AD8AFDB78 0700000AD271DA38 0700000A9E8B74D8 20150603 02:22:13

查看当前的活动会话(FOREGROUND)

USERNAME         USID MACHINE    PROGRAM        EVENT                LAST_CALL_ET SQL_ID             WT_SECINW  SQLTEXT
---------- ---------- ---------- -------------- -------------------- ------------ ------------------ ---------- ------------------------------
ACCOUNT          4140 kdbdsc4    intf_accsrv@kd db file sequential r            0 4t9v2u4zdspxd      0:0        select a.scoretypeid,to_char(s
ACCOUNT          3221 kdbdsc3    intf_accsrv@kd db file sequential r            0 4t9v2u4zdspxd      0:0        select a.scoretypeid,to_char(s
USERINFO         3889 kycbe13    JDBC Thin Clie SQL*Net message from            0 33hk2fjysmd3w      -1:0       update ENTITY_SYNC_LOG set sen
REPORT           4158 kybb1      task@kybb1 (TN latch: ges resource          3723 2mk5dt5xjg7zt      0:0        INSERT INTO TB_BALANCE_FLOW_LO
REPORT           3819 kybb1      task@kybb1 (TN latch: ges resource          3723 6n30d53a4ts94      0:0        insert into tb_collect_log_red
REPORT           4110 kybb1      task@kybb1 (TN latch: ges resource          3723 6n30d53a4ts94      0:0        insert into tb_collect_log_red
REPORT           4169 kybb1      task@kybb1 (TN latch: ges resource          3723 2mk5dt5xjg7zt      0:0        INSERT INTO TB_BALANCE_FLOW_LO
REPORT           4042 kybb1      task@kybb1 (TN latch: ges resource          3848 acn50cgh015ba      0:0        insert into  tb_user_charge_d
REPORT           4191 kybb1      task@kybb1 (TN latch: ges resource          3848 d9hjdb4nkg18t      0:0        insert into  tb_user_charge_d
REPORT           3304 kybb1      task@kybb1 (TN latch: ges resource        139542 b5anr2qjt2h9k      2:0        insert /*+append*/into stat_ba
REPORT           4002 kybb1      task@kybb1 (TN latch: ges resource        172083 cfyrg0hs1qh3a      0:0        UPDATE SUBSCRIBER S SET S.VIPI
REPORT           3584 kybb1      oracle@anbob2  latch: ges resource        219101 8av2z2xn17qj4      0:0               create table xxxx_315
REPORT           3718 kybb1      oracle@anbob2  latch: ges resource        219101 8av2z2xn17qj4      0:0               create table xxxx_315
REPORT           3285 kybb1      oracle@anbob2  latch: ges resource        219101 8av2z2xn17qj4      0:0               create table xxxx_315
REPORT           3394 kybb1      oracle@anbob2  latch: ges resource        219101 8av2z2xn17qj4      0:0               create table xxxx_315
REPORT           3830 kybb1      oracle@anbob2  latch: ges resource        219101 8av2z2xn17qj4      0:0               create table xxxx_315
REPORT           3652 kybb1      oracle@anbob2  latch: ges resource        219101 8av2z2xn17qj4      0:0               create table xxxx_315
REPORT           3614 kybb1      oracle@anbob2  latch: ges resource        219101 8av2z2xn17qj4      27:0              create table xxxx_315
REPORT           3609 kybb1      oracle@anbob2  latch: ges resource        219101 8av2z2xn17qj4      0:0               create table xxxx_315
REPORT           3864 kybb1      task@kybb1 (TN PX Deq: Execute Repl       219248 8av2z2xn17qj4      0:0               create table xxxx_315
REPORT           3596 kybb1      oracle@anbob2  PX Deq: Execution Ms       219247 19gg34w9r4r5m      0:1        SELECT * FROM REGION_LIST WHER
REPORT           3850 kybb1      oracle@anbob2  PX Deq: Execution Ms       219247 19gg34w9r4r5m      0:2        SELECT * FROM REGION_LIST WHER
SYS              3560 anbob2     sqlplus@anbob2 SQL*Net message to c            3 d2b6tdq5jmju3      -1:3         select  /*+rule*/    ses.use

Tip:
可以看到当前连接会话中latch ges resource的也就两个program,其中有个create table 的进程使用了8个parallel processes。而且这个会话已经持续执行了快22万秒,下面使用Poder的脚本查看是否有latch 级联阻塞。

SQL> @ash_wait_chains session_id||':'||program2||event2 session_type='FOREGROUND' sysdate-10/24/60 sysdate

-- Display ASH Wait Chain Signatures script v0.2 BETA by Tanel Poder ( http://blog.tanelpoder.com )

%This     SECONDS        AAS WAIT_CHAIN
------ ---------- ---------- ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
   1%         599          1 -> 3761:(task) latch: ges resource hash list
   1%         599          1 -> 3797:(task) ON CPU
   1%         597          1 -> 3474:(intunit_get_sq) library cache lock
   1%         597          1 -> 3869:(task) library cache lock
   1%         597          1 -> 3303:(task) library cache lock
   1%         574          1 -> 3261:(sqlplus) SQL*Net message from dblink
   1%         527         .9 -> 4169:(task) latch: ges resource hash list
   1%         524         .9 -> 3464:(sqlplus) SQL*Net message from dblink
   1%         523         .9 -> 3718:(Pnnn) latch: ges resource hash list
   1%         521         .9 -> 3652:(Pnnn) latch: ges resource hash list
   1%         521         .9 -> 4002:(task) latch: ges resource hash list
   1%         521         .9 -> 3584:(Pnnn) latch: ges resource hash list
   1%         521         .9 -> 3304:(task) latch: ges resource hash list
   1%         518         .9 -> 3830:(Pnnn) latch: ges resource hash list
   1%         517         .9 -> 4299:(sqlplus) db file sequential read
   1%         511         .9 -> 3394:(Pnnn) latch: ges resource hash list
   1%         510         .9 -> 3609:(Pnnn) latch: ges resource hash list
   1%         509         .8 -> 4191:(task) latch: ges resource hash list
   1%         509         .8 -> 3780:(sqlplus) db file sequential read
   1%         507         .8 -> 4110:(task) latch: ges resource hash list
   1%         506         .8 -> 3614:(Pnnn) latch: ges resource hash list
   1%         504         .8 -> 4158:(task) latch: ges resource hash list
   1%         503         .8 -> 3301:(sqlplus) db file sequential read
   1%         502         .8 -> 3895:(sqlplus) ON CPU
   1%         499         .8 -> 3285:(Pnnn) latch: ges resource hash list
   1%         497         .8 -> 3819:(task) latch: ges resource hash list
   1%         496         .8 -> 4042:(task) latch: ges resource hash list
   1%         471         .8 -> 4193:(sqlplus) SQL*Net message from dblink
   1%         441         .7 -> 2975:(oracle) db file sequential read
   1%         344         .6 -> 2928:(sqlplus) db file scattered read

30 rows selected.

SQL> @uspid 3867094

USERNAME                SID                 AUDSID OSUSER           MACHINE            PROGRAM              SPID         SQL_HASH_VALUE   LASTCALL STATUS
----------------------- -------------- ----------- ---------------- ------------------ -------------------- ------------ -------------- ---------- --------
                         '4405,1'                0 oracle           anbob2             oracle@anbob2 (PMON) 3867094                   0    1865174 ACTIVE

SQL> oradebug setospid 3867094
Oracle pid: 2, Unix process pid: 3867094, image: oracle@anbob2 (PMON)

SQL> oradebug short_stack
ksdxfstk+002c<-ksdxcb+0500<-sspuser+0074<-000048BC<-sskgpwwait+0034<-skgpwwait+00bc<-ksliwat+06c0<-kslwaitns_timed+0024<
-kskthbwt+0280<-kslewat+01bc<-kslges+04b8<-kslgetl+0384<-kjuscl+0390<-ksiprls+0178<-ksqcmi+2a9c<-ksqrcl+051c<-ksqdeli+014c
<-ksqdel+0014<-ksqsod+00d4<-kssxdl+0350<-kssdch_stage+05d4<-kssdch+0014<-ksudlc+01fc<-kssxdl+0350<-kssdch_stage+05d4<-kssdch+0014
<-ksudlc+01fc<-kssxdl+0350<-kssdch_stage+05d4<-kssdch+0014<-ksudlc+01fc<-kssxdl+0350<-ksudlp+018c<-kssxdl+0350<-ksuxdl+02a0
<-ksuxda+028c<-ksucln+085c<-ksbrdp+04ec<-opirip+041c<-opidrv+0478<-sou2o+0090<-opimai_real+0150<-main+0098<-__start+0070 SQL> @s 4405

    SID SQLID_AND_CHILD      STATUS   STATE   EVENT                                          SEQ# SEC_IN_WAIT BLOCKING_SID P1                
 P2                 P3                 P1TRANSL
------- -------------------- -------- ------- ---------------------------------------- ---------- ----------- ------------ ------------------ 
------------------ ------------------ ------------------------------------------
   4405  0                   ACTIVE   WAITING latch: ges resource hash list                 33650           0 UNKNOWN      address=           
number= 66         tries= 39          0x700000AD1A9EE98: ges resource hash list[
# awr 
              Snap Id      Snap Time      Sessions Curs/Sess
            --------- ------------------- -------- ---------
Begin Snap:     50106 05-Jun-15 15:30:42       837      16.7
  End Snap:     50107 05-Jun-15 16:00:31       838      16.8
   Elapsed:               29.82 (mins)
   DB Time:              691.71 (mins)


Top 5 Timed Events                                         Avg %Total
~~~~~~~~~~~~~~~~~~                                        wait   Call
Event                                 Waits    Time (s)   (ms)   Time Wait Class
------------------------------ ------------ ----------- ------ ------ ----------
latch: ges resource hash list       166,505      25,783    155   62.1      Other
db file sequential read           1,831,050      10,326      6   24.9   User I/O
CPU time                                          4,056           9.8
gc current block 2-way            1,173,399         904      1    2.2    Cluster
gc cr grant 2-way                   847,244         468      1    1.1    Cluster
          -------------------------------------------------------------

NOTE:
从上面的信息可以发现当前的pmon 进程在等待中,从call stack 可以确认,wait event 是’latch: ges resource hash list’, 也是当前实例的top event, 猜测与当前的CTAS 的会话有关,查询了MOS与该事件相关存在一个bug

Bug 11690639 – High enqueue activity results in “latch: ges resource hash list” waits (文档 ID 11690639.8)

处理方法21万秒那个会话存在异常,kill了那个会话,该事件消失。

SQL> @a
A-Script: Display active sessions...

  COUNT(*) SQL_ID        STATE   EVENT
---------- ------------- ------- ----------------------------------------------------------------
         5 4t9v2u4zdspxd WAITING db file sequential read
         2 9z11d7hxfb1u6 WAITING db file sequential read
         2 2mk5dt5xjg7zt WAITING db file sequential read
         1 4t9v2u4zdspxd WAITING gc cr request
         1 7gtuunjsd8cws WAITING db file sequential read
         
SQL> @s 4405

    SID SQLID_AND_CHILD      STATUS   STATE   EVENT                                          SEQ# SEC_IN_WAIT BLOCKING_SID P1                 P2                 P3                 P1TRANSL
------- -------------------- -------- ------- ---------------------------------------- ---------- ----------- ------------ ------------------ ------------------ ------------------ ------------------------------------------
   4405  0                   ACTIVE   WORKING On CPU / runqueue                             38981          19 UNKNOWN     

NOTE:
可以看到pmon 进程当前的event 是’on cpu’ ,恢复了正常的状态, 下面检查监听,动态注册也恢复了正常。

[oracle@anbob2:/home/oracle]#/oracle/app/11.2.0.3/grid/bin/lsnrctl status

LSNRCTL for IBM/AIX RISC System/6000: Version 11.2.0.3.0 - Production on 05-JUN-2015 17:21:10

Copyright (c) 1991, 2011, Oracle.  All rights reserved.

Connecting to (ADDRESS=(PROTOCOL=tcp)(HOST=)(PORT=1521))
STATUS of the LISTENER
------------------------
Alias                     LISTENER
Version                   TNSLSNR for IBM/AIX RISC System/6000: Version 11.2.0.3.0 - Production
Start Date                05-JUN-2015 12:07:11
Uptime                    0 days 5 hr. 13 min. 59 sec
Trace Level               off
Security                  ON: Local OS Authentication
SNMP                      ON
Listener Parameter File   /oracle/app/11.2.0.3/grid/network/admin/listener.ora
Listener Log File         /oracle/app/grid/diag/tnslsnr/anbob2/listener/alert/log.xml
Listening Endpoints Summary...
  (DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(KEY=LISTENER)))
  (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=133.96.60.16)(PORT=1521)))
  (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=133.96.60.116)(PORT=1521)))
Services Summary...
Service "anbobc.com" has 2 instance(s).
  Instance "anbobc2", status UNKNOWN, has 1 handler(s) for this service...
  Instance "anbobc2", status READY, has 1 handler(s) for this service...
Service "anbobc_XPT.com" has 1 instance(s).
  Instance "anbobc2", status READY, has 1 handler(s) for this service...
The command completed successfully

[oracle@anbob2:/home/oracle]#/oracle/app/11.2.0.3/grid/bin/lsnrctl service

LSNRCTL for IBM/AIX RISC System/6000: Version 11.2.0.3.0 - Production on 05-JUN-2015 17:21:29

Copyright (c) 1991, 2011, Oracle.  All rights reserved.

Connecting to (ADDRESS=(PROTOCOL=tcp)(HOST=)(PORT=1521))
Services Summary...
Service "anbobc.com" has 2 instance(s).
  Instance "anbobc2", status UNKNOWN, has 1 handler(s) for this service...
    Handler(s):
      "DEDICATED" established:1969 refused:0
         LOCAL SERVER
  Instance "anbobc2", status READY, has 1 handler(s) for this service...
    Handler(s):
      "DEDICATED" established:128 refused:0 state:ready
         LOCAL SERVER
Service "anbobc_XPT.com" has 1 instance(s).
  Instance "anbobc2", status READY, has 1 handler(s) for this service...
    Handler(s):
      "DEDICATED" established:128 refused:0 state:ready
         LOCAL SERVER
The command completed successfully

总结:
当前该bug还无法确认,不过可以确认的是CTAS 长期持有latch: ges resource hash list ,堵塞了pmon 对latch 的获取,影响了对listener的注册和service_update,listener 中的会话信息只增加不更新已释放,慢慢导致了最后的service 状态blocked, 致使会话通过监听时报ora-12519 and 12516.

打赏

对不起,这篇文章暂时关闭评论。