首页 » ORACLE » orphaned processes not cleanup when using resource limit(profile) in 9i RAC, cause skgpspawn failed(进程不回收)

orphaned processes not cleanup when using resource limit(profile) in 9i RAC, cause skgpspawn failed(进程不回收)

Question:
Why is my database after running for some time , large number of database server processes  will be created , but the number of sessions is very little, what actions cause orphaned processes? And sometimes hit ora-20 error, when I kill that OS processes manually  every time,the problem is solved . and another node without symptoms.

Answer:

skgpspawn failed:category = 27143, depinfo = 24, op = pipe, loc = skgpspawn2
puts forward some ideas and methods for to solve this problem
1,Check free memory and swap on the server.
2,Check the oracle account’s process limits?
$ ulimit -a
..
bug

Read previous article skgpspawn failed:category = 27143 in alert log

The following is  Step by step diagnostic methods in the case

DB env:
2 nodes 9i(9.2.0.4) rac on hp-ux

in alert.
###########################

skgpspawn failed:category = 27143, depinfo = 24, op = pipe, loc = skgpspawn2
skgpspawn failed:category = 27143, depinfo = 24, op = pipe, loc = skgpspawn2
skgpspawn failed:category = 27143, depinfo = 24, op = pipe, loc = skgpspawn2
skgpspawn failed:category = 27143, depinfo = 24, op = pipe, loc = skgpspawn2
skgpspawn failed:category = 27143, depinfo = 24, op = pipe, loc = skgpspawn2
skgpspawn failed:category = 27143, depinfo = 24, op = pipe, loc = skgpspawn2

# swapinfo -atm

when after cleaned up  manually 500+ OS processes with OS kill -9 command, above output show total line PCT used 73% on the node.

when I try to find out difference processes between v$process and v$session using following SQL:

sql> select spid,username,program from v$process where addr not in(select paddr from v$session);

spid   username  program
------ -------- --------------------------
17062  oracle      oracle@par3a (TNS V1-V3)
16711  oracle      oracle@par3a (TNS V1-V3)
17064  oracle      oracle@par3a (TNS V1-V3)
...
100+ process.

NOTE:
I not means number of  processes always equal number of  sessions, and Tom said “a process can have many sessions. a session may or may not have a process.”  About Difference between Session and Process can read Arup Nanda’ note

Next I try to check what is program “oracle@par3a (TNS V1-V3)” from v$session view.

 

 

sql> select USERNAME    ,STATUS    ,SERVER,OSUSER,PROCESS,MACHINE,TERMINAL,PROGRAM
from v$session where program like 'oracle@par3a%'

USERNAME        STATUS        SERVER        OSUSER    PROCESS    MACHINE    TERMINAL    PROGRAM
----------------------------------------------------------------------------------------------
dbuser01            INACTIVE    DEDICATED     Administrator    1909036    par3a        oracle@par3a (TNS V1-V3)
dbuser01            INACTIVE    DEDICATED     Administrator    1650968    par3a        oracle@par3a (TNS V1-V3)
dbuser01            INACTIVE    DEDICATED     Administrator    2146628    par3a        oracle@par3a (TNS V1-V3)
dbuser01            INACTIVE    DEDICATED     Administrator    2601276    par3a        oracle@par3a (TNS V1-V3)
dbuser01            INACTIVE    DEDICATED     Administrator    3330508    par3a        oracle@par3a (TNS V1-V3)
dbuser01            INACTIVE    DEDICATED     Administrator    262624    par3a        oracle@par3a (TNS V1-V3)

Tip:
Upon inquiry is par3a machines(10g rac) extract data  use database link with a application from the database. I used these sql script to make sure  who is using db_link to conect me .

then I check user’s  resource limit with user profile .

 

SQL>show parameter resource_limit
---------
true

SQL>select profile from dba_users where username='dbuser01';
----------
IDLE_TIME_PROFILE

SQL>select resource_name ,limit from dba_profiles where profile='IDLE_TIME_PROFILE'

resource_name       limit
--------------      --------------
IDLE_TIME            90
...

TIP:
The user had 1.5 hours IDLE_TIME resource limit, RESOURCE_LIMIT Valid values for the parameter are TRUE (enables enforcement) and FALSE. By default, this parameter’s value is set to FALSE IN 9I.

IDLE_TIME Specify the permitted periods of continuous inactive time during a session, expressed in minutes. Long-running queries and other operations are not subject to this limit.
“When a resource limit is exceeded (for example IDLE_TIME) … PMON does the following

* Mark the V$SESSION as SNIPED

* Clean up the database resources for the session

* Remove the V$SESSION entry(no v$process)     When a resource limit is exceeded (for example IDLE_TIME) …

PMON marks the session as SNIPED in V$SESSION.  Then, AFTER the SNIPED session tries to execute any SQL statement, its database resources are cleaned up and its V$SESSION entry is removed.”
sql> SELECT USERNAME,MACHINE,PROGRAM FROM V$SESSION WHERE STATUS=’SNIPED’;

It’s output show many connects of username is ‘dbuser01’ and program is ‘oracle@par3a (TNS V1-V3)’;

Tip:
SNIPED – An inactive session that has exceeded some configured limits (for example, resource limits specified for the resource manager consumer group or idle_time specified in the user’s profile). Such sessions will not be allowed to become active again.

It is strongly recommended that both DCD and Resource Limits with Profiles be implemented in order to clean up resources at both the database and OS level

This combination will not clean up IDLE / ABANDONED / INACTIVE connections (OS processes) as these sessions still have active clients

For this case we will see that :
* PMON has cleaned up the V$SESSION entries .. but both the OS processes and the V$PROCESS entries will still exist
* SQLNET will continue to be able to send the 10 byte packet successfully until the session is logged off

This condition can be a major problem as
* The database exhausts PROCESSES and gives ORA-20 maximum number of processes <num> exceeded
* The OS can become exhausted due to the unneeded resources consumed by the abandoned processes

The SYMPTOMS of this condition are
* The database view V$PROCESS will have no corresponding V$SESSION entry
* An OS process / thread still exists for the SNIPED session

The solutions to this scenario can are to cleanup the OS processes … after which the V$PROCESS entries should be removed automatically

Cause

So this case the root problem has been found,In short is another db use dblink  query this db, but the user had Idel_time resource limit, when the link user  session on the client idle, and Over that time(90 minutes),  pmon cleaned up the v$session entries, but both the OS process and V$process entries will still exist, and these process allocate memory and hold resource never released. Over time , the orphaned process becomes more and more, Make memory resource depletion, so cause swap  and ora-20 issue.   and Solution use DCD and Resource Limit profile to cleanup idel process.
Another , you should also check the HP-UX kernel parameter are correct like filecache_max,filecache_min,dbc_max_pct,dbc_min_pct..

Summary of what was discussed here:

1) DCD initiates clean up of OS and database processes  that have disconnected / terminated abnormally
2) DCD will not initiate clean up sessions that are still connected … but are idle / abandoned / inactive
3) Database Resource Limits + user Profiles clean up database resources for user sessions that exceed resource limits
4) Database Resource Limits + user Profiles will not clean up OS processes
5) If DCD and Database Resource Limits + user Profiles are used in combination .. Dead Connections OS and Database Resources will be cleaned up
6) IDLE / ABANDONED / INACTIVE sessions OS processes will not be cleaned up even if DCD and Database Resource Limits + user Profiles are used in combination … these must be cleaned up manually

Then I try to dump a OS process id 17062 not in v$session processstat .

oradebug setospid 17062
oradebug dump processstate 10
oradebug tracefile_name

/u01/app/oracle/admin/dbzy/udump/dbzy1_ora_17062.trc
Oracle9i Enterprise Edition Release 9.2.0.4.0 – 64bit Production
With the Partitioning, Real Application Clusters, OLAP and Oracle Data Mining options
JServer Release 9.2.0.4.0 – Production
ORACLE_HOME = /u01/app/oracle/product/9.2.0
System name:    HP-UX
Node name:    hbdbzy1
Release:    B.11.23
Version:    U
Machine:    ia64
Instance name: dbzy1
Redo thread mounted by this instance: 1
Oracle process number: 34
Unix process pid: 17062, image: oracle@hbdbzy1 (TNS V1-V3)

*** 2014-08-26 14:06:50.741
*** SESSION ID:(17.45) 2014-08-26 14:06:50.709
===================================================
PROCESS STATE
————-
Process global information:
process: c00000044dad8440, call: 0000000000000000, xact: 0000000000000000, curses: 0000000000000000, usrses: 0000000000000000
—————————————-
SO: c00000044dad8440, type: 2, owner: 0000000000000000, flag: INIT/-/-/0x00
(process) Oracle pid=34, calls cur/top: 0000000000000000/c00000045d55c6d0, flag: (0) –
int error: 0, call error: 0, sess error: 0, txn error 0
(post info) last post received: 0 0 0
last post received-location: No post
last process to post me: none
last post sent: 0 0 0
last post sent-location: No post
last process posted by me: none
(latch info) wait_event=0 bits=0
Process Group: DEFAULT, pseudo proc: c00000044dc08080
O/S info: user: oracle, term: UNKNOWN, ospid: 17062
OSD pid info: Unix process pid: 17062, image: oracle@hbdbzy1 (TNS V1-V3)
—————————————-
SO: c00000044eebff48, type: 13, owner: c00000044dad8440, flag: INIT/-/-/0x00
(broadcast handle) flag: (2) ACTIVE SUBSCRIBER, owner: c00000044dad8440,
event: 216, last message event: 216,
last message waited event: 216, messages read: 0
channel: (c00000044eefbb40) system events broadcast channel
scope: 0, event: 43381, last mesage event: 0,
publishers/subscribers: 0/154,
messages published: 0
—————————————-
SO: c000000450613800, type: 19, owner: c00000044dad8440, flag: INIT/-/-/0x00
GES MSG BUFFERS: st=emp chunk=0x0000000000000000 hdr=0x0000000000000000 lnk=0x0000000000000000 flags=0x0
outq=0 sndq=0 opid=0 prmb=0x0
mbg[i]=(0 0) mbg[b]=(0 0) mbg[r]=(0 0)
fmq[i]=(0 0) fmq[b]=(0 0) fmq[r]=(0 0)
mop[s]=0 mop[q]=0 pendq=0 workq=0
————process———————
proc version      : 6
Local node        : 0
gid               : 0
pid               : 17062
lkp_node          : 0
proc state        : KJP_FROZEN    –Frozen process
Total accesses    : 55
Imm.  accesses    : 54
Locks on ASTQ     : 0
Locks Pending AST : 0
Granted locks     : 0
AST_Q     :NULL
PENDING_Q :NULL
GRANTED_Q :NULL
AST_Q:
PENDING_Q:
GRANTED_Q:
—————————————-
SO: c00000045d55c6d0, type: 3, owner: c00000044dad8440, flag: INIT/-/-/0x00
(call) sess: cur 0, rec 0, usr 0; depth: 0
END OF PROCESS STATE

References:
A discussion of Dead Connection Detection, Resource Limits, V$SESSION, V$PROCESS and OS processes ( ID 601605.1)

打赏

目前这篇文章有1条评论(Rss)评论关闭。

  1. admin | #1
    2014-08-29 at 15:05

    Methods to cleanup OS processes:

    * UNIX : kill -x … the OS process at the OS level (typically kill -9)
    * UNIX: if using a dedicated server, use the following shell script to kill the shadow process (script has been tested on Solaris, AIX, Tru64 and HPUX):

    #!/bin/sh
    tmpfile=/tmp/tmp.$$
    sqlplus system/manager <