oracle后台进程当出现问题时,有些进程kill会导致实例立即重启,像smon, pmon,ckpt.. , 而有些进程kill并不会影响实例可用性, 甚至会立即做进程级重启从db alert log可以观测到,如mmon,rec,jnnn, pnnn等, 前两天看到Poder在其BLOG分享v$process的基表X$KSUPR中中有记录哪些是oracle的致命进程,在X$KSUPR.KSUPRFLG第3位, 下面我做个测试, kill 点X$KSUPR.KSUPRFLG第3位都不为1的进程。
— 环境 oracle 19.3 on-primise
致命进程
SELECT indx,ksuprpnm,TO_CHAR(ksuprflg,'XXXXXXXXXXXXXXXX'),KSUPROSID
FROM x$ksupr
WHERE BITAND(ksuprflg,4) = 4 ORDER BY indx
4 /
INDX KSUPRPNM TO_CHAR(KSUPRFLG, KSUPROSID
---------- ------------------------------------------------ ----------------- ------------------------
2 oracle@oel7db1 (PMON) E 2245
3 oracle@oel7db1 (CLMN) E 2247
4 oracle@oel7db1 (PSP0) 6 2249
5 oracle@oel7db1 (VKTM) 6 2251
6 oracle@oel7db1 (GEN0) 6 2255
8 oracle@oel7db1 (MMAN) 6 2259
13 oracle@oel7db1 (DBRM) 6 2270
16 oracle@oel7db1 (PMAN) 6 2276
18 oracle@oel7db1 (DBW0) 6 2280
19 oracle@oel7db1 (LGWR) 6 2282
20 oracle@oel7db1 (CKPT) 6 2284
21 oracle@oel7db1 (SMON) 16 2286
25 oracle@oel7db1 (LREG) 6 2294
非致命进程
SELECT indx,ksuprpnm,TO_CHAR(ksuprflg,'XXXXXXXXXXXXXXXX'),KSUPROSID
FROM x$ksupr
WHERE BITAND(ksuprflg,4) != 4 and KSUPROSID is not null ORDER BY indx
/
INDX KSUPRPNM TO_CHAR(KSUPRFLG, KSUPROSID
---------- ------------------------------------------------ ----------------- ------------------------
0 oracle@oel7db1 (MZ00) 0 3949
7 oracle@oel7db1 (DIAG) 2 2265
9 oracle@oel7db1 (GEN1) 2 2261_2263
10 oracle@oel7db1 (SCMN) 2 2261_2261
11 oracle@oel7db1 (OFSD) 2 2267_2268
12 oracle@oel7db1 (SCMN) 2 2267_2267
14 oracle@oel7db1 (VKRM) 2 2272
15 oracle@oel7db1 (SVCB) 2 2274
17 oracle@oel7db1 (DIA0) 2 2278
22 oracle@oel7db1 (SMCO) 2 2288
23 oracle@oel7db1 (RECO) 2 2290
24 oracle@oel7db1 (W000) 2 2292
26 oracle@oel7db1 (W001) 2 2296
27 oracle@oel7db1 (PXMN) 2 2298
28 oracle@oel7db1 (S000) 40 2308
29 oracle@oel7db1 (MMON) 2 2302
30 oracle@oel7db1 (MMNL) 2 2304
31 oracle@oel7db1 (D000) 80 2306
32 oracle@oel7db1 (TMON) 2 2310
33 oracle@oel7db1 (TNS V1-V3) 0 2722
34 oracle@oel7db1 (M000) 2 2316
35 oracle@oel7db1 (TT00) 2 2324
36 oracle@oel7db1 (TT01) 2 2326
37 oracle@oel7db1 (TT02) 2 2328
38 oracle@oel7db1 (W002) 2 2332
39 oracle@oel7db1 (AQPC) 2 2336
40 oracle@oel7db1 (W003) 2 2339
41 oracle@oel7db1 (W004) 2 2349
42 oracle@oel7db1 (P000) 0 2344
43 oracle@oel7db1 (P001) 0 2346
44 oracle@oel7db1 (M004) 0 3498
45 oracle@oel7db1 (CJQ0) 2 2357
46 oracle@oel7db1 (QM02) 2 2435
47 oracle@oel7db1 (Q001) 2 2437
48 oracle@oel7db1 (W005) 2 3179
49 oracle@oel7db1 (Q003) 2 2442
50 oracle@oel7db1 (W006) 2 3183
51 oracle@oel7db1 (W007) 2 3188
52 oracle@oel7db1 (MZ00) 0 3949
53 oracle@oel7db1 (J001) 0 3802
54 oracle@oel7db1 (MZ00) 0 3793
55 oracle@oel7db1 (J004) 0 2523
56 oracle@oel7db1 (J005) 0 2525
57 oracle@oel7db1 (J006) 0 2527
58 oracle@oel7db1 (J007) 0 2530
59 oracle@oel7db1 (J008) 0 2532
60 oracle@oel7db1 (J009) 0 2534
61 oracle@oel7db1 (J00A) 0 2536
62 oracle@oel7db1 (J00B) 0 2538
63 oracle@oel7db1 (J00C) 0 2540
64 oracle@oel7db1 (J00D) 0 2543
65 oracle@oel7db1 (J00E) 0 2545
66 oracle@oel7db1 (M001) 2 2569
67 oracle@oel7db1 (M002) 2 2571
68 oracle@oel7db1 (M003) 2 2573
69 oracle@oel7db1 (Q005) 0 2610
70 oracle@oel7db1 (Q006) 0 2612
71 oracle@oel7db1 (Q007) 0 2614
72 oracle@oel7db1 (Q008) 0 2616
73 oracle@oel7db1 (Q009) 0 2618
74 oracle@oel7db1 (Q00A) 0 2620
75 oracle@oel7db1 (Q00B) 0 2622
76 oracle@oel7db1 (Q00C) 0 2624
77 oracle@oel7db1 (Q00D) 0 2626
78 oracle@oel7db1 (Q00E) 0 2628
79 oracle@oel7db1 (Q00F) 0 2630
80 oracle@oel7db1 (Q00G) 0 2633
81 oracle@oel7db1 (Q00H) 0 2635
82 oracle@oel7db1 (Q00I) 0 2637
83 oracle@oel7db1 (Q00J) 0 2639
84 oracle@oel7db1 (Q00K) 0 2641
85 oracle@oel7db1 (Q00L) 0 2643
86 oracle@oel7db1 (Q00M) 0 2645
299 oracle@oel7db1 (TNS V1-V3) 0 2233
KILL ALL 非致命进程
SELECT 'host kill -9 '||KSUPROSID FROM x$ksupr WHERE BITAND(ksuprflg,4) != 4 and KSUPROSID is not null ORDER BY indx /
— 果然实例没有重启
DB ALERT LOG
Restarting dead background process DIAG Starting background process DIAG DIAG started with pid=17, OS id=4520 Restarting dead background process OFSD Starting background process OFSD OFSD started with pid=22, OS id=4522_4524 Restarting dead background process VKRM Starting background process VKRM Oracle running with ofslib:'Oracle File Server Library' version=2 VKRM started with pid=24, OS id=4526 Restarting dead background process SVCB Starting background process SVCB SVCB started with pid=26, OS id=4528 Restarting dead background process DIA0 Starting background process DIA0 DIA0 started with pid=27, OS id=4530 Restarting dead background process TMON Starting background process TMON TMON started with pid=28, OS id=4532 Restarting dead background process SMCO Starting background process SMCO SMCO started with pid=29, OS id=4534 Restarting dead background process RECO Starting background process RECO RECO started with pid=30, OS id=4536 Restarting dead background process CJQ0 Starting background process CJQ0 CJQ0 started with pid=32, OS id=4540 Restarting dead background process PXMN Starting background process PXMN PXMN started with pid=35, OS id=4544 Restarting dead background process AQPC Starting background process AQPC AQPC started with pid=36, OS id=4546 Restarting dead background process MMON Starting background process MMON MMON started with pid=37, OS id=4548 Restarting dead background process MMNL Starting background process MMNL MMNL started with pid=38, OS id=4550 2020-05-12 11:25:58.025000 -04:00 TT00 (PID:4560): Gap Manager starting
进程自动重启, cool!
— 2023 update –
另一个方法可以从proc系统中的环境变量查找FATAL 关键字,如19c rac中的致命进程
root@19c1:/root $for p in `ps -ef | grep $ORACLE_SID | awk '{ print $2 }'` ; do grep -q FATAL /proc/$p/environ ; x=$? ; [ $x -eq 0 ] && ps --no-headers -fp $p ; done
oracle 10468 1 0 Oct10 ? 00:03:56 ora_pmon_anbob1
oracle 10470 1 0 Oct10 ? 00:01:16 ora_clmn_anbob1
oracle 10472 1 0 Oct10 ? 00:06:58 ora_psp0_anbob1
oracle 10609 1 0 Oct10 ? 01:36:34 ora_ipc0_anbob1
oracle 10622 1 1 Oct10 ? 11:48:54 ora_vktm_anbob1
oracle 10639 1 0 Oct10 ? 00:06:14 ora_gen0_anbob1
oracle 10647 1 0 Oct10 ? 00:01:14 ora_mman_anbob1
oracle 10663 1 0 Oct10 ? 01:24:40 ora_dbrm_anbob1
oracle 10673 1 0 Oct10 ? 00:01:06 ora_acms_anbob1
oracle 10683 1 0 Oct10 ? 00:05:32 ora_pman_anbob1
oracle 10708 1 0 Oct10 ? 01:01:19 ora_lmon_anbob1
oracle 10711 1 0 Oct10 ? 02:30:13 ora_lmd0_anbob1
oracle 10714 1 0 Oct10 ? 04:59:55 ora_lms0_anbob1
oracle 10716 1 0 Oct10 ? 05:12:09 ora_lms1_anbob1
oracle 10718 1 0 Oct10 ? 02:15:24 ora_lmd1_anbob1
oracle 10726 1 0 Oct10 ? 00:07:08 ora_rms0_anbob1
oracle 10732 1 0 Oct10 ? 00:03:25 ora_lck1_anbob1
oracle 10734 1 0 Oct10 ? 00:08:54 ora_dbw0_anbob1
oracle 10738 1 0 Oct10 ? 00:17:42 ora_lgwr_anbob1
oracle 10754 1 0 Oct10 ? 00:17:28 ora_ckpt_anbob1
oracle 10770 1 0 Oct10 ? 01:25:39 ora_smon_anbob1
oracle 10787 1 0 Oct10 ? 00:02:26 ora_lreg_anbob1
oracle 10809 1 0 Oct10 ? 00:01:37 ora_rbal_anbob1
oracle 10811 1 0 Oct10 ? 00:04:09 ora_asmb_anbob1
oracle 10813 1 0 Oct10 ? 00:34:23 ora_fenc_anbob1
oracle 10853 1 0 Oct10 ? 00:57:09 ora_imr0_anbob1
oracle 10941 1 0 Oct10 ? 00:13:36 ora_lck0_anbob1