首页 » MySQL, ORACLE, PostgreSQL » How to Kill session or Cancel SQL query on Oracle , MySQL, PostgreSQL(数据库比较系列八)

How to Kill session or Cancel SQL query on Oracle , MySQL, PostgreSQL(数据库比较系列八)

数据库维护过程中难免会遇到一些不正常的SQL或会话进程正在占用系统大量资源,临时需要终止查询或kill会话,在Oracle, MySQL, Postgresql数据库中不同的操作。

Oracle

KILL会话的基本语法。

SQL> ALTER SYSTEM KILL SESSION 'sid,serial#';

在 RAC 环境中,您可以选择指定INST_ID。这允许您终止不同 RAC 节点上的会话。

SQL> ALTER SYSTEM KILL SESSION 'sid,serial#,@inst_id';

除了上述语法之外,您还可以添加IMMEDIATE子句。

SQL> ALTER SYSTEM KILL SESSION 'sid,serial#' IMMEDIATE;

这不会影响命令执行的工作,但它会立即将控制权返回给当前会话,而不是等待确认终止。

该ALTER SYSTEM DISCONNECT SESSION语法是杀死 Oracle 会话的另一种方法。与KILL SESSION要求会话杀死自己的命令不同,该DISCONNECT SESSION命令杀死专用服务器进程

相关会话的SID和SERIAL#值可以替换为以下语句之一。

SQL> ALTER SYSTEM DISCONNECT SESSION 'sid,serial#' POST_TRANSACTION;
SQL> ALTER SYSTEM DISCONNECT SESSION 'sid,serial#' IMMEDIATE;

该POST_TRANSACTION子句在断开会话之前等待正在进行的事务完成,而该IMMEDIATE子句断开会话并且正在进行的事务立即回滚。

操作系统层

SQL> select spid from v$process where addr in( select paddr from v$session where sid=1);
SPID
------------------------
7110

$ kill -9 7110

How to release still “killed“ status session in v$session? (释放killed的session) (二)记录”killed” 状态有时需要creator_addr字段 select spid from v$process where addr in( select CREATOR_ADDR from v$session where sid=1); ps -ef|grep xx kill -9 要在 Windows 操作系统上终止会话,首先识别会话,然后将相关的值替换为SID从命令行SPID发出的以下命令。

C:\> orakill ORACLE_SID spid

终止查询

ALTER SYSTEM CANCEL SQL命令是在 Oracle Database 18c 中引入的,用于取消会话中的 SQL 语句,而不用终止会话连接。该ALTER SYSTEM CANCEL SQL语句的基本语法如下所示:

ALTER SYSTEM CANCEL SQL 'SID, SERIAL[, @INST_ID][, SQL_ID]';
# session a
SQL> conn / as sysdba
Connected.
USERNAME             INST_NAME            HOST_NAME                  I# SID   SERIAL#  VERSION    STARTED  SPID       OPID  CPID            SADDR            PADDR
-------------------- -------------------- ------------------------- --- ----- -------- ---------- -------- ---------- ----- --------------- ---------------- ----------------
SYS                  CDB$ROOT-anbob19c    oel7db1                     1 1     51226    19.0.0.0.0 20220707 20124      45    7109            0000000078481028 0000000079107FC8

SQL> select count(*) from dba_objects,dba_objects,dba_objects;
select count(*) from dba_objects,dba_objects,dba_objects                  *
ERROR at line 1:
ORA-01013: user requested cancel of current operation

SQL> select sid from v$mystat where rownum=1;

       SID
----------
         1

# session b
SQL> @usid 1
USERNAME                SID                 AUDSID OSUSER           MACHINE            PROGRAM              SPID             OPID CPID                     SQL_ID           HASH_VALUE   LASTCALL STATUS   SADDR            PADDR            TADDR            LOGON_TIME
----------------------- -------------- ----------- ---------------- ------------------ -------------------- -------------- ------ ------------------------ --------------- ----------- ---------- -------- ---------------- ---------------- ---------------- -------------------
SYS                      '1,51226'      4294967295 oracle           oel7db1            (TNS V1-V3)          20124              45 7109                     f5kskn9df2h2p    1524711509         48 ACTIVE   0000000078481028 0000000079107FC8                  2022-07-07 14:50:53

SQL> alter system cancel sql '1,51226,f5kskn9df2h2p';
System altered.

MySQL

查询mySQL进程

SHOW PROCESSLIST;

To see what thread it is then kill it:

KILL [CONNECTION | QUERY] thread_id;

也可以查询information_schema.processlist,与show processlist等同

mysql> select * from information_schema.processlist;
+----+-----------------+-----------+--------------------+---------+------+------------------------+----------------------------------------------+
| ID | USER            | HOST      | DB                 | COMMAND | TIME | STATE                  | INFO                                         |
+----+-----------------+-----------+--------------------+---------+------+------------------------+----------------------------------------------+
|  8 | root            | localhost | NULL               | Query   |  381 | User sleep             | select sleep(1000)                           |
|  9 | root            | localhost | performance_schema | Query   |    0 | executing              | select * from information_schema.processlist |
|  5 | event_scheduler | localhost | NULL               | Daemon  |  401 | Waiting on empty queue | NULL                                         |
+----+-----------------+-----------+--------------------+---------+------+------------------------+----------------------------------------------+
3 rows in set (0.00 sec)

mysql> show processlist;
+----+-----------------+-----------+--------------------+---------+------+------------------------+--------------------+
| Id | User            | Host      | db                 | Command | Time | State                  | Info               |
+----+-----------------+-----------+--------------------+---------+------+------------------------+--------------------+
|  5 | event_scheduler | localhost | NULL               | Daemon  |  434 | Waiting on empty queue | NULL               |
|  8 | root            | localhost | NULL               | Query   |  414 | User sleep             | select sleep(1000) |
|  9 | root            | localhost | performance_schema | Query   |    0 | starting               | show processlist   |
+----+-----------------+-----------+--------------------+---------+------+------------------------+--------------------+
3 rows in set (0.00 sec)

mysql> SELECT CONNECTION_ID();
+-----------------+
| CONNECTION_ID() |
+-----------------+
|               9 |
+-----------------+
1 row in set (0.00 sec)


终止连接

mysql> kill 8;
Query OK, 0 rows affected (0.00 sec)

mysql> select sleep(1000);
ERROR 2013 (HY000): Lost connection to MySQL server during query

终止查询

mysql> kill query 8; 
Query OK, 0 rows affected (0.00 sec)

mysql> select sleep(1000);
+-------------+
| sleep(1000) |
+-------------+
|           1 |
+-------------+
1 row in set (9.27 sec)

MYSQL PROCESSLISTID, MYSQL_OS_THREAD, MYSQL THREAD关系

[root@oel7db1 ~]# ps -ef|grep mysql
root      1940     1  0 20:38 ?        00:00:00 /bin/sh /usr/local/mysql/bin/mysqld_safe --datadir=/usr/local/mysql/data --pid-file=/usr/local/mysql/data/oel7db1.pid
mysql     2027  1940  1 20:38 ?        00:00:08 /usr/local/mysql/bin/mysqld --basedir=/usr/local/mysql --datadir=/usr/local/mysql/data --plugin-dir=/usr/local/mysql/lib/plugin --user=mysql --log-error=oel7db1.err --pid-file=/usr/local/mysql/data/oel7db1.pid
root      2089  1897  0 20:38 pts/1    00:00:00 mysql -uroot -px xxxxxxxxxxx -hlocalhost
root      2138  2114  0 20:39 pts/2    00:00:00 mysql -uroot -px xxxxxxxxxxx -hlocalhost

[root@oel7db1 ~]# ps -elT|head -n 1|cat -n ;ps -elT|grep 2027|cat -n
1 F S UID PID SPID PPID C PRI NI ADDR SZ WCHAN TTY TIME CMD
1 4 S 997 2027 2027 1940 0 80 0 - 323127 poll_s ? 00:00:01 mysqld
2 1 S 997 2027 2036 1940 0 80 0 - 323127 read_e ? 00:00:00 mysqld
3 1 S 997 2027 2037 1940 0 80 0 - 323127 read_e ? 00:00:00 mysqld
4 1 S 997 2027 2038 1940 0 80 0 - 323127 read_e ? 00:00:00 mysqld
5 1 S 997 2027 2039 1940 0 80 0 - 323127 read_e ? 00:00:00 mysqld
6 1 S 997 2027 2040 1940 0 80 0 - 323127 read_e ? 00:00:00 mysqld
7 1 S 997 2027 2041 1940 0 80 0 - 323127 read_e ? 00:00:00 mysqld
8 1 S 997 2027 2042 1940 0 80 0 - 323127 read_e ? 00:00:00 mysqld
9 1 S 997 2027 2043 1940 0 80 0 - 323127 read_e ? 00:00:00 mysqld
10 1 S 997 2027 2044 1940 0 80 0 - 323127 read_e ? 00:00:00 mysqld
11 1 S 997 2027 2045 1940 0 80 0 - 323127 read_e ? 00:00:00 mysqld
12 1 S 997 2027 2046 1940 0 80 0 - 323127 futex_ ? 00:00:00 mysqld
13 1 S 997 2027 2049 1940 0 80 0 - 323127 futex_ ? 00:00:06 mysqld
14 1 S 997 2027 2050 1940 0 80 0 - 323127 futex_ ? 00:00:00 mysqld
15 1 S 997 2027 2051 1940 0 80 0 - 323127 futex_ ? 00:00:00 mysqld
16 1 S 997 2027 2052 1940 0 80 0 - 323127 futex_ ? 00:00:00 mysqld
17 1 S 997 2027 2053 1940 0 80 0 - 323127 futex_ ? 00:00:00 mysqld
18 1 S 997 2027 2054 1940 0 80 0 - 323127 futex_ ? 00:00:00 mysqld
19 1 S 997 2027 2056 1940 0 80 0 - 323127 futex_ ? 00:00:00 mysqld
20 1 S 997 2027 2057 1940 0 80 0 - 323127 futex_ ? 00:00:00 mysqld
21 1 S 997 2027 2058 1940 0 80 0 - 323127 futex_ ? 00:00:00 mysqld
22 1 S 997 2027 2061 1940 0 80 0 - 323127 futex_ ? 00:00:00 mysqld
23 1 S 997 2027 2062 1940 0 80 0 - 323127 hrtime ? 00:00:00 mysqld
24 1 S 997 2027 2063 1940 0 80 0 - 323127 futex_ ? 00:00:00 mysqld
25 1 S 997 2027 2064 1940 0 80 0 - 323127 futex_ ? 00:00:00 mysqld
26 1 S 997 2027 2066 1940 0 80 0 - 323127 futex_ ? 00:00:00 xpl_worker1
27 1 S 997 2027 2067 1940 0 80 0 - 323127 futex_ ? 00:00:00 xpl_worker0
28 1 S 997 2027 2068 1940 0 80 0 - 323127 ep_pol ? 00:00:00 mysqld
29 1 S 997 2027 2072 1940 0 80 0 - 323127 futex_ ? 00:00:00 mysqld
30 1 S 997 2027 2073 1940 0 80 0 - 323127 futex_ ? 00:00:00 mysqld
31 1 S 997 2027 2074 1940 0 80 0 - 323127 futex_ ? 00:00:00 mysqld
32 1 S 997 2027 2075 1940 0 80 0 - 323127 futex_ ? 00:00:00 mysqld
33 1 S 997 2027 2076 1940 0 80 0 - 323127 futex_ ? 00:00:00 mysqld
34 1 S 997 2027 2077 1940 0 80 0 - 323127 futex_ ? 00:00:00 mysqld
35 1 S 997 2027 2079 1940 0 80 0 - 323127 futex_ ? 00:00:00 mysqld
36 1 S 997 2027 2080 1940 0 80 0 - 323127 do_sig ? 00:00:00 mysqld
37 1 S 997 2027 2081 1940 0 80 0 - 323127 futex_ ? 00:00:00 mysqld
38 1 S 997 2027 2083 1940 0 80 0 - 323127 futex_ ? 00:00:00 mysqld
39 1 S 997 2027 2090 1940 0 80 0 - 323127 futex_ ? 00:00:00 mysqld
40 1 S 997 2027 2139 1940 0 80 0 - 323127 poll_s ? 00:00:00 mysqld
mysql会为每个connection创建一个对应mysql thread,连接关闭后,mysql thread生命周期也终止。这个mysql thread可以在processlist、threads表中查看
每个mysql threard将与一个os thread关联在一起,mysql thread销毁后,os thread不会被销毁,可以继续给其他mysql thread使用
如果所有os thread都被mysql thread用光了,下一个connection请求时将会创建新的os thread
mysql> select thread_id,thread_os_id,name,type,PROCESSLIST_ID from performance_schema.threads where PROCESSLIST_ID is not null;
+-----------+--------------+--------------------------------+------------+----------------+
| thread_id | thread_os_id | name                           | type       | PROCESSLIST_ID |
+-----------+--------------+--------------------------------+------------+----------------+
|        43 |         2079 | thread/sql/event_scheduler     | FOREGROUND |              5 |
|        45 |         2083 | thread/sql/compress_gtid_table | FOREGROUND |              7 |
|        48 |         2090 | thread/sql/one_connection      | FOREGROUND |              8 |
|        49 |         2139 | thread/sql/one_connection      | FOREGROUND |              9 |
|        50 |         3309 | thread/sql/one_connection      | FOREGROUND |             10 |
+-----------+--------------+--------------------------------+------------+----------------+
5 rows in set (0.00 sec)

[root@oel7db1 ~]# gdb attach 2027
GNU gdb (GDB) Red Hat Enterprise Linux 7.6.1-115.el7
Copyright (C) 2013 Free Software Foundation, Inc.

(gdb) info threads
  Id   Target Id         Frame
  41   Thread 0x7fe1fe5d9700 (LWP 2036) "mysqld" 0x00007fe20c60a644 in __io_getevents_0_4 () from /lib64/libaio.so.1
  40   Thread 0x7fe1fd3cc700 (LWP 2037) "mysqld" 0x00007fe20c60a644 in __io_getevents_0_4 () from /lib64/libaio.so.1
  39   Thread 0x7fe1fcbcb700 (LWP 2038) "mysqld" 0x00007fe20c60a644 in __io_getevents_0_4 () from /lib64/libaio.so.1
  38   Thread 0x7fe1f7fff700 (LWP 2039) "mysqld" 0x00007fe20c60a644 in __io_getevents_0_4 () from /lib64/libaio.so.1
  37   Thread 0x7fe1f77fe700 (LWP 2040) "mysqld" 0x00007fe20c60a644 in __io_getevents_0_4 () from /lib64/libaio.so.1
  36   Thread 0x7fe1f6ffd700 (LWP 2041) "mysqld" 0x00007fe20c60a644 in __io_getevents_0_4 () from /lib64/libaio.so.1
  35   Thread 0x7fe1f67fc700 (LWP 2042) "mysqld" 0x00007fe20c60a644 in __io_getevents_0_4 () from /lib64/libaio.so.1
  34   Thread 0x7fe1f5ffb700 (LWP 2043) "mysqld" 0x00007fe20c60a644 in __io_getevents_0_4 () from /lib64/libaio.so.1
  33   Thread 0x7fe1f57fa700 (LWP 2044) "mysqld" 0x00007fe20c60a644 in __io_getevents_0_4 () from /lib64/libaio.so.1
  32   Thread 0x7fe1f4ff9700 (LWP 2045) "mysqld" 0x00007fe20c60a644 in __io_getevents_0_4 () from /lib64/libaio.so.1
  31   Thread 0x7fe1e7cef700 (LWP 2046) "mysqld" 0x00007fe20d5a8de2 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  30   Thread 0x7fe1e54ec700 (LWP 2049) "mysqld" 0x00007fe20d5a8de2 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
---Type  to continue, or q  to quit---
  29   Thread 0x7fe1e4ceb700 (LWP 2050) "mysqld" 0x00007fe20d5a8de2 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  28   Thread 0x7fe1e44ea700 (LWP 2051) "mysqld" 0x00007fe20d5a8de2 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  27   Thread 0x7fe1e3ce9700 (LWP 2052) "mysqld" 0x00007fe20d5a8de2 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  26   Thread 0x7fe1e34e8700 (LWP 2053) "mysqld" 0x00007fe20d5a8de2 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  25   Thread 0x7fe1e2ce7700 (LWP 2054) "mysqld" 0x00007fe20d5a8de2 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  24   Thread 0x7fe1e24e6700 (LWP 2056) "mysqld" 0x00007fe20d5a8de2 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  23   Thread 0x7fe1e1ce5700 (LWP 2057) "mysqld" 0x00007fe20d5a8de2 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  22   Thread 0x7fe1e14e4700 (LWP 2058) "mysqld" 0x00007fe20d5a8de2 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  21   Thread 0x7fe1e0ce3700 (LWP 2061) "mysqld" 0x00007fe20d5a8a35 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  20   Thread 0x7fe1e04e2700 (LWP 2062) "mysqld" 0x00007fe20d5abe9d in nanosleep () from /lib64/libpthread.so.0
  19   Thread 0x7fe1dfce1700 (LWP 2063) "mysqld" 0x00007fe20d5a8de2 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  18   Thread 0x7fe1df4e0700 (LWP 2064) "mysqld" 0x00007fe20d5a8de2 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  17   Thread 0x7fe1fdb74700 (LWP 2066) "xpl_worker1" 0x00007fe20d5a8de2 in pthr---Type  to continue, or q  to quit---
ead_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  16   Thread 0x7fe1fdb2d700 (LWP 2067) "xpl_worker0" 0x00007fe20d5a8de2 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  15   Thread 0x7fe1fdae6700 (LWP 2068) "mysqld" 0x00007fe20b909be9 in syscall
    () from /lib64/libc.so.6
  14   Thread 0x7fe1ff8ca700 (LWP 2072) "mysqld" 0x00007fe20d5a8a35 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  13   Thread 0x7fe1decdf700 (LWP 2073) "mysqld" 0x00007fe20d5a8de2 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  12   Thread 0x7fe1de4de700 (LWP 2074) "mysqld" 0x00007fe20d5a8a35 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  11   Thread 0x7fe1ddcdd700 (LWP 2075) "mysqld" 0x00007fe20d5a8a35 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  10   Thread 0x7fe1dd4dc700 (LWP 2076) "mysqld" 0x00007fe20d5a8a35 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  9    Thread 0x7fe1dccdb700 (LWP 2077) "mysqld" 0x00007fe20d5a8a35 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  8    Thread 0x7fe1fda9f700 (LWP 2079) "mysqld" 0x00007fe20d5a8a35 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  7    Thread 0x7fe1fda57700 (LWP 2080) "mysqld" 0x00007fe20b84857a in sigwaitinfo () from /lib64/libc.so.6
  6    Thread 0x7fe1fda0f700 (LWP 2081) "mysqld" 0x00007fe20d5a8a35 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  5    Thread 0x7fe1c7fff700 (LWP 2083) "mysqld" 0x00007fe20d5a8a35 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
---Type  to continue, or q  to quit---
  4    Thread 0x7fe1fc3a9700 (LWP 2090) "mysqld" 0x00007fe20d5a8de2 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  3    Thread 0x7fe1fc361700 (LWP 2139) "mysqld" 0x00007fe20b904cef in ppoll ()
   from /lib64/libc.so.6
  2    Thread 0x7fe1fc219700 (LWP 3309) "mysqld" 0x00007fe20b904cef in ppoll ()
   from /lib64/libc.so.6
* 1    Thread 0x7fe20d96a840 (LWP 2027) "mysqld" 0x00007fe20b904c2d in poll ()
   from /lib64/libc.so.6
(gdb)

因为MYSQL是线程模式,用户进程是mysql进程中的一个线程,线程是进程不可分割的一部分,不能在进程之外被杀死。有pthread_kill函数但它仅适用于线程本身的上下文。请注意, pthread_kill() 仅导致在给定线程的上下文中处理信号;信号动作(终止或停止)会影响整个进程。如果kill 线程会导致mysql server crash重启。
(gdb) call pthread_kill(6307)

PostgreSQL

PostgreSQL pg_terminate_backend 和 pg_cancel_backend 它们用于终止正在运行的查询或会话。PostgreSQL 维护任务期间需要这个脚本,我们需要关闭所有连接和会话。

通过指定数据库名称终止所有正在运行的连接的脚本:

SELECT pg_terminate_backend(pg_stat_activity.pid)
FROM pg_stat_activity
WHERE datname = 'datbase_name'
AND pid <> pg_backend_pid();

终止当前数据库的所有正在运行的连接的脚本:

 SELECT pg_terminate_backend(pg_stat_activity.pid) 
 FROM pg_stat_activity 
 WHERE datname = current_database() 
 AND pid <> pg_backend_pid(); 

pg_cancel_backend():
如果你想杀死那些长时间运行的查询,可使用 pg_cancel_backend() 来杀死它。取消一个查询而不破坏连接,停止其他查询。

pg_terminate_backend():
它终止整个进程和数据库连接。

最佳实践是:
首先找到长时间运行的查询及其进程 id (pid),使用 pg_cancel_backend 取消这些查询,如果它没有释放,你应该使用 pg_terminate_backend。

也可以使用自带的pg_ctl工具:

# pg_ctl kill SIGNALNAME PID

Allowed signal names for kill:
ABRT HUP INT KILL QUIT TERM USR1 USR2

# pg_ctl kill TERM 1234

不要是使用kill -9

The PostgreSQL architecture works like this: when you start PostgreSQL you are starting a process called postmaster. Whenever a new connection comes in, this postmaster forks and creates a so-called backend process (BE). This process is in charge of handling exactly one connection. In a working system, you might see hundreds of processes serving hundreds of users. The important thing here is that all of those processes are synchronized through some common chunk of memory (traditionally, shared memory, and in the more recent versions, mapped memory), and all of them have access to this chunk. What might happen if a database connection or any other process in the PostgreSQL infrastructure is killed with kill -9? A process modifying this common chunk of memory might die while making a change. The process killed cannot defend itself against the onslaught, so who can guarantee that the shared memory is not corrupted due to the interruption?
This is exactly when the postmaster steps in. It ensures that one of these backend processes has died unexpectedly. To prevent the potential corruption from spreading, it kills every other database connection, goes into recovery mode, and fixes the database instance. Then new database connections are allowed again.
While this makes a lot of sense, it can be quite disturbing to those users who are connected to the database system. Therefore, it is highly recommended not to use kill -9. A normal kill will be fine.

# check OS pid
anbob=#  select datname,pid,usename,state from pg_stat_activity;

[root@oel7db1 shm]# ps -ef|grep local
postgres  2627  1870  0 15:13 ?        00:00:00 postgres: postgres anbob [local] idle
postgres  2634  1870  0 15:13 ?        00:00:00 postgres: postgres anbob [local] idle
root      3148  2035  0 15:23 pts/2    00:00:00 grep --color=auto local
[root@oel7db1 shm]#
[root@oel7db1 shm]# kill -9 2627
[root@oel7db1 shm]#

# on session
anbob=# select * from tt;
server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.
The connection to the server was lost. Attempting reset: Succeeded.
anbob=#

## another session 

anbob=# select * from tt;
WARNING:  terminating connection because of crash of another server process
DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
HINT:  In a moment you should be able to reconnect to the database and repeat your command.
server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.
The connection to the server was lost. Attempting reset: Succeeded.
anbob=#


[root@oel7db1 shm]# ps -ef|grep local
postgres  3172  1870  0 15:23 ?        00:00:00 postgres: postgres anbob [local] idle
postgres  3176  1870  0 15:24 ?        00:00:00 postgres: postgres anbob [local] idle
root      3446  2035  0 15:28 pts/2    00:00:00 grep --color=auto local

[root@oel7db1 shm]# kill 3172
[root@oel7db1 shm]#


## one session
anbob=# select * from tt;
FATAL:  terminating connection due to administrator command
server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.
The connection to the server was lost. Attempting reset: Succeeded.
anbob=#

## another session
anbob=# select * from tt;
 id | name | language
----+------+----------
    | a    |
    | b    |
(2 rows)

在数据库PostgreSQL中,一个客户端到服务器连接实际上是一个tcp socket连接,tcp连接是虚连接,一方非正常退出(如断电),另一方会继续维持这个连接。

最近遇到一个案例1个posgresql 正在跑的一个存储过程进程无法kill, pg_terminate_backend()无法终止,使用pstack 查看进程在调用plugin_debugger,strace -tt -p pid 发现一直在loop 执行recvform(39, xxxx),等待接受文件句柄fd 39, ls -l /proc/pid/fd/39 发现fd 39指向一个 socket连接
查看进程对应用tcp socket
进程连接处理CLOSE_WAIT,我们知道TCP建立一个连接需要三次握手,而终止一个连接要经过四次挥手, 通常来讲,CLOSE_WAIT状态的持续时间应该很短,为被动关闭连接,主要原因是某种情况下对方关闭了socket链接,但是我方忙与读或者写,没有关闭连接。

注意因为postgresql没有Pmon进程(or SMON)没有办法在kill进程后释放资源,所以即使PG是进程模式,也不建议通过OS kill进程, 防止PG SERVER crash. 解决这个问题有一个小技巧,可以再不kill进程的前提下,释放SOCKET。

1,找到进程,或netstat根据port找进程

2, lsof 根据进程找socket连接,找到FD文件描述符

3,使用gdb p PID  连接到进程

4,关闭 Socket 连接

(gdb) call close(#fd)

这样就结束了案例中recvform的循环,会话报错后退出。

 

打赏

,

目前这篇文章还没有评论(Rss)

我要评论