在从Oracle迁移到PostgreSQL后,SELECT和TRUNCATE操作互相阻塞是一个常见问题,这主要是由于两种数据库的锁机制不同导致的。目前一客户项目在我们迁移到postgresql系的国产数据库生产环境后,时常在一个存储过程中包含truncate又有select 长事务时发生互相堵塞,简单记录.
原因分析
-
锁机制差异:
-
Oracle的TRUNCATE是DDL操作,Select Null 锁,通常不会与SELECT冲突
-
PostgreSQL中TRUNCATE需要获取ACCESS EXCLUSIVE锁,这是最强的锁级别,会阻塞所有其他操作
-
-
MVCC实现差异:
-
PostgreSQL的MVCC实现方式与Oracle不同
-
TRUNCATE在PostgreSQL中实际上是创建新文件并删除旧文件,需要完全独占访问
-
演示
— 这里使用kingbase
-- session 1 test=# create table test(id int); CREATE TABLE test=# insert into test values (1); INSERT 0 1 test=# commit; WARNING: there is no transaction in progress COMMIT test=# begin; BEGIN test=# select * from test; id ---- 1 (1 row) -- session 2 kingbase=# \c test You are now connected to database "test" as userName "system". test=# begin; BEGIN test=# truncate table test; waiting... -- session 3 kingbase=# \c test; You are now connected to database "test" as userName "system". test=# select * from test; waiting...
Note:
看到堵塞session 1 select — session 2 truncate — session select 堵塞链
kingbase=# select datname,pid,xact_start,query_start,wait_event_type,wait_event,state,query from pg_stat_activity;
datname | pid | xact_start | query_start | wait_event_type | wait_event | state |
query
----------+---------+-------------------------------+-------------------------------+-----------------+---------------------+---------------------+--------------------------------------
-------------------------------------------------------------------
....
kingbase | 2424743 | 2025-04-27 17:08:39.106147+08 | 2025-04-27 17:08:39.106147+08 | | | active | select datname,pid,xact_start,query_s
tart,wait_event_type,wait_event,state,query from pg_stat_activity;
test | 2380294 | 2025-04-27 17:06:00.351627+08 | 2025-04-27 17:06:07.249354+08 | Client | ClientRead | idle in transaction | select * from test;
test | 2381560 | 2025-04-27 17:06:13.855348+08 | 2025-04-27 17:06:15.955823+08 | Lock | relation | active | truncate table test;
test | 2423637 | 2025-04-27 17:06:26.795951+08 | 2025-04-27 17:06:26.795951+08 | Lock | relation | active | select * from test;
...
(11 rows)
kingbase=# select * from pg_locks;
locktype | database | relation | page | tuple | virtualxid | transactionid | classid | objid | objsubid | virtualtransaction | pid | mode | granted | fastpath
---------------+----------+----------+------+-------+------------+---------------+---------+-------+----------+--------------------+---------+---------------------+---------+----------
virtualxid | | | | | 8/393 | | | | | 8/393 | 2423637 | ExclusiveLock | t | t
virtualxid | | | | | 7/6 | | | | | 7/6 | 2381560 | ExclusiveLock | t | t
virtualxid | | | | | 6/126 | | | | | 6/126 | 2380294 | ExclusiveLock | t | t
relation | 13591 | 12085 | | | | | | | | 5/834432 | 2424743 | AccessShareLock | t | t
virtualxid | | | | | 5/834432 | | | | | 5/834432 | 2424743 | ExclusiveLock | t | t
relation | 13590 | 16799 | | | | | | | | 8/393 | 2423637 | AccessShareLock | f | f
transactionid | | | | | | 459127 | | | | 7/6 | 2381560 | ExclusiveLock | t | f
relation | 13590 | 16799 | | | | | | | | 6/126 | 2380294 | AccessShareLock | t | f
relation | 13590 | 16799 | | | | | | | | 7/6 | 2381560 | AccessExclusiveLock | f | f
(9 rows)
Note:
注意TRUNCATE 需要AccessExclusiveLock (table) ,而select 需要 AccessShareLock(Table), Truncate的互斥锁排他与很多事务都有锁冲突。可以见文档PostgreSQL Lock Conflicts
我之前的文章有记录过一些<脚本:用于分析PostgreSQL lock trees(堵塞树) 的查询>
SELECT
activity.pid,
activity.usename,
activity.query,
blocking.pid AS blocking_id,
blocking.query AS blocking_query
FROM pg_stat_activity AS activity
JOIN pg_stat_activity AS blocking ON blocking.pid = ANY(pg_blocking_pids(activity.pid));
pid | usename | query | blocking_id | blocking_query
---------+---------+----------------------+-------------+----------------------
2381560 | system | truncate table test; | 2380294 | select * from test;
2423637 | system | select * from test; | 2381560 | truncate table test;
kingbase=# select pg_blocking_pids(2380294);
pg_blocking_pids
------------------
{}
(1 row)
kingbase=# select pg_blocking_pids(2381560);
pg_blocking_pids
------------------
{2380294}
(1 row)
kingbase=# select pg_blocking_pids(2423637);
pg_blocking_pids
------------------
{2381560}
(1 row)
SELECT blocked_locks.pid AS blocked_pid,
blocked_activity.usename AS blocked_user,
blocking_locks.pid AS blocking_pid,
blocking_activity.usename AS blocking_user,
blocked_activity.query AS blocked_statement,
blocking_activity.query AS current_statement_in_blocking_process
FROM pg_catalog.pg_locks blocked_locks
JOIN pg_catalog.pg_stat_activity blocked_activity ON blocked_activity.pid = blocked_locks.pid
JOIN pg_catalog.pg_locks blocking_locks
ON blocking_locks.locktype = blocked_locks.locktype
AND blocking_locks.database IS NOT DISTINCT FROM blocked_locks.database
AND blocking_locks.relation IS NOT DISTINCT FROM blocked_locks.relation
AND blocking_locks.page IS NOT DISTINCT FROM blocked_locks.page
AND blocking_locks.tuple IS NOT DISTINCT FROM blocked_locks.tuple
AND blocking_locks.virtualxid IS NOT DISTINCT FROM blocked_locks.virtualxid
AND blocking_locks.transactionid IS NOT DISTINCT FROM blocked_locks.transactionid
AND blocking_locks.classid IS NOT DISTINCT FROM blocked_locks.classid
AND blocking_locks.objid IS NOT DISTINCT FROM blocked_locks.objid
AND blocking_locks.objsubid IS NOT DISTINCT FROM blocked_locks.objsubid
AND blocking_locks.pid != blocked_locks.pid
JOIN pg_catalog.pg_stat_activity blocking_activity ON blocking_activity.pid = blocking_locks.pid
WHERE NOT blocked_locks.granted;
blocked_pid | blocked_user | blocking_pid | blocking_user | blocked_statement | current_statement_in_blocking_process
-------------+--------------+--------------+---------------+----------------------+---------------------------------------
2381560 | system | 2423637 | system | truncate table test; | select * from test;
2381560 | system | 2380294 | system | truncate table test; | select * from test;
2423637 | system | 2380294 | system | select * from test; | select * from test;
2423637 | system | 2381560 | system | select * from test; | truncate table test;
kingbase=# select * from v_locks_monitor; locktype | datname | relation | page | tuple | transactionid | classid | objid | objsubid | lock_conflict ----------+---------+----------+------+-------+---------------+---------+-------+----------+--------------------------------------------------------------------------------------------- ------------------------------------------------------- relation | test | 16799 | | | | | | | Pid: 2381560 + | | | | | | | | | Lock_Granted: false , Mode: AccessExclusiveLock , Username: system , Database: test , Client _Addr: NULL , Client_Port: -1 , Application_Name: ksql+ | | | | | | | | | , Xact_Start: 2025-04-27 17:06:13.855348+08 , Query_Start: 2025-04-27 17:06:15.955823+08 , Xact_Elapse: +000000000 00:17:23.483185000 + | | | | | | | | | -------- + | | | | | | | | | 0Pid: 2380294 + | | | | | | | | | Lock_Granted: true , Mode: AccessShareLock , Username: system , Database: test , Client_Addr : NULL , Client_Port: -1 , Application_Name: ksql + | | | | | | | | | , Xact_Start: 2025-04-27 17:06:00.351627+08 , Query_Start: 2025-04-27 17:06:07.249354+08 , Xact_Elapse: +000000000 00:17:36.986906000 + | | | | | | | | | -------- + | | | | | | | | | 0Pid: 2380294 + | | | | | | | | | Lock_Granted: true , Mode: AccessShareLock , Username: system , Database: test , Client_Addr : NULL , Client_Port: -1 , Application_Name: ksql + | | | | | | | | | , Xact_Start: 2025-04-27 17:06:00.351627+08 , Query_Start: 2025-04-27 17:06:07.249354+08 , Xact_Elapse: +000000000 00:17:36.986906000 + | | | | | | | | | -------- + | | | | | | | | | 1Pid: 2423637 + | | | | | | | | | Lock_Granted: false , Mode: AccessShareLock , Username: system , Database: test , Client_Add r: NULL , Client_Port: -1 , Application_Name: ksql + | | | | | | | | | , Xact_Start: 2025-04-27 17:06:26.795951+08 , Query_Start: 2025-04-27 17:06:26.795951+08 , Xact_Elapse: +000000000 00:17:10.542582000 + | | | | | | | | | -------- + | | | | | | | | | (1 row)
预防措施
在迁移前充分测试锁行为
评估所有TRUNCATE操作的使用场景
培训团队了解PostgreSQL与Oracle在并发控制方面的差异
解决方案
在低峰期执行TRUNCATE操作
使用短事务,避免长事务与TRUNCATE冲突
通过设置较低的 lock_timeout 来优雅地处理 TRUNCATE 操作可能导致的阻塞问题。
DO $$
BEGIN
FOR i IN 1..10 LOOP -- 最多尝试10次
BEGIN
SET LOCAL lock_timeout = '100ms'; -- 设置很短的超时
TRUNCATE TABLE target_table;
EXIT; -- 成功则退出循环
EXCEPTION WHEN OTHERS THEN
RAISE NOTICE 'Attempt % failed, retrying...', i;
PERFORM pg_sleep(0.1 * i); -- 指数退避
END;
END LOOP;
END $$;
-- 封装函数
CREATE OR REPLACE FUNCTION safe_truncate(table_name text, max_attempts int DEFAULT 5)
RETURNS void AS $$
DECLARE
attempt int := 1;
success boolean := false;
BEGIN
WHILE attempt <= max_attempts AND NOT success LOOP
BEGIN
EXECUTE format('SET LOCAL lock_timeout = %L',
CASE
WHEN attempt <= 2 THEN '50ms'
WHEN attempt <= 4 THEN '100ms'
ELSE '200ms'
END);
EXECUTE format('TRUNCATE TABLE %I', table_name);
success := true;
EXCEPTION WHEN OTHERS THEN
RAISE NOTICE 'Truncate attempt %/% failed on %: %',
attempt, max_attempts, table_name, SQLERRM;
PERFORM pg_sleep(0.2 * attempt); -- 指数退避
attempt := attempt + 1;
END;
END LOOP;
IF NOT success THEN
RAISE EXCEPTION 'Failed to truncate % after % attempts', table_name, max_attempts;
END IF;
END;
$$ LANGUAGE plpgsql;
-- 使用示例
SELECT safe_truncate('my_large_table');
Note:
也可以如果多次失败,回退到DELETE + VACUUM方案