前面我测试了oracle迁移到postgresql的两个性能问题,not in时不能转换为Antijoin, 和join view的谓词不是关连列时存在无法谓词推进,下面测试一下在另外2款集中式数据库达梦和崖山的表现。
达梦
构建测试数据
Server[LOCALHOST:5236]:mode is normal, state is open
login used time : 7.366(ms)
disql V8
SQL>
SQL> create table tobj1 as select * from dba_objects;
executed successfully
used time: 66.824(ms). Execute id is 60105.
SQL> create table tobj2 as select * from dba_objects;
executed successfully
used time: 63.160(ms). Execute id is 60106.
SQL> insert into tobj2 select * from tobj2;
affect rows 1015
used time: 6.149(ms). Execute id is 60107.
SQL> /
affect rows 2030
used time: 9.378(ms). Execute id is 60108.
SQL> /
affect rows 4060
...
SQL> select count(*) from tobj2;
LINEID COUNT(*)
---------- --------------------
1 2078720
Not in场景
SQL> select count(*) from tobj1 where object_id not in(select object_id from tobj2 where object_id is not null) ;
LINEID COUNT(*)
---------- --------------------
1 0
used time: 439.006(ms). Execute id is 60120.
SQL> explain select count(*) from tobj1 where object_id not in(select object_id from tobj2 where object_id is not null) ;
1 #NSET2: [384, 1, 30]
2 #PRJT2: [384, 1, 30]; exp_num(1), is_atom(FALSE)
3 #AAGR2: [384, 1, 30]; grp_num(0), sfun_num(1), distinct_flag[0]; slave_empty(0)
4 #HASH LEFT SEMI JOIN2: [384, 1, 30]; (ANTI),KEY_NUM(1); KEY(TOBJ1.OBJECT_ID=DMTEMPVIEW_889214295.colname) KEY_NULL_EQU(0)
5 #CSCN2: [1, 1013, 30]; INDEX33555549(TOBJ1)
6 #PRJT2: [250, 1974784, 30]; exp_num(1), is_atom(FALSE)
7 #SLCT2: [250, 1974784, 30]; NOT(TOBJ2.OBJECT_ID IS NULL)
8 #CSCN2: [250, 2078720, 30]; INDEX33555550(TOBJ2)
used time: 1.432(ms). Execute id is 0.
SQL> select count(*) from tobj1 where not exists(select 1 from tobj2 where tobj1.object_id=tobj2.object_id);
LINEID COUNT(*)
---------- --------------------
1 0
used time: 382.377(ms). Execute id is 60121.
SQL> explain select count(*) from tobj1 where not exists(select 1 from tobj2 where tobj1.object_id=tobj2.object_id);
1 #NSET2: [371, 1, 42]
2 #PRJT2: [371, 1, 42]; exp_num(1), is_atom(FALSE)
3 #AAGR2: [371, 1, 42]; grp_num(0), sfun_num(1), distinct_flag[0]; slave_empty(0)
4 #HASH LEFT SEMI JOIN2: [371, 1, 42]; (ANTI),KEY_NUM(1); KEY(TOBJ1.OBJECT_ID=TOBJ2.OBJECT_ID) KEY_NULL_EQU(0)
5 #CSCN2: [1, 1013, 42]; INDEX33555549(TOBJ1)
6 #CSCN2: [230, 2078720, 30]; INDEX33555550(TOBJ2)
Note: 在达梦中not in 和not exists一样都使用了 ANTI JOIN,而非逐行执行子查询. 便是注意使用not in时子查询是先被完整的物化再与外层匹配。 而not exists使用了更优的反连接(ANTI JOIN)优化,并且避免了子查询全物化。虽然两个计划最终都是 HASH ANTI JOIN,但 NOT EXISTS 版本省去了 PRJT2 和 SLCT2 两层操作,减少了数据复制和内存开销。
| 对比项 | NOT IN 版本 | NOT EXISTS 版本 |
|---|---|---|
| 子查询处理 | 先全扫描 + 过滤 NULL,生成中间结果(第6-8行有 PRJT2 + SLCT2) | 直接扫描,无额外投影层 |
| 中间结果集 | 需要物化投影后的 197 万行 | 无需物化,直接参与 HASH 连接 |
| 执行计划层级 | 3 层(子查询投影 → 过滤 → 反连接) | 2 层(扫描 → 反连接) |
| 性能 | 差(大量内存/临时表消耗) | 好(流式处理) |
NOT EXISTS 而非 NOT IN,即使 NOT IN 写了 IS NOT NULLJOIN VIEW 场景
SQL> create view vobj2 as select object_id,max(object_name) object_name,count(*) cnt from tobj2 group by object_id;
executed successfully
used time: 17.226(ms). Execute id is 60122.
SQL> EXPLAIN select * from tobj1 left join vobj2 on tobj1.object_id=vobj2.object_id where tobj1.object_id=40;
1 #NSET2: [128510, 13160, 675]
2 #PRJT2: [128510, 13160, 675]; exp_num(18), is_atom(FALSE)
3 #NEST LOOP LEFT JOIN2: [128510, 13160, 675]; partition_keys_num(0) ret_null(0)
4 #SLCT2: [1, 25, 597]; TOBJ1.OBJECT_ID = var1
5 #CSCN2: [1, 1013, 597]; INDEX33555549(TOBJ1)
6 #SLCT2: [404, 519, 78]; VOBJ2.OBJECT_ID = var2
7 #PRJT2: [404, 20787, 78]; exp_num(3), is_atom(FALSE)
8 #HAGR2: [404, 20787, 78]; grp_num(1), sfun_num(2), distinct_flag[0,0]; slave_empty(0) keys(TOBJ2.OBJECT_ID)
9 #CSCN2: [251, 2078720, 78]; INDEX33555550(TOBJ2)
used time: 1.844(ms). Execute id is 0.
SQL> create index idx_tobj1_oid on tobj1(object_id);
executed successfully
used time: 449.781(ms). Execute id is 60123.
SQL> create index idx_tobj2_oid on tobj2(object_id);
executed successfully
used time: 00:00:06.532. Execute id is 60124.
SQL> EXPLAIN select * from tobj1 left join vobj2 on tobj1.object_id=vobj2.object_id where tobj1.object_id=40;
1 #NSET2: [128507, 13160, 675]
2 #PRJT2: [128507, 13160, 675]; exp_num(18), is_atom(FALSE)
3 #NEST LOOP LEFT JOIN2: [128507, 13160, 675]; partition_keys_num(0) ret_null(0)
4 #BLKUP2: [1, 25, 597]; IDX_TOBJ1_OID(TOBJ1)
5 #SSEK2: [1, 25, 597]; scan_type(ASC), IDX_TOBJ1_OID(TOBJ1), scan_range[exp_cast(40),exp_cast(40)]
6 #SLCT2: [404, 519, 78]; VOBJ2.OBJECT_ID = var1
7 #PRJT2: [404, 20787, 78]; exp_num(3), is_atom(FALSE)
8 #HAGR2: [404, 20787, 78]; grp_num(1), sfun_num(2), distinct_flag[0,0]; slave_empty(0) keys(TOBJ2.OBJECT_ID)
9 #CSCN2: [251, 2078720, 78]; INDEX33555550(TOBJ2)
used time: 2.541(ms). Execute id is 0.
使用Join 条件谓词时,在达梦8时发生了视图谓词推进(View Predicate Pushdown)。
- 第 6 行:
#SLCT2: [404, 519, 78]; VOBJ2.OBJECT_ID = var1- 这说明在扫描视图
VOBJ2时,已经应用了一个过滤条件VOBJ2.OBJECT_ID = var1。 - 这里的
var1是从左表TOBJ1传来的绑定变量值(具体为40)。
- 这说明在扫描视图
吐槽,执行计划关键字真难看。
SQL> EXPLAIN select * from tobj1 left join vobj2 on tobj1.object_id=vobj2.object_id where tobj1.object_name='tobj';
1 #NSET2: [406, 75, 675]
2 #PRJT2: [406, 75, 675]; exp_num(18), is_atom(FALSE)
3 #HASH LEFT JOIN2: [406, 75, 675]; key_num(1), partition_keys_num(0), ret_null(0), mix(0) KEY(TOBJ1.OBJECT_ID=VOBJ2.OBJECT_ID)
4 #SLCT2: [1, 25, 597]; TOBJ1.OBJECT_NAME = 'tobj'
5 #CSCN2: [1, 1013, 597]; INDEX33555549(TOBJ1)
6 #PRJT2: [404, 20787, 78]; exp_num(3), is_atom(FALSE)
7 #HAGR2: [404, 20787, 78]; grp_num(1), sfun_num(2), distinct_flag[0,0]; slave_empty(0) keys(TOBJ2.OBJECT_ID)
8 #CSCN2: [251, 2078720, 78]; INDEX33555550(TOBJ2)
used time: 2.093(ms). Execute id is 0.
NOte: 在使用 非join 列谓词时,并没有发生视图谓词推进
- 第4-5行: 从
TOBJ1中筛选object_name='tobj'的记录(扫描索引INDEX33555549)。索引组织表机构,使用了内部的索引。 - 第6-8行:独立、完整地执行视图
vobj2:- 全表扫描
TOBJ2(预估 2,078,720 行,见第8行) - 进行哈希聚合
HAGR2(按OBJECT_ID分组) - 生成完整的视图结果集
- 全表扫描
| 特征 | 使用join列谓词 | 使用 非join列谓词 |
|---|---|---|
| 连接方式 | NEST LOOP LEFT JOIN | HASH LEFT JOIN |
| 视图侧过滤 | #SLCT2: VOBJ2.OBJECT_ID = var1 (有参数传递) | 无此条件 |
| 视图扫描 | 虽然有CSCN2,但外层传入了绑定变量 | 视图独立全扫,无外部条件 |
崖山数据库
构建测试数据
Connected to:
YashanDB Server Enterprise Edition Release 23.5.1.100 x86_64 - Linux
SQL>
SQL> create table tobj1 as select * from dba_objects;
Succeed.
SQL> select count(*) from tobj1;
COUNT(*)
---------------------
2333
1 row fetched.
SQL> create table tobj2 as select * from dba_objects;
Succeed.
SQL> insert into tobj2 select * from tobj2;
2334 rows affected.
SQL> /
4668 rows affected.
...
SQL> select count(*) from tobj2;
COUNT(*)
---------------------
1195008
not in 场景
SQL> explain select count(*) from tobj1 where object_id not in(select object_id from tobj2 where object_id is not null) ;
PLAN_DESCRIPTION
----------------------------------------------------------------
SQL hash value: 72379594
Optimizer: ADOPT_C
+----+--------------------------------+----------------------+------------+----------+-------------+--------------------------------+
| Id | Operation type | Name | Owner | Rows | Cost(%CPU) | Partition info |
+----+--------------------------------+----------------------+------------+----------+-------------+--------------------------------+
| 0 | SELECT STATEMENT | | | | | |
| 1 | AGGREGATE | | | 1| 280( 0)| |
|* 2 | HASH JOIN ANTI NA | | | 99900| 279( 0)| |
| 3 | TABLE ACCESS FULL | TOBJ1 | SYS | 100000| 136( 0)| |
| 4 | HASH GROUP | | | 1000| 141( 0)| |
|* 5 | TABLE ACCESS FULL | TOBJ2 | SYS | 96000| 136( 0)| |
+----+--------------------------------+----------------------+------------+----------+-------------+--------------------------------+
Operation Information (identified by operation id):
---------------------------------------------------
2 - Predicate : access("TOBJ1"."OBJECT_ID" = "TOBJ2"."OBJECT_ID")
4 - Group Expression: ("TOBJ2"."OBJECT_ID")
5 - Predicate : filter("TOBJ2"."OBJECT_ID" IS NOT NULL)
SQL> explain select count(*) from tobj1 where not exists(select 1 from tobj2 where tobj1.object_id=tobj2.object_id);
PLAN_DESCRIPTION
----------------------------------------------------------------
SQL hash value: 1231899452
Optimizer: ADOPT_C
+----+--------------------------------+----------------------+------------+----------+-------------+--------------------------------+
| Id | Operation type | Name | Owner | Rows | Cost(%CPU) | Partition info |
+----+--------------------------------+----------------------+------------+----------+-------------+--------------------------------+
| 0 | SELECT STATEMENT | | | | | |
| 1 | AGGREGATE | | | 1| 280( 0)| |
|* 2 | HASH JOIN ANTI | | | 99900| 279( 0)| |
| 3 | TABLE ACCESS FULL | TOBJ1 | SYS | 100000| 136( 0)| |
| 4 | HASH GROUP | | | 1000| 141( 0)| |
| 5 | TABLE ACCESS FULL | TOBJ2 | SYS | 100000| 136( 0)| |
+----+--------------------------------+----------------------+------------+----------+-------------+--------------------------------+
Operation Information (identified by operation id):
---------------------------------------------------
2 - Predicate : access("TOBJ1"."OBJECT_ID" = "TOBJ2"."OBJECT_ID")
4 - Group Expression: ("TOBJ2"."OBJECT_ID")
19 rows fetched.
Note: 这执行计划看的很熟悉,和oracle一样使用了hash join anti join。
JOIN VIEW 场景
SQL> create view vobj2 as select object_id,max(object_name) object_name,count(*) cnt from tobj2 group by object_id;
SQL> create index idx_tobj1_oid on tobj1(object_id);
Succeed.
SQL> create index idx_tobj2_oid on tobj2(object_id);
Succeed.
SQL> explain select * from tobj1 left join vobj2 on tobj1.object_id=vobj2.object_id where tobj1.object_id=40;
PLAN_DESCRIPTION
----------------------------------------------------------------
SQL hash value: 3480004066
Optimizer: ADOPT_C
+----+--------------------------------+----------------------+------------+----------+-------------+--------------------------------+
| Id | Operation type | Name | Owner | Rows | Cost(%CPU) | Partition info |
+----+--------------------------------+----------------------+------------+----------+-------------+--------------------------------+
| 0 | SELECT STATEMENT | | | | | |
|* 1 | HASH JOIN LEFT OUTER | | | 101| 1( 0)| |
| 2 | TABLE ACCESS BY INDEX ROWID | TOBJ1 | SYS | 101| 1( 0)| |
|* 3 | INDEX RANGE SCAN | IDX_TOBJ1_OID | SYS | 101| 1( 0)| |
| 4 | VIEW | | | 101| 1( 0)| |
| 5 | GROUP | | | 101| 1( 0)| |
| 6 | TABLE ACCESS BY INDEX ROWID| TOBJ2 | SYS | 101| 1( 0)| |
|* 7 | INDEX RANGE SCAN | IDX_TOBJ2_OID | SYS | 101| 1( 0)| |
+----+--------------------------------+----------------------+------------+----------+-------------+--------------------------------+
Operation Information (identified by operation id):
---------------------------------------------------
1 - Predicate : access("TOBJ1"."OBJECT_ID" = "TOBJ2"."OBJECT_ID")
3 - Predicate : access("TOBJ1"."OBJECT_ID" = 40)
5 - Group Expression: (40)
7 - Predicate : access("TOBJ2"."OBJECT_ID" = 40)
23 rows fetched.
Note: 在使用join 列谓词时,可以做到view的谓词推进,#7能看到。
SQL> explain select * from tobj1 left join vobj2 on tobj1.object_id=vobj2.object_id where tobj1.object_name='tobj';
PLAN_DESCRIPTION
----------------------------------------------------------------
SQL hash value: 1184278721
Optimizer: ADOPT_C
+----+--------------------------------+----------------------+------------+----------+-------------+--------------------------------+
| Id | Operation type | Name | Owner | Rows | Cost(%CPU) | Partition info |
+----+--------------------------------+----------------------+------------+----------+-------------+--------------------------------+
| 0 | SELECT STATEMENT | | | | | |
|* 1 | HASH JOIN RIGHT OUTER | | | 101| 293( 0)| |
| 2 | JOIN FILTER USE | | | 1000| 150( 0)| |
| 3 | VIEW | | | 1000| 150( 0)| |
| 4 | HASH GROUP | | | 1000| 150( 0)| |
|* 5 | TABLE ACCESS FULL | TOBJ2 | SYS | 100000| 136( 0)| |
|* 6 | JOIN FILTER CREATE | | | 101| 141( 0)| |
|* 7 | TABLE ACCESS FULL | TOBJ1 | SYS | 101| 141( 0)| |
+----+--------------------------------+----------------------+------------+----------+-------------+--------------------------------+
Operation Information (identified by operation id):
---------------------------------------------------
1 - Predicate : access("TOBJ2"."OBJECT_ID" = "TOBJ1"."OBJECT_ID")
4 - Group Expression: ("TOBJ2"."OBJECT_ID")
5 - Predicate : RUNTIME FILTER(RUNTIME USE(0): "TOBJ2"."OBJECT_ID")
6 - Predicate : RUNTIME FILTER(RUNTIME CREATE(0): "TOBJ1"."OBJECT_ID")
7 - Predicate : filter("TOBJ1"."OBJECT_NAME" = 'tobj')
Note: 崖山的效果更好,对于非Join列谓词时,崖山数据库在这里使用了运行时过滤(Runtime Filter)优化,本质上实现了动态谓词推进,避免了视图的完全物化。
执行顺序(从下往上):
1. 第7行:扫描 TOBJ1,过滤 object_name='tobj' → 预估 101 行
2. 第6行:JOIN FILTER CREATE - 用这 101 行的 OBJECT_ID 动态创建一个过滤布隆过滤器
3. 第2行:JOIN FILTER USE - 将过滤器下推到 TOBJ2 的扫描中
4. 第5行:扫描 TOBJ2 时,用过滤器提前过滤掉不符合条件的行
5. 第4行:对过滤后的结果做 GROUP BY
6. 第1行:HASH JOIN 完成最终连接
小结
对于在postgresql中的这2个问题,首先对于not in 这2款数据库都可以转换为anti join, 只是达梦not in时有一层全雾化的过程,还是建议用not exists代替not in. 其次对于join view 在使用join列谓词时都可以推进。而使用非join列谓词时,达梦未能推进,而yashan使用了一种运行时过滤技术,动态推进。