都知道oracle的优化器要比其他数据库好,但有时需要一些证据,在从oracle向国产数据库迁移后,可能你才深有体会差距有多少。 前一篇记录了null in(…)子查询差异,在oracle到postgresql系国产改造项目上还有另一个问题是join view关连查询的差异。这篇我将简单演示。
先看Oracle
SQL> @cc pdb11
ALTER SESSION SET container = pdb11;
Session altered.
USERNAME INST_NAME HOST_NAME I# SID SERIAL# VERSION STARTED SPID OPID CPID SADDR PADDR
-------------------- -------------------- ------------------------- --- ----- -------- ---------- -------- ---------- ----- --------------- ---------------- ----------------
SYS PDB11-anbob db1 1 1227 3942 23.26.1.0. 20260304 2050169 90 2050168 00000000B8D5E420 00000000BA720388
SQL> create table tobj1 as select * from dba_objects;
Table created.
SQL> create table tobj2 as select * from dba_objects;
Table created.
SQL> create view vobj2 as select object_id,max(object_name) object_name,count(*) cnt from tobj2 group by object_id;
View created.
SQL> insert into tobj2 select * from tobj2;
74146 rows created.
SQL> r
1* insert into tobj2 select * from tobj2
148292 rows created.
SQL> commit;
Commit complete.
SQL> create index idx_tobj1_object_id on tobj1(object_id);
Index created.
SQL> create index idx_tobj2_object_id on tobj2(object_id);
Index created.
SQL> @gts tobj1
Gather Table Statistics for table tobj1...
PL/SQL procedure successfully completed.
SQL> @gts tobj2
Gather Table Statistics for table tobj2...
PL/SQL procedure successfully completed.
OK,构造了测试数据, 创建了对象 table tobj1, tobj2, view vobj2(基于tobj2)
测试场景1,条件为join 列
SQL> @ep
EXPLAIN PLAN FOR
2 select * from tobj1 left join vobj2 on tobj1.object_id=vobj2.object_id where tobj1.object_id=40;
Explained.
SQL> @x2
PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------
Plan hash value: 375014744
---------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 232 | 9 (0)| 00:00:01 |
| 1 | NESTED LOOPS OUTER | | 1 | 232 | 9 (0)| 00:00:01 |
| 2 | TABLE ACCESS BY INDEX ROWID BATCHED | TOBJ1 | 1 | 148 | 2 (0)| 00:00:01 |
|* 3 | INDEX RANGE SCAN | IDX_TOBJ1_OBJECT_ID | 1 | | 1 (0)| 00:00:01 |
| 4 | VIEW PUSHED PREDICATE | VOBJ2 | 1 | 84 | 7 (0)| 00:00:01 |
| 5 | SORT GROUP BY | | 1 | 44 | 7 (0)| 00:00:01 |
|* 6 | FILTER | | | | | |
| 7 | TABLE ACCESS BY INDEX ROWID BATCHED| TOBJ2 | 4 | 176 | 7 (0)| 00:00:01 |
|* 8 | INDEX RANGE SCAN | IDX_TOBJ2_OBJECT_ID | 4 | | 3 (0)| 00:00:01 |
---------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
3 - access("TOBJ1"."OBJECT_ID"=40)
6 - filter("TOBJ1"."OBJECT_ID"=40)
8 - access("OBJECT_ID"="TOBJ1"."OBJECT_ID")
22 rows selected.
Note: 外层的tobj1的条件传递到了view中。 可以在view 中先过滤,效率高。
测试场景2,条件为非join 列
SQL> @ep
EXPLAIN PLAN FOR
2 select * from tobj1 left join vobj2 on tobj1.object_id=vobj2.object_id where tobj1.object_name='tobj';
Explained.
SQL> @x2
PLAN_TABLE_OUTPUT
-------------------------------------------------------------------------------------------------------
Plan hash value: 2421081633
--------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 232 | 445 (1)| 00:00:01 |
| 1 | NESTED LOOPS OUTER | | 1 | 232 | 445 (1)| 00:00:01 |
|* 2 | TABLE ACCESS FULL | TOBJ1 | 1 | 148 | 438 (1)| 00:00:01 |
| 3 | VIEW PUSHED PREDICATE | VOBJ2 | 1 | 84 | 7 (0)| 00:00:01 |
| 4 | SORT GROUP BY | | 1 | 44 | 7 (0)| 00:00:01 |
| 5 | TABLE ACCESS BY INDEX ROWID BATCHED| TOBJ2 | 4 | 176 | 7 (0)| 00:00:01 |
|* 6 | INDEX RANGE SCAN | IDX_TOBJ2_OBJECT_ID | 4 | | 3 (0)| 00:00:01 |
--------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter("TOBJ1"."OBJECT_NAME"='tobj')
6 - access("OBJECT_ID"="TOBJ1"."OBJECT_ID")
19 rows selected.
Note: 条件换成了非 join列, 一样可以看到谓词推进 ,效率高, 这里主要是因为oracle的优化器特性Join Predicate PushDown,有如下参数控制:
SQL> @pd jppd
Show all parameters and session values from x$ksppi/x$ksppcv...
NUM N_HEX NAME VALUE DESCRIPTION
---------- ---------- -------------------------------------------------------- ------------------------------ ---------------------------------------------------------------------------------------------------
5264 1490 _optimizer_extend_jppd_view_types TRUE join pred pushdown on group-by, distinct, semi-/anti-joined view
5485 156D _optimizer_interleave_jppd TRUE interleave join predicate pushdown during CBQT
5551 15AF _optimizer_try_st_before_jppd TRUE try Star Transformation before Join Predicate Push Down
-- 关闭JPPD
SQL> alter session set "_optimizer_extend_jppd_view_types"=false;
Session altered.
SQL> @ep
EXPLAIN PLAN FOR
2 select * from tobj1 left join vobj2 on tobj1.object_id=vobj2.object_id where tobj1.object_name='tobj';
Explained.
SQL> @x2
PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------------
Plan hash value: 226040556
-----------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time |
-----------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 232 | | 3889 (1)| 00:00:01 |
|* 1 | HASH JOIN OUTER | | 1 | 232 | | 3889 (1)| 00:00:01 |
| 2 | JOIN FILTER CREATE | :BF0000 | 1 | 148 | | 438 (1)| 00:00:01 |
|* 3 | TABLE ACCESS FULL | TOBJ1 | 1 | 148 | | 438 (1)| 00:00:01 |
| 4 | VIEW | VOBJ2 | 75208 | 6169K| | 3451 (1)| 00:00:01 |
| 5 | HASH GROUP BY | | 75208 | 3231K| 14M| 3451 (1)| 00:00:01 |
| 6 | JOIN FILTER USE | :BF0000 | 296K| 12M| | 1738 (1)| 00:00:01 |
|* 7 | TABLE ACCESS FULL| TOBJ2 | 296K| 12M| | 1738 (1)| 00:00:01 |
-----------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("TOBJ1"."OBJECT_ID"="VOBJ2"."OBJECT_ID"(+))
3 - filter("TOBJ1"."OBJECT_NAME"='tobj')
7 - filter(SYS_OP_BLOOM_FILTER(:BF0000,"OBJECT_ID"))
21 rows selected.
Note: 禁用jppd特性后,使用了hash join , view里的查询要执行完, tobj2表使用了全表扫描,效率低。
再看PostgreSQL系(highgoDB)
create table tobj1 as select * from pg_class;
create table tobj2 as select * from pg_class;
insert into tobj2 select * from tobj2
-- 重复几次
create view vobj2 as SELECT tobj2.id,
max(tobj2.relname::text) AS name,
count(*) AS cnt
FROM tobj2
GROUP BY tobj2.id;
create index on tobj1(oid);
create index on tobj2(oid);
vacuum tobj1;
vacuum tobj2;
构建测试数据来源pg_class, 当然tobj2的数据越多,相差越明显。
测试场景1,条件为join 列
highgo=# explain analyze select * from tobj1 left join vobj2 on tobj1.oid=vobj2.id where tobj1.oid=1247;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------
Nested Loop Left Join (cost=34.20..9097.43 rows=663 width=309) (actual time=6.834..6.838 rows=1 loops=1)
Join Filter: (tobj1.oid = tobj2.id)
-> Seq Scan on tobj1 (cost=0.00..1.12 rows=1 width=265) (actual time=0.011..0.013 rows=1 loops=1)
Filter: (oid = '1247'::oid)
Rows Removed by Filter: 9
-> GroupAggregate (cost=34.20..9089.65 rows=663 width=44) (actual time=6.814..6.814 rows=1 loops=1)
Group Key: tobj2.id
-> Bitmap Heap Scan on tobj2 (cost=34.20..9054.92 rows=2810 width=68) (actual time=0.830..5.614 rows=2816 loops=1)
Recheck Cond: (id = '1247'::oid)
Heap Blocks: exact=2816
-> Bitmap Index Scan on tobj2_oid_idx (cost=0.00..33.50 rows=2810 width=0) (actual time=0.419..0.419 rows=2816 loops=1)
Index Cond: (id = '1247'::oid)
Planning Time: 0.449 ms
Execution Time: 6.904 ms
(14 rows)
Note: 条件为join 关连列时,谓词条件可以推到view里。效率高。
测试场景1,条件为非join 列
highgo=# set max_parallel_workers=1;
SET
highgo=# explain analyze select * from tobj1 left join vobj2 on tobj1.oid=vobj2.id where tobj1.relname='pg_type';
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------
Merge Left Join (cost=65475.08..65657.40 rows=3 width=309) (actual time=736.336..749.298 rows=1 loops=1)
Merge Cond: (tobj1.oid = tobj2.id)
-> Sort (cost=1.14..1.14 rows=1 width=265) (actual time=0.040..0.041 rows=1 loops=1)
Sort Key: tobj1.oid
Sort Method: quicksort Memory: 25kB
-> Seq Scan on tobj1 (cost=0.00..1.12 rows=1 width=265) (actual time=0.029..0.031 rows=1 loops=1)
Filter: (relname = 'pg_type'::name)
Rows Removed by Filter: 9
-> Finalize GroupAggregate (cost=65473.94..65647.81 rows=673 width=44) (actual time=736.260..749.246 rows=14 loops=1)
Group Key: tobj2.id
-> Gather Merge (cost=65473.94..65630.99 rows=1346 width=44) (actual time=736.246..749.220 rows=29 loops=1)
Workers Planned: 2
Workers Launched: 1
-> Sort (cost=64473.92..64475.60 rows=673 width=44) (actual time=732.606..732.614 rows=118 loops=2)
Sort Key: tobj2.id
Sort Method: quicksort Memory: 78kB
Worker 0: Sort Method: quicksort Memory: 78kB
-> Partial HashAggregate (cost=64435.58..64442.31 rows=673 width=44) (actual time=732.228..732.359 rows=673 loops=2)
Group Key: tobj2.id
Batches: 1 Memory Usage: 169kB
Worker 0: Batches: 1 Memory Usage: 169kB
-> Parallel Seq Scan on tobj2 (cost=0.00..56538.29 rows=789729 width=68) (actual time=0.011..130.570 rows=947584 loops=2)
Planning Time: 0.423 ms
Execution Time: 749.463 ms
(24 rows)
Note: 当条件谓词为非join 列时, 外层的条件无法传递给内部, Postgresql选择了view 全部执行完,全扫tobj2表。如同oracle关掉了jppd优化。
为什么Oracle比PostgreSQL快?
Oracle比PostgreSQL快的核心差异就是当谓词非Join列时无法谓词推进,这就是 Oracle 的:Join Predicate Pushdown(JPPD)
可以tobj1.name -> tobj1 –> tobj2.id值推导。
Postgresql不能下推,这是 PostgreSQL 当前优化器能力限制。pg只能传播“等值谓词”, 如tobj1.oid->tobj1->view.oid->tobj2.oid .
而view里如果有union all里,oracle可以推到view中每个union 分支都带上join的条件,而pg会UNION ALL 全分支全跑
因为 Oracle 优化器有:Query Transformation Engine
它会做:
- View Merge
把 view 拆开。 - Predicate Pushdown
把外层条件推进 view。 - Join Predicate Pushdown(关键)
即使:
WHERE 条件
不直接作用于 view。
而且oracle连left join都敢推进,它能证明NULL-preserving semantics不存在,转换后语义等价,而pg不敢。
| 能力 | Oracle | PostgreSQL |
| View Merge | 强 | 一般 |
| Predicate Pushdown | 强 | 一般 |
| Join Predicate Pushdown | 很强 | 很弱 |
| UNION ALL Predicate Pruning | 强 | 有限 |
| LEFT JOIN Pushdown | 可以 | 保守 |
小结
当有使用join view的查询时,在oracle可以使用JPPD特性做谓词推进,但是在oracle中谓词推进的条件有限,仅能是join 关连列的等值传到,当使用非关连列时,在postgresql中无法推进到view中,导致view的整个查询都要执行完,影响查询性能。如果使用基于postgresql的国产库时,如kingbase,highgo, 大云海山等时,注意性能差异。