之前写过一篇《Oracle数据库中 Scalar-subquery 缓存和 DETERMINISTIC Function》记录了在oracle中标量子查询时,如果子表的相同的关连值时,可以利用cache,减少运行时的执行调用,PostgreSQL 在V14(14.4) 引入了Memoize 新特性也是类似的优化,在Opengauss系目前还不支持,这种测试在Gauss会查询转换为left out join,最近测试了两天的YashanDB是否支持SS cache,因为没有monitor hint和starts列,比较费时。简单记录。
创建测试数据表
drop table test_ec1;
drop table test_ec2;
create table test_ec1 as select * from dba_objects;
create table test_ec2 as select * from test_ec1;
BEGIN
for i in 1..10000 LOOP
BEGIN
insert into test_ec2 select * from test_ec1;
end;
end loop;
end;
/
create index idx_ec1_id on test_ec1(object_id);
SQL> select count(*) from test_ec1;
COUNT(*)
---------------------
2349
SQL> select count(*) from test_ec2;
COUNT(*)
---------------------
25864839
SQL> select owner,count(*) from test_ec2 group by owner;
OWNER COUNT(*)
---------------------------------------------------------------- ---------------------
SYS 16703687
MDSYS 88088
XA_SYS 33033
ANBOB 11011
PUBLIC 9029020
测试标量子查询
SQL> select * from v$version;
BANNER VERSION_NUMBER
---------------------------------------------------------------- -----------------
Enterprise Edition Release 23.5.1.100 x86_64 23.5.1.100
SQL> show parameter cache
NAME VALUE
---------------------------------------------------------------- ----------------------------------------------------------------
SCOL_CACHEABLE_SCAN_ROWS 18446744073709551615
SCOL_DISK_CACHEABLE_SCAN_ROWS 18446744073709551615
LSC_DICTIONARY_CACHE_AUTOEXTEND FALSE
SCOL_WRITE_CACHE_POLICY WRITE_BACK
ENABLE_DISKCACHE FALSE
DISKCACHE_ROOT ?/cache/diskcache
DISKCACHE_CAPACITY 100M
SQL> select object_id,owner from test_ec1 where rownum<=10;
OBJECT_ID OWNER
--------------------- ----------------------------------------------------------------
0 SYS
1 SYS
2 SYS
3 SYS
4 SYS
5 SYS
6 SYS
7 SYS
8 SYS
9 SYS
SQL> set autotrace trace exp
SQL> alter session set STATISTICS_LEVEL=all;
SQL> select object_id,(select count(*) from test_ec2 e2 where e1.owner=e2.owner ) cnt from test_ec1 e1 where object_id in(0,1);
Execution Plan
----------------------------------------------------------------
SQL hash value: 1123266897
Optimizer: ADOPT_C
+----+--------------------------------+----------------------+------------+----------+----------+-------------+----------+----------+----------+----------+--------------------------------+
| Id | Operation type | Name | Owner | E - Rows | A - Rows | Cost(%CPU) | A - Time | Loops | Memory | Disk | Partition info |
+----+--------------------------------+----------------------+------------+----------+----------+-------------+----------+----------+----------+----------+--------------------------------+
| 0 | SELECT STATEMENT | | | | 2| | 43| 3| 0| 0| |
| 1 | SUBQUERY | QUERY[1] | | | 2| | 67198562| 4| 0| 0| |
|* 2 | TABLE ACCESS FULL (AGGR PUSHED)| TEST_EC2 | SYS | 1| 2| 147( 0)| 67198551| 4| 0| 0| |
| 3 | TABLE ACCESS BY INDEX ROWID | TEST_EC1 | SYS | 201| | 1( 0)| | | | | |
|* 4 | INDEX RANGE SCAN | IDX_EC1_ID | SYS | 201| 2| 1( 0)| 41| 3| 0| 0| |
+----+--------------------------------+----------------------+------------+----------+----------+-------------+----------+----------+----------+----------+--------------------------------+
Operation Information (identified by operation id):
---------------------------------------------------
2 - Predicate : filter("E1"."OWNER"[OPTMZ-3][OPTMZ-2] = "E2"."OWNER"[OPTMZ-1])
4 - Predicate : access("E1"."OBJECT_ID" IN (0[OPTMZ-0], 1[OPTMZ-0]))
Note: yashandb 没有starts ,loop 列的值不明白是什么意思,下面测试loops 区别
SQL> select object_id,owner from test_ec1 e1 where object_id in(0);
Execution Plan
----------------------------------------------------------------
SQL hash value: 592253244
Optimizer: ADOPT_C
+----+--------------------------------+----------------------+------------+----------+----------+-------------+----------+----------+----------+----------+--------------------------------+
| Id | Operation type | Name | Owner | E - Rows | A - Rows | Cost(%CPU) | A - Time | Loops | Memory | Disk | Partition info |
+----+--------------------------------+----------------------+------------+----------+----------+-------------+----------+----------+----------+----------+--------------------------------+
| 0 | SELECT STATEMENT | | | | 1| | 27| 2| 0| 0| |
| 1 | TABLE ACCESS BY INDEX ROWID | TEST_EC1 | SYS | 101| | 1( 0)| | | | | |
|* 2 | INDEX RANGE SCAN | IDX_EC1_ID | SYS | 101| 1| 1( 0)| 26| 2| 0| 0| |
+----+--------------------------------+----------------------+------------+----------+----------+-------------+----------+----------+----------+----------+--------------------------------+
SQL>
select object_id,owner from test_ec1 e1 where object_id in(0,1);
Execution Plan
----------------------------------------------------------------
SQL hash value: 175903804
Optimizer: ADOPT_C
+----+--------------------------------+----------------------+------------+----------+----------+-------------+----------+----------+----------+----------+--------------------------------+
| Id | Operation type | Name | Owner | E - Rows | A - Rows | Cost(%CPU) | A - Time | Loops | Memory | Disk | Partition info |
+----+--------------------------------+----------------------+------------+----------+----------+-------------+----------+----------+----------+----------+--------------------------------+
| 0 | SELECT STATEMENT | | | | 2| | 25| 3| 0| 0| |
| 1 | TABLE ACCESS BY INDEX ROWID | TEST_EC1 | SYS | 201| | 1( 0)| | | | | |
|* 2 | INDEX RANGE SCAN | IDX_EC1_ID | SYS | 201| 2| 1( 0)| 24| 3| 0| 0| |
+----+--------------------------------+----------------------+------------+----------+----------+-------------+----------+----------+----------+----------+--------------------------------+
Note: 似乎增加一个值loops +1,既然没有执行次数的值,下面我们使用执行时间判断是否有使用cache。
SQL> set timing on
SQL> select object_id,(select count(*) from test_ec2 e2 where e1.owner=e2.owner ) cnt from test_ec1 e1
where object_id in(0);
Execution Plan
----------------------------------------------------------------
SQL hash value: 4252418592
Optimizer: ADOPT_C
+----+--------------------------------+----------------------+------------+----------+----------+-------------+----------+----------+----------+----------+--------------------------------+
| Id | Operation type | Name | Owner | E - Rows | A - Rows | Cost(%CPU) | A - Time | Loops | Memory | Disk | Partition info |
+----+--------------------------------+----------------------+------------+----------+----------+-------------+----------+----------+----------+----------+--------------------------------+
| 0 | SELECT STATEMENT | | | | 1| | 30| 2| 0| 0| |
| 1 | SUBQUERY | QUERY[1] | | | 1| | 33863621| 2| 0| 0| |
|* 2 | TABLE ACCESS FULL (AGGR PUSHED)| TEST_EC2 | SYS | 1| 1| 147( 0)| 33863616| 2| 0| 0| |
| 3 | TABLE ACCESS BY INDEX ROWID | TEST_EC1 | SYS | 101| | 1( 0)| | | | | |
|* 4 | INDEX RANGE SCAN | IDX_EC1_ID | SYS | 101| 1| 1( 0)| 28| 2| 0| 0| |
+----+--------------------------------+----------------------+------------+----------+----------+-------------+----------+----------+----------+----------+--------------------------------+
Operation Information (identified by operation id):
---------------------------------------------------
2 - Predicate : filter("E1"."OWNER"[OPTMZ-3][OPTMZ-2] = "E2"."OWNER"[OPTMZ-1])
4 - Predicate : access("E1"."OBJECT_ID" = 0[OPTMZ-0])
Elapsed: 00:00:33.873
SQL> select object_id,(select count(*) from test_ec2 e2 where e1.owner=e2.owner ) cnt from test_ec1 e1
where object_id in(0,1);
Execution Plan
----------------------------------------------------------------
SQL hash value: 1123266897
Optimizer: ADOPT_C
+----+--------------------------------+----------------------+------------+----------+----------+-------------+----------+----------+----------+----------+--------------------------------+
| Id | Operation type | Name | Owner | E - Rows | A - Rows | Cost(%CPU) | A - Time | Loops | Memory | Disk | Partition info |
+----+--------------------------------+----------------------+------------+----------+----------+-------------+----------+----------+----------+----------+--------------------------------+
| 0 | SELECT STATEMENT | | | | 2| | 43| 3| 0| 0| |
| 1 | SUBQUERY | QUERY[1] | | | 2| | 67198562| 4| 0| 0| |
|* 2 | TABLE ACCESS FULL (AGGR PUSHED)| TEST_EC2 | SYS | 1| 2| 147( 0)| 67198551| 4| 0| 0| |
| 3 | TABLE ACCESS BY INDEX ROWID | TEST_EC1 | SYS | 201| | 1( 0)| | | | | |
|* 4 | INDEX RANGE SCAN | IDX_EC1_ID | SYS | 201| 2| 1( 0)| 41| 3| 0| 0| |
+----+--------------------------------+----------------------+------------+----------+----------+-------------+----------+----------+----------+----------+--------------------------------+
Operation Information (identified by operation id):
---------------------------------------------------
2 - Predicate : filter("E1"."OWNER"[OPTMZ-3][OPTMZ-2] = "E2"."OWNER"[OPTMZ-1])
4 - Predicate : access("E1"."OBJECT_ID" IN (0[OPTMZ-0], 1[OPTMZ-0]))
Elapsed: 00:01:07.208
SQL> select object_id,(select count(*) from test_ec2 e2 where e1.owner=e2.owner ) cnt from test_ec1 e1
where object_id in(0,1,2);
Execution Plan
----------------------------------------------------------------
SQL hash value: 1807856721
Optimizer: ADOPT_C
+----+--------------------------------+----------------------+------------+----------+----------+-------------+----------+----------+----------+----------+--------------------------------+
| Id | Operation type | Name | Owner | E - Rows | A - Rows | Cost(%CPU) | A - Time | Loops | Memory | Disk | Partition info |
+----+--------------------------------+----------------------+------------+----------+----------+-------------+----------+----------+----------+----------+--------------------------------+
| 0 | SELECT STATEMENT | | | | 3| | 62| 4| 0| 0| |
| 1 | SUBQUERY | QUERY[1] | | | 3| | 100824329| 6| 0| 0| |
|* 2 | TABLE ACCESS FULL (AGGR PUSHED)| TEST_EC2 | SYS | 1| 3| 147( 0)| 100824313| 6| 0| 0| |
| 3 | TABLE ACCESS BY INDEX ROWID | TEST_EC1 | SYS | 300| | 1( 0)| | | | | |
|* 4 | INDEX RANGE SCAN | IDX_EC1_ID | SYS | 300| 3| 1( 0)| 61| 4| 0| 0| |
+----+--------------------------------+----------------------+------------+----------+----------+-------------+----------+----------+----------+----------+--------------------------------+
Operation Information (identified by operation id):
---------------------------------------------------
2 - Predicate : filter("E1"."OWNER"[OPTMZ-3][OPTMZ-2] = "E2"."OWNER"[OPTMZ-1])
4 - Predicate : access("E1"."OBJECT_ID" IN (0[OPTMZ-0], 1[OPTMZ-0], 2[OPTMZ-0]))
Elapsed: 00:01:40.835
Note: 执行计划是使用了标量子查询,和带了”AGGR PUSHED”, 这是YASHANDB 对于子查询有聚合参数的一种优化方式。
但是从传入的值 0 1 2对应的都是SYS, 如果有cache 那三次执行应该接近,但是目前看恰恰成倍的增长,一值33.8s 二值67.2s 三值100.8s。 这难道是崖山不支持 标量子查询cache? 不是的,原因是”AGGR PUSHED”和cache 优化 不兼容。
目前没有参数和直接hint 可以禁用这个执行计划优化,这是基于cost代码模型评估,如果想干预的话可以使用selectivity hint. 下面继续。
SQL> select object_id,(select count(*) from test_ec2 e2 where e1.owner=e2.owner SELECTIVITY 0.0001 ) cnt from test_ec1 e1 where object_id in(0);
Execution Plan
----------------------------------------------------------------
SQL hash value: 2611185001
Optimizer: ADOPT_C
+----+--------------------------------+----------------------+------------+----------+----------+-------------+----------+----------+----------+----------+--------------------------------+
| Id | Operation type | Name | Owner | E - Rows | A - Rows | Cost(%CPU) | A - Time | Loops | Memory | Disk | Partition info |
+----+--------------------------------+----------------------+------------+----------+----------+-------------+----------+----------+----------+----------+--------------------------------+
| 0 | SELECT STATEMENT | | | | 1| | 34| 2| 0| 0| |
| 1 | SUBQUERY | QUERY[1] | | | 1| | 35837090| 2| 0| 0| |
| 2 | AGGREGATE | | | 1| 1| 1076561( 0)| 35837087| 2| 0| 0| |
|* 3 | TABLE ACCESS FULL | TEST_EC2 | SYS | 2587| 16703687| 1076561( 0)| 34854974| 16703688| 0| 0| |
| 4 | TABLE ACCESS BY INDEX ROWID | TEST_EC1 | SYS | 1| | 1( 0)| | | | | |
|* 5 | INDEX RANGE SCAN | IDX_EC1_ID | SYS | 1| 1| 1( 0)| 33| 2| 0| 0| |
+----+--------------------------------+----------------------+------------+----------+----------+-------------+----------+----------+----------+----------+--------------------------------+
Operation Information (identified by operation id):
---------------------------------------------------
3 - Predicate : filter("E1"."OWNER"[OPTMZ-3][OPTMZ-2] = "E2"."OWNER"[OPTMZ-1])
5 - Predicate : access("E1"."OBJECT_ID" = 0[OPTMZ-0])
Elapsed: 00:00:35.849
SQL> select object_id,(select count(*) from test_ec2 e2 where e1.owner=e2.owner SELECTIVITY 0.0001 ) cnt from test_ec1 e1
where object_id in(0,1);
Execution Plan
----------------------------------------------------------------
SQL hash value: 2609558636
Optimizer: ADOPT_C
+----+--------------------------------+----------------------+------------+----------+----------+-------------+----------+----------+----------+----------+--------------------------------+
| Id | Operation type | Name | Owner | E - Rows | A - Rows | Cost(%CPU) | A - Time | Loops | Memory | Disk | Partition info |
+----+--------------------------------+----------------------+------------+----------+----------+-------------+----------+----------+----------+----------+--------------------------------+
| 0 | SELECT STATEMENT | | | | 2| | 41| 3| 0| 0| |
| 1 | SUBQUERY | QUERY[1] | | | 1| | 40668578| 2| 0| 0| |
| 2 | AGGREGATE | | | 1| 1| 1076561( 0)| 40668572| 2| 0| 0| |
|* 3 | TABLE ACCESS FULL | TEST_EC2 | SYS | 2587| 16703687| 1076561( 0)| 39687209| 16703688| 0| 0| |
| 4 | TABLE ACCESS BY INDEX ROWID | TEST_EC1 | SYS | 2| | 1( 0)| | | | | |
|* 5 | INDEX RANGE SCAN | IDX_EC1_ID | SYS | 2| 2| 1( 0)| 41| 3| 0| 0| |
+----+--------------------------------+----------------------+------------+----------+----------+-------------+----------+----------+----------+----------+--------------------------------+
Operation Information (identified by operation id):
---------------------------------------------------
1 - Subquery NDV info - NDV percentage: 0.500000, NDV Expression: ("E1"."OWNER"[OPTMZ-3])
3 - Predicate : filter("E1"."OWNER"[OPTMZ-3][OPTMZ-2] = "E2"."OWNER"[OPTMZ-1])
5 - Predicate : access("E1"."OBJECT_ID" IN (0[OPTMZ-0], 1[OPTMZ-0]))
Elapsed: 00:00:40.679
SQL> select object_id,(select count(*) from test_ec2 e2 where e1.owner=e2.owner SELECTIVITY 0.0001 ) cnt from test_ec1 e1
where object_id in(0,1,2);
Execution Plan
----------------------------------------------------------------
SQL hash value: 1496909357
Optimizer: ADOPT_C
+----+--------------------------------+----------------------+------------+----------+----------+-------------+----------+----------+----------+----------+--------------------------------+
| Id | Operation type | Name | Owner | E - Rows | A - Rows | Cost(%CPU) | A - Time | Loops | Memory | Disk | Partition info |
+----+--------------------------------+----------------------+------------+----------+----------+-------------+----------+----------+----------+----------+--------------------------------+
| 0 | SELECT STATEMENT | | | | 3| | 50| 4| 0| 0| |
| 1 | SUBQUERY | QUERY[1] | | | 1| | 40908022| 2| 0| 0| |
| 2 | AGGREGATE | | | 1| 1| 1076561( 0)| 40908018| 2| 0| 0| |
|* 3 | TABLE ACCESS FULL | TEST_EC2 | SYS | 2587| 16703687| 1076561( 0)| 39923911| 16703688| 0| 0| |
| 4 | TABLE ACCESS BY INDEX ROWID | TEST_EC1 | SYS | 3| | 1( 0)| | | | | |
|* 5 | INDEX RANGE SCAN | IDX_EC1_ID | SYS | 3| 3| 1( 0)| 47| 4| 0| 0| |
+----+--------------------------------+----------------------+------------+----------+----------+-------------+----------+----------+----------+----------+--------------------------------+
Operation Information (identified by operation id):
---------------------------------------------------
1 - Subquery NDV info - NDV percentage: 0.333333, NDV Expression: ("E1"."OWNER"[OPTMZ-3])
3 - Predicate : filter("E1"."OWNER"[OPTMZ-3][OPTMZ-2] = "E2"."OWNER"[OPTMZ-1])
5 - Predicate : access("E1"."OBJECT_ID" IN (0[OPTMZ-0], 1[OPTMZ-0], 2[OPTMZ-0]))
Elapsed: 00:00:40.922
Note: 现在的执行计划不再是aggr pushed,而是又增加了一层算子AGGREGATE , 但是从响应时间上看应该是用上了cache后的,一值 35s,二值40s,三值 40s, 减少了相同值的重复调用。
结论:
YashanDB是支持标量子查询的cache功能的,但是在使用“aggr pushed”优化时不会。可以使用selectivity hint干预cost达到目的。
— over–