PostgreSQL/openGauss explain解析(二): indexonlyscan cost

PostgreSQL系(openGAUSS)数据库中的所有索引都是二级索引, 数据表段（ heap）和索引段（index）分别存储，通常对于多列表的SQL只返回或where中仅少量的列时，希望可以只从索引中检索，而不用再从索引回表返回数据（本篇不考虑可见性）提高查询效率，像在oracle中有index full scan和index fast full scan的执行计划，在Postgresql中也支持Btree index的indexonlyscan， MySQL中同样支持，但发现PostGreSQL默认配置的SQL优化器通常判断索引的cost大于表扫描，导致仅查询索引列也未使用索引，这里简单测试。

openGauss

创建测试表

anbob=# select version();
                                                                       version
------------------------------------------------------------------------------------------------------------------------------------------------------
 (openGauss 5.0.0 build a07d57c3) compiled at 2023-03-29 03:07:56 commit 0 last mr   on x86_64-unknown-linux-gnu, compiled by g++ (GCC) 7.3.0, 64-bit
(1 row)

anbob=# create table testa(id int,name varchar(20) not null);
CREATE TABLE
anbob=# insert into testa select x,'anbob'||x from generate_series(1,100000) as x;
INSERT 0 100000

anbob=# create index idx_name on testa(name);
CREATE INDEX
anbob=# analyze testa;
ANALYZE
anbob=# vacuum testa;
VACUUM

anbob=# select relkind,relname,reltuples,relpages from pg_class where relname='testa';
 relkind | relname | reltuples | relpages
---------+---------+-----------+----------
 r       | testa   |    200000 |     1082
(1 row)

Note:
这里创建了testa并且插入了10W记录，在name列创建了索引

查看执行计划

anbob=# explain (analyze,buffers) select a.name from testa a;
                                                   QUERY PLAN
----------------------------------------------------------------------------------------------------------------
 Seq Scan on testa a  (cost=0.00..3082.00 rows=200000 width=10) (actual time=0.007..38.215 rows=200000 loops=1)
   (Buffers: shared hit=1082)
 Total runtime: 61.343 ms
(3 rows)

Note:
这里仅查询索引name列，发现使用的还是seq scan. 前面已做vacuum。我觉应该是会走索引扫描，因为索引里包含所有列.

seq scan 计算COST

anbob=# \! sh show cost
 allocate_mem_cost                      | 0                  | Sets the planner's estimate of the cost of allocate memory.
 autovacuum_vacuum_cost_delay           | 20ms               | Vacuum cost delay in milliseconds, for autovacuum.
 autovacuum_vacuum_cost_limit           | -1                 | Vacuum cost amount available before napping, for autovacuum.
 codegen_cost_threshold                 | 10000              | Decided to use LLVM optimization or not.
 cost_param                             | 0                  | Bitmap controls the use of alternative cost model.
 cost_weight_index                      | 1                  | Sets the planner's discount when evaluating index cost.
 cpu_index_tuple_cost                   | 0.005              | Sets the planner's estimate of the cost of processing each index entry during an index scan.
 cpu_operator_cost                      | 0.0025             | Sets the planner's estimate of the cost of processing each operator or function call.
 cpu_tuple_cost                         | 0.01               | Sets the planner's estimate of the cost of processing each tuple (row).
 enable_change_hjcost                   | off                | Enable change hash join cost
 qrw_inlist2join_optmode                | cost_base          | Specify inlist2join opimitzation mode.
 random_page_cost                       | 4                  | Sets the planner's estimate of the cost of a nonsequentially fetched disk page.
 resource_track_cost                    | 100000             | Sets the minimum cost to do resource track.
 seq_page_cost                          | 1                  | Sets the planner's estimate of the cost of a sequentially fetched disk page.
 vacuum_cost_delay                      | 0                  | Vacuum cost delay in milliseconds.
 vacuum_cost_limit                      | 200                | Vacuum cost amount available before napping.
 vacuum_cost_page_dirty                 | 20                 | Vacuum cost for a page dirtied by vacuum.
 vacuum_cost_page_hit                   | 1                  | Vacuum cost for a page found in the buffer cache.
 vacuum_cost_page_miss                  | 10                 | Vacuum cost for a page not found in the buffer cache.

对于上面的全表查询COST 计算方法,依赖seq_page_cost和cpu_tuple_cost
Total cost of Seq Scan
= (estimated sequential page reads * seq_page_cost) + (estimated rows returned * cpu_tuple_cost)
= (1082 * 1) + (200000 * 0.01)
= 1082 + 2000.00
= 3082

如果增加where条件 COST

anbob=# explain (analyze,buffers) select a.name from testa a where id<10000;
                                                  QUERY PLAN
--------------------------------------------------------------------------------------------------------------
 Seq Scan on testa a  (cost=0.00..3582.00 rows=20111 width=10) (actual time=0.022..37.919 rows=19998 loops=1)
   Filter: (id < 10000)
   Rows Removed by Filter: 180002
   (Buffers: shared hit=1082)
 Total runtime: 40.077 ms
(5 rows)

Note:
这里增加了where条件返回更少的行，但依旧是扫描了所有的行，COST并没有降低反而增加，因为CPU需要对where条件返回记录需要过滤，增加了cpu_operator_cost
Total cost of Seq Scan with WHERE
= (estimated sequential page reads * seq_page_cost) + (estimated rows returned * cpu_tuple_cost) + (estimated rows returned* cpu_operator_cost)
= (1082 * 1) + (200000 * 0.01) + (200000 * 0.0025)
= 1082 + 2000.00 + 500
= 3582

使用 indexonlyscan SQL HINT COST

anbob=# select relkind,relname,reltuples,relpages from pg_class where relname='idx_name';
 relkind | relname  | reltuples | relpages
---------+----------+-----------+----------
 i       | idx_name |    200000 |      773
(1 row)

anbob=# explain (analyze,buffers) select /*+indexonlyscan(a idx_name) */ a.name from testa a;
                                                              QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------
 [Bypass]
 Index Only Scan using idx_name on testa a  (cost=0.00..6092.25 rows=200000 width=10) (actual time=0.019..43.794 rows=200000 loops=1)
   Heap Fetches: 0
   (Buffers: shared hit=770)
 Total runtime: 66.558 ms
(5 rows)

Note:
这里使用indexonlyscan的COST要比cost高出很多,heap pages 1082,index pages 773这确实不太好理解。我们看一下index only scan的COST计算(I guass,not sure),注意在index scan时使用的是random IO, 所以COST使用的random_page_cost，及扫索引的cpu_index_tuple_cost.

(estimated index only scan page reads * random_page_cost) + (estimated rows returned* cpu_tuple_cost) + (estimated rows returned * cpu_index_tuple_cost)
=(773*4 )+(200000 * 0.01) +(200000* 0.005)
=3092 + 2000 + 1000
=6092

random_page_cost参数
因为random_page_cost默认为4，而seq_page_cost默认为1，这也是cost相差较大的原因，运行在SSD上的实例，random_page_cost设置为1.0~2.0之间，虽然服务器允许将random_page_cost设置的比seq_page_cost小，即使访问数据都在内存一般也只建议两者配置相等,可以对表空间级配置，相对于seq_page_cost，减少这个值将导致系统更倾向于使用索引扫描

减少random_page_cost和random_page_cost
既然推算出了cost的算法，我们常调小参数看是否不用hint使用索引

anbob=# set random_page_cost=1;
SET
anbob=# explain (analyze,buffers)select /*+indexonlyscan(a idx_name) */ a.name from testa a;
                                                              QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------
 [Bypass]
 Index Only Scan using idx_name on testa a  (cost=0.00..3773.25 rows=200000 width=10) (actual time=0.019..44.648 rows=200000 loops=1)
   Heap Fetches: 0
   (Buffers: shared hit=770)
 Total runtime: 67.601 ms
(5 rows)

anbob=# set cpu_index_tuple_cost=0.004;
SET
anbob=# explain (analyze,buffers)select /*+indexonlyscan(a idx_name) */ a.name from testa a;
                                                              QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------
 [Bypass]
 Index Only Scan using idx_name on testa a  (cost=0.00..3573.25 rows=200000 width=10) (actual time=0.020..43.095 rows=200000 loops=1)
   Heap Fetches: 0
   (Buffers: shared hit=770)
 Total runtime: 72.249 ms
(5 rows)

# 恢复默认值
anbob=# set cpu_index_tuple_cost=0.005;
SET

# 但为测试效果, 调整random_page_cost小于seq_page_cost
anbob=# set random_page_cost=0.1;
SET
anbob=# explain (analyze,buffers)select /*+indexonlyscan(a idx_name) */ a.name from testa a;
                                                              QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------
 [Bypass]
 Index Only Scan using idx_name on testa a  (cost=0.00..3077.55 rows=200000 width=10) (actual time=0.019..42.693 rows=200000 loops=1)
   Heap Fetches: 0
   (Buffers: shared hit=770)
 Total runtime: 66.196 ms
(5 rows)

Note:
现在的COST 小于seq scan 的3082 , 我们去掉HINT试试

anbob=# explain (analyze,buffers)select  a.name from testa a;
                                                              QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------
 [Bypass]
 Index Only Scan using idx_name on testa a  (cost=0.00..3077.55 rows=200000 width=10) (actual time=0.019..45.700 rows=200000 loops=1)
   Heap Fetches: 0
   (Buffers: shared hit=770)
 Total runtime: 68.124 ms
(5 rows)

Note:
目前不使用hint，优化器已经可以使用index only scan, 这个从上面的公式看，除了random_page_cost原因，还就是表的列太小了，所以在pages上相差不大，下面我们创建一个相对宽点的表，是否可以使用索引。

openGauss=# create table testc(id int,name varchar(20),addr  varchar(3000));
CREATE TABLE
openGauss=# insert into testc select x,'anbob'||x,rpad('x',2000,'x') from generate_series(1,10000) as x;
INSERT 0 10000
openGauss=# create index idx_testc_name on testc(name);
CREATE INDEX
openGauss=# vacuum analyze testc;
VACUUM
openGauss=# explain analyze select name from testc;
                                               QUERY PLAN
---------------------------------------------------------------------------------------------------------
 Seq Scan on testc  (cost=0.00..203.00 rows=10000 width=9) (actual time=0.008..1.922 rows=10000 loops=1)
 Total runtime: 2.920 ms
(2 rows)

openGauss=# set random_page_cost=1;
SET
openGauss=# explain analyze select name from testc;
                                                             QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------
 [Bypass]
 Index Only Scan using idx_testc_name on testc  (cost=0.00..191.25 rows=10000 width=9) (actual time=0.016..1.994 rows=10000 loops=1)
   Heap Fetches: 0
 Total runtime: 2.893 ms
(4 rows)

NOTE:
现在仅把random_page_cost调整和seq_page_cost相等，默认已经可以使用index only scan.，注意创建的testc的name列并没有非空约事，使用 index only scan, 那null也在index中记录？

index only scan对于null

anbob=# insert into testc values(-1,null,null);
INSERT 0 1
anbob=# explain analyze select name from testc;
                                                             QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------
 [Bypass]
 Index Only Scan using idx_testc_name on testc  (cost=0.00..191.25 rows=10000 width=9) (actual time=0.026..2.473 rows=10001 loops=1)
   Heap Fetches: 107
 Total runtime: 3.554 ms
(4 rows)

anbob=# select count(*),count(name) from testc;
 count | count
-------+-------
 10001 | 10000
(1 row)

anbob=# explain analyze select name from testc where name is null;
                                                        QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------
 [Bypass]
 Index Only Scan using idx_testc_name on testc  (cost=0.00..1.27 rows=1 width=9) (actual time=0.013..0.014 rows=1 loops=1)
   Index Cond: (name IS NULL)
   Heap Fetches: 1
 Total runtime: 0.073 ms
(5 rows)

Note:
说明在POSTGRESQL中对于null是可以使用index only scan的。

以上在postgresql和opengauss测试相同,下面看看其他数据库。

MySQL

MYSQL_root@127.0.0.1 [anbob]> select version();
+-------------------+
| version()         |
+-------------------+
| 8.0.20-commercial |
+-------------------+
1 row in set (0.00 sec)


MYSQL_root@127.0.0.1 [anbob]> create table testa(id int,name varchar(20));
Query OK, 0 rows affected (0.06 sec)

MYSQL_root@127.0.0.1 [anbob]> insert into testa
    -> WITH RECURSIVE cte (n) AS
    -> (
    ->   SELECT 1
    ->   UNION ALL
    ->   SELECT n + 1  FROM cte WHERE n < 100000 -> )
    ->  SELECT n,concat('anbob',n) name FROM cte;
ERROR 3636 (HY000): Recursive query aborted after 100000 iterations. Try increasing @@cte_max_recursion_depth to a larger value.


MYSQL_root@127.0.0.1 [anbob]> show variables like 'cte%';
+-------------------------+-------+
| Variable_name           | Value |
+-------------------------+-------+
| cte_max_recursion_depth | 1000  |
+-------------------------+-------+
1 row in set (0.01 sec)

MYSQL_root@127.0.0.1 [anbob]> set cte_max_recursion_depth=100000;
Query OK, 0 rows affected (0.00 sec)

MYSQL_root@127.0.0.1 [anbob]> insert into testa
    -> WITH RECURSIVE cte (n) AS
    -> (
    ->   SELECT 1
    ->   UNION ALL
    ->   SELECT n + 1  FROM cte WHERE n < 100000 -> )
    ->  SELECT n,concat('anbob',n) name FROM cte;
Query OK, 100000 rows affected (1.89 sec)
Records: 100000  Duplicates: 0  Warnings: 0

MYSQL_root@127.0.0.1 [anbob]> create index idx_testa_name on testa(name);
Query OK, 0 rows affected (1.88 sec)
Records: 0  Duplicates: 0  Warnings: 0

MYSQL_root@127.0.0.1 [anbob]> explain format=tree select name from testa;
+--------------------------------------------------------------------------+
| EXPLAIN                                                                  |
+--------------------------------------------------------------------------+
| -> Index scan on testa using idx_testa_name  (cost=10071.15 rows=99989)  |
+--------------------------------------------------------------------------+
1 row in set (0.00 sec)

Note:
在MySQL中2列mysql依旧使用了index scan. 在MYSQL中同样存储null值，所以也可以不需要null约束。

Oracle

SQL> @desc testb
           Name                            Null?    Type
           ------------------------------- -------- ----------------------------
    1      ID                                       NUMBER(38)
    2      NAME                            NOT NULL VARCHAR2(20)
    3      ADDR                                     VARCHAR2(3000)

SQL> explain plan for  select /*+gather_plan_statistics*/ name from testb;
Explained.

SQL> @x2
PLAN_TABLE_OUTPUT
-------------------------------------------------------------------------------------------------------
Plan hash value: 1149720753

---------------------------------------------------------------------------------------
| Id  | Operation            | Name           | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |                |   100K|  1074K|    85   (0)| 00:00:01 |
|   1 |  INDEX FAST FULL SCAN| IDX_TESTB_NAME |   100K|  1074K|    85   (0)| 00:00:01 |
---------------------------------------------------------------------------------------

8 rows selected.

SQL> alter table testb modify name null;
Table altered.

SQL> explain plan for  select /*+gather_plan_statistics*/ name from testb;
Explained.

SQL> @x2
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
Plan hash value: 4088136327
---------------------------------------------------------------------------
| Id  | Operation         | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |       |   100K|  1074K| 13557   (1)| 00:00:01 |
|   1 |  TABLE ACCESS FULL| TESTB |   100K|  1074K| 13557   (1)| 00:00:01 |
---------------------------------------------------------------------------
8 rows selected.

Note:
注意在oracle中需要name列有not null约束，因为在oracle中index是不记录null的，当有not null约束时可以使用index fast full scan多块读的方式更高效率的访问索引。

有兴趣的可以去阅读postgresql源码关于cost计算 costsize.c