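All indexes in PostgreSQL-family databases (including openGauss) are secondary indexes: the table segment (heap) and the index segment are stored separately. When a SQL statement against a multi-column table returns, or filters in its WHERE clause on, only a few columns, we would like to answer it from the index alone, without going back from the index to the table, to improve query efficiency (visibility is not considered in this post). Oracle has the index full scan and index fast full scan execution plans for this; PostgreSQL likewise supports index only scan on B-tree indexes, and so does MySQL. However, I found that with the default configuration the PostgreSQL optimizer usually estimates the index cost to be higher than a table scan, so the index is not used even when only the indexed column is queried. Here is a simple test.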



anbob=# select version();
 (openGauss 5.0.0 build a07d57c3) compiled at 2023-03-29 03:07:56 commit 0 last mr   on x86_64-unknown-linux-gnu, compiled by g++ (GCC) 7.3.0, 64-bit
(1 row)

anbob=# create table testa(id int,name varchar(20) not null);
anbob=# insert into testa select x,'anbob'||x from generate_series(1,100000) as x;
INSERT 0 100000

anbob=# create index idx_name on testa(name);
anbob=# analyze testa;
anbob=# vacuum testa;

anbob=# select relkind,relname,reltuples,relpages from pg_class where relname='testa';
 relkind | relname | reltuples | relpages
 r       | testa   |    200000 |     1082
(1 row)



anbob=# explain (analyze,buffers) select a.name from testa a;
                                                   QUERY PLAN
 Seq Scan on testa a  (cost=0.00..3082.00 rows=200000 width=10) (actual time=0.007..38.215 rows=200000 loops=1)
   (Buffers: shared hit=1082)
 Total runtime: 61.343 ms
(3 rows)

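Only the indexed name column is queried here, yet the plan is still a seq scan, even though vacuum was run beforehand. I expected an index scan, because the index contains every column the query needs.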

Seq scan cost calculation

anbob=# \! sh show cost
 allocate_mem_cost                      | 0                  | Sets the planner's estimate of the cost of allocate memory.
 autovacuum_vacuum_cost_delay           | 20ms               | Vacuum cost delay in milliseconds, for autovacuum.
 autovacuum_vacuum_cost_limit           | -1                 | Vacuum cost amount available before napping, for autovacuum.
 codegen_cost_threshold                 | 10000              | Decided to use LLVM optimization or not.
 cost_param                             | 0                  | Bitmap controls the use of alternative cost model.
 cost_weight_index                      | 1                  | Sets the planner's discount when evaluating index cost.
 cpu_index_tuple_cost                   | 0.005              | Sets the planner's estimate of the cost of processing each index entry during an index scan.
 cpu_operator_cost                      | 0.0025             | Sets the planner's estimate of the cost of processing each operator or function call.
 cpu_tuple_cost                         | 0.01               | Sets the planner's estimate of the cost of processing each tuple (row).
 enable_change_hjcost                   | off                | Enable change hash join cost
 qrw_inlist2join_optmode                | cost_base          | Specify inlist2join opimitzation mode.
 random_page_cost                       | 4                  | Sets the planner's estimate of the cost of a nonsequentially fetched disk page.
 resource_track_cost                    | 100000             | Sets the minimum cost to do resource track.
 seq_page_cost                          | 1                  | Sets the planner's estimate of the cost of a sequentially fetched disk page.
 vacuum_cost_delay                      | 0                  | Vacuum cost delay in milliseconds.
 vacuum_cost_limit                      | 200                | Vacuum cost amount available before napping.
 vacuum_cost_page_dirty                 | 20                 | Vacuum cost for a page dirtied by vacuum.
 vacuum_cost_page_hit                   | 1                  | Vacuum cost for a page found in the buffer cache.
 vacuum_cost_page_miss                  | 10                 | Vacuum cost for a page not found in the buffer cache.

The cost of the full-table query above depends on seq_page_cost and cpu_tuple_cost:
Total cost of Seq Scan
= (estimated sequential page reads * seq_page_cost) + (estimated rows scanned * cpu_tuple_cost)
= (1082 * 1) + (200000 * 0.01)
= 1082 + 2000.00
= 3082
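
The arithmetic above can be reproduced in a few lines of Python. This is only a sketch of the simplified formula used in this post, not the planner's actual code; the function name `seq_scan_cost` is made up for illustration, and the defaults are the parameter values listed earlier.

```python
# Simplified seq scan estimate from this post (not the planner's real code).
def seq_scan_cost(relpages, reltuples, seq_page_cost=1.0, cpu_tuple_cost=0.01):
    io_cost = relpages * seq_page_cost      # one sequential read per heap page
    cpu_cost = reltuples * cpu_tuple_cost   # CPU charge for every tuple scanned
    return io_cost + cpu_cost

# relpages and reltuples are taken from the pg_class output for testa
print(round(seq_scan_cost(1082, 200000), 2))
```

With relpages=1082 and reltuples=200000 this gives the 3082 shown in the plan.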

Cost with a WHERE condition added

anbob=# explain (analyze,buffers) select a.name from testa a where id<10000;
                                                  QUERY PLAN
 Seq Scan on testa a  (cost=0.00..3582.00 rows=20111 width=10) (actual time=0.022..37.919 rows=19998 loops=1)
   Filter: (id < 10000)
   Rows Removed by Filter: 180002
   (Buffers: shared hit=1082)
 Total runtime: 40.077 ms
(5 rows)

Total cost of Seq Scan with WHERE
= (estimated sequential page reads * seq_page_cost) + (estimated rows scanned * cpu_tuple_cost) + (estimated rows scanned * cpu_operator_cost)
= (1082 * 1) + (200000 * 0.01) + (200000 * 0.0025)
= 1082 + 2000.00 + 500
= 3582
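
The filtered case can be sketched the same way. The helper below is hypothetical and assumes each simple condition in the WHERE clause costs one cpu_operator_cost per scanned row (the `n_quals` parameter is my own illustration, not a planner setting). Note that the filter is charged for all 200000 scanned rows, not for the 20111 rows estimated to be returned.

```python
# Simplified seq scan estimate with a filter (sketch, not the planner's code).
def seq_scan_cost_with_filter(relpages, reltuples, n_quals=1,
                              seq_page_cost=1.0, cpu_tuple_cost=0.01,
                              cpu_operator_cost=0.0025):
    # Every scanned tuple pays the tuple cost plus one operator cost per condition.
    per_tuple = cpu_tuple_cost + n_quals * cpu_operator_cost
    return relpages * seq_page_cost + reltuples * per_tuple

# One condition (id < 10000) on the 200000-row testa table
print(round(seq_scan_cost_with_filter(1082, 200000), 2))
```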

Cost with the indexonlyscan SQL hint

anbob=# select relkind,relname,reltuples,relpages from pg_class where relname='idx_name';
 relkind | relname  | reltuples | relpages
 i       | idx_name |    200000 |      773
(1 row)

anbob=# explain (analyze,buffers) select /*+indexonlyscan(a idx_name) */ a.name from testa a;
                                                              QUERY PLAN
 Index Only Scan using idx_name on testa a  (cost=0.00..6092.25 rows=200000 width=10) (actual time=0.019..43.794 rows=200000 loops=1)
   Heap Fetches: 0
   (Buffers: shared hit=770)
 Total runtime: 66.558 ms
(5 rows)

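The index only scan cost here is much higher than the seq scan cost, even though the heap has 1082 pages and the index only 773, which is hard to understand at first. Let's look at how the index only scan cost is calculated (this is my guess, I am not sure). Note that an index scan is costed as random I/O, so the calculation uses random_page_cost, plus cpu_index_tuple_cost for processing the index entries.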

= (estimated index only scan page reads * random_page_cost) + (estimated rows returned * cpu_tuple_cost) + (estimated rows returned * cpu_index_tuple_cost)
= (773 * 4) + (200000 * 0.01) + (200000 * 0.005)
= 3092 + 2000 + 1000
= 6092 (the plan reports 6092.25)
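
As a rough check, the guessed formula can be evaluated for each parameter setting tried in this post. This is a sketch of that guess only, not the real cost model in the planner; the function name is hypothetical, and the result is consistently 0.25 lower than the cost reported in the plans, a gap this sketch does not try to explain.

```python
# Guessed index only scan estimate from this post (not the planner's code).
def index_only_scan_cost(index_pages, index_tuples, random_page_cost=4.0,
                         cpu_tuple_cost=0.01, cpu_index_tuple_cost=0.005):
    io_cost = index_pages * random_page_cost             # index pages costed as random reads
    tuple_cost = index_tuples * cpu_tuple_cost           # per returned row
    index_cost = index_tuples * cpu_index_tuple_cost     # per index entry processed
    return io_cost + tuple_cost + index_cost

# (random_page_cost, cpu_index_tuple_cost) pairs used in the tests in this post
for rpc, citc in [(4.0, 0.005), (1.0, 0.005), (1.0, 0.004), (0.1, 0.005)]:
    cost = index_only_scan_cost(773, 200000, random_page_cost=rpc,
                                cpu_index_tuple_cost=citc)
    print(rpc, citc, round(cost, 2))
```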




anbob=# set random_page_cost=1;
anbob=# explain (analyze,buffers)select /*+indexonlyscan(a idx_name) */ a.name from testa a;
                                                              QUERY PLAN
 Index Only Scan using idx_name on testa a  (cost=0.00..3773.25 rows=200000 width=10) (actual time=0.019..44.648 rows=200000 loops=1)
   Heap Fetches: 0
   (Buffers: shared hit=770)
 Total runtime: 67.601 ms
(5 rows)

anbob=# set cpu_index_tuple_cost=0.004;
anbob=# explain (analyze,buffers)select /*+indexonlyscan(a idx_name) */ a.name from testa a;
                                                              QUERY PLAN
 Index Only Scan using idx_name on testa a  (cost=0.00..3573.25 rows=200000 width=10) (actual time=0.020..43.095 rows=200000 loops=1)
   Heap Fetches: 0
   (Buffers: shared hit=770)
 Total runtime: 72.249 ms
(5 rows)

# Restore the default value
anbob=# set cpu_index_tuple_cost=0.005;

# For the sake of the test, set random_page_cost lower than seq_page_cost
anbob=# set random_page_cost=0.1;
anbob=# explain (analyze,buffers)select /*+indexonlyscan(a idx_name) */ a.name from testa a;
                                                              QUERY PLAN
 Index Only Scan using idx_name on testa a  (cost=0.00..3077.55 rows=200000 width=10) (actual time=0.019..42.693 rows=200000 loops=1)
   Heap Fetches: 0
   (Buffers: shared hit=770)
 Total runtime: 66.196 ms
(5 rows)

The cost is now lower than the seq scan's 3082. Let's try removing the hint.

anbob=# explain (analyze,buffers)select  a.name from testa a;
                                                              QUERY PLAN
 Index Only Scan using idx_name on testa a  (cost=0.00..3077.55 rows=200000 width=10) (actual time=0.019..45.700 rows=200000 loops=1)
   Heap Fetches: 0
   (Buffers: shared hit=770)
 Total runtime: 68.124 ms
(5 rows)
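
To see why 0.1 is low enough to flip the plan for testa, one can solve the simplified formulas for the random_page_cost at which the two estimates are equal. This is a back-of-the-envelope sketch under the guessed index only scan formula from this post; the function name is hypothetical, and the real planner's threshold may differ slightly.

```python
# Break-even random_page_cost under the simplified formulas in this post (sketch).
def break_even_random_page_cost(heap_pages, index_pages, tuples,
                                seq_page_cost=1.0, cpu_tuple_cost=0.01,
                                cpu_index_tuple_cost=0.005):
    seq_cost = heap_pages * seq_page_cost + tuples * cpu_tuple_cost
    # CPU part of the index only scan estimate, independent of random_page_cost
    index_cpu = tuples * (cpu_tuple_cost + cpu_index_tuple_cost)
    # Solve index_pages * rpc + index_cpu = seq_cost for rpc
    return (seq_cost - index_cpu) / index_pages

# testa: 1082 heap pages, 773 index pages, 200000 tuples
print(round(break_even_random_page_cost(1082, 773, 200000), 3))
```

For testa the result is a little above 0.1: the extra cpu_index_tuple_cost charge on 200000 entries almost cancels the saving from reading 773 pages instead of 1082, so the index only wins when random page reads are estimated to be very cheap.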

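Without the hint, the optimizer now chooses the index only scan on its own. Judging from the formula above, besides random_page_cost, another reason is that the table's rows are too narrow, so the heap and the index differ little in page count. Next, let's create a somewhat wider table and see whether the index gets used.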

openGauss=# create table testc(id int,name varchar(20),addr  varchar(3000));
openGauss=# insert into testc select x,'anbob'||x,rpad('x',2000,'x') from generate_series(1,10000) as x;
INSERT 0 10000
openGauss=# create index idx_testc_name on testc(name);
openGauss=# vacuum analyze testc;
openGauss=# explain analyze select name from testc;
                                               QUERY PLAN
 Seq Scan on testc  (cost=0.00..203.00 rows=10000 width=9) (actual time=0.008..1.922 rows=10000 loops=1)
 Total runtime: 2.920 ms
(2 rows)

openGauss=# set random_page_cost=1;
openGauss=# explain analyze select name from testc;
                                                             QUERY PLAN
 Index Only Scan using idx_testc_name on testc  (cost=0.00..191.25 rows=10000 width=9) (actual time=0.016..1.994 rows=10000 loops=1)
   Heap Fetches: 0
 Total runtime: 2.893 ms
(4 rows)
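
The page counts of testc are not shown in this post, but they can be backed out of the two reported costs as a sanity check. This is purely an inference under the simplified formulas used here, assuming the constant 0.25 gap seen in the index only scan plans; the derived page counts are estimates, not values read from pg_class, and the helper names are hypothetical.

```python
# Back out page counts from the reported testc costs (inference, not measured data).
CPU_TUPLE_COST = 0.01
CPU_INDEX_TUPLE_COST = 0.005
ROWS = 10000

def implied_heap_pages(seq_cost, seq_page_cost=1.0):
    # seq_cost = pages * seq_page_cost + rows * cpu_tuple_cost
    return (seq_cost - ROWS * CPU_TUPLE_COST) / seq_page_cost

def implied_index_pages(index_cost, random_page_cost, gap=0.25):
    # gap: assumed constant difference between the guessed formula and the plan
    cpu = ROWS * (CPU_TUPLE_COST + CPU_INDEX_TUPLE_COST)
    return (index_cost - gap - cpu) / random_page_cost

heap_pages = implied_heap_pages(203.00)
index_pages = implied_index_pages(191.25, random_page_cost=1.0)
# Estimated index only scan cost at the default random_page_cost of 4
default_index_cost = index_pages * 4.0 + ROWS * (CPU_TUPLE_COST + CPU_INDEX_TUPLE_COST)
print(round(heap_pages), round(index_pages), round(default_index_cost))
```

Under these assumptions the index only scan would be estimated at roughly 314 with the default random_page_cost, above the seq scan's 203, which is consistent with the seq scan being chosen before the parameter was changed.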

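Now, with random_page_cost merely set equal to seq_page_cost, the index only scan is chosen without a hint. Note that the name column of testc has no NOT NULL constraint, yet the index only scan is still used. Does that mean NULLs are recorded in the index as well?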

Index only scan and NULLs

anbob=# insert into testc values(-1,null,null);
anbob=# explain analyze select name from testc;
                                                             QUERY PLAN
 Index Only Scan using idx_testc_name on testc  (cost=0.00..191.25 rows=10000 width=9) (actual time=0.026..2.473 rows=10001 loops=1)
   Heap Fetches: 107
 Total runtime: 3.554 ms
(4 rows)

anbob=# select count(*),count(name) from testc;
 count | count
 10001 | 10000
(1 row)

anbob=# explain analyze select name from testc where name is null;
                                                        QUERY PLAN
 Index Only Scan using idx_testc_name on testc  (cost=0.00..1.27 rows=1 width=9) (actual time=0.013..0.014 rows=1 loops=1)
   Index Cond: (name IS NULL)
   Heap Fetches: 1
 Total runtime: 0.073 ms
(5 rows)

This shows that in PostgreSQL an index only scan can be used for NULLs as well.



MYSQL_root@ [anbob]> select version();
| version()         |
| 8.0.20-commercial |
1 row in set (0.00 sec)

MYSQL_root@ [anbob]> create table testa(id int,name varchar(20));
Query OK, 0 rows affected (0.06 sec)

MYSQL_root@ [anbob]> insert into testa
    -> WITH RECURSIVE cte (n) AS
    -> (
    ->   SELECT 1
    ->   UNION ALL
    ->   SELECT n + 1  FROM cte WHERE n < 100000
    -> )
    ->  SELECT n,concat('anbob',n) name FROM cte;
ERROR 3636 (HY000): Recursive query aborted after 100000 iterations. Try increasing @@cte_max_recursion_depth to a larger value.

MYSQL_root@ [anbob]> show variables like 'cte%';
| Variable_name           | Value |
| cte_max_recursion_depth | 1000  |
1 row in set (0.01 sec)

MYSQL_root@ [anbob]> set cte_max_recursion_depth=100000;
Query OK, 0 rows affected (0.00 sec)

MYSQL_root@ [anbob]> insert into testa
    -> WITH RECURSIVE cte (n) AS
    -> (
    ->   SELECT 1
    ->   UNION ALL
    ->   SELECT n + 1  FROM cte WHERE n < 100000
    -> )
    ->  SELECT n,concat('anbob',n) name FROM cte;
Query OK, 100000 rows affected (1.89 sec)
Records: 100000  Duplicates: 0  Warnings: 0

MYSQL_root@ [anbob]> create index idx_testa_name on testa(name);
Query OK, 0 rows affected (1.88 sec)
Records: 0  Duplicates: 0  Warnings: 0

MYSQL_root@ [anbob]> explain format=tree select name from testa;
| EXPLAIN                                                                  |
| -> Index scan on testa using idx_testa_name  (cost=10071.15 rows=99989)  |
1 row in set (0.00 sec)

Even with this two-column table, MySQL still uses an index scan. MySQL indexes also store NULL values, so no NOT NULL constraint is needed here either.


SQL> @desc testb
           Name                            Null?    Type
           ------------------------------- -------- ----------------------------
    1      ID                                       NUMBER(38)
    2      NAME                            NOT NULL VARCHAR2(20)
    3      ADDR                                     VARCHAR2(3000)

SQL> explain plan for  select /*+gather_plan_statistics*/ name from testb;

SQL> @x2
Plan hash value: 1149720753

| Id  | Operation            | Name           | Rows  | Bytes | Cost (%CPU)| Time     |
|   0 | SELECT STATEMENT     |                |   100K|  1074K|    85   (0)| 00:00:01 |
|   1 |  INDEX FAST FULL SCAN| IDX_TESTB_NAME |   100K|  1074K|    85   (0)| 00:00:01 |

8 rows selected.

SQL> alter table testb modify name null;
Table altered.

SQL> explain plan for  select /*+gather_plan_statistics*/ name from testb;

SQL> @x2
Plan hash value: 4088136327
| Id  | Operation         | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
|   0 | SELECT STATEMENT  |       |   100K|  1074K| 13557   (1)| 00:00:01 |
|   1 |  TABLE ACCESS FULL| TESTB |   100K|  1074K| 13557   (1)| 00:00:01 |
8 rows selected.

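Note that in Oracle the name column needs a NOT NULL constraint, because an Oracle B-tree index does not record entries whose key is entirely NULL. With the NOT NULL constraint in place, Oracle can use the index fast full scan, which reads the index with multiblock reads and is more efficient.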

Those interested can read the cost calculation in the PostgreSQL source code, costsize.c.

