首页 » PostgreSQL/GaussDB » PostgreSQL/openGauss explain解析(四): indexonlyscan和 覆盖索引

PostgreSQL/openGauss explain解析(四): indexonlyscan和 覆盖索引

前2篇中对index only scan的测试能看出在 Oracle、MySQL(InnoDB)、PostgreSQL三类数据库中,对于OLTP高负载的场景中,oracle和mysql(innodb)都是块级别的MVCC是可以做到真正的index only scan, 而postgresql因为MVCC的可见性不存储在索引,在数据变更后会带来indexonlyscan with heap fetchesl回表,效率可能有所减退。通常在Oracle中如想做到index 覆盖到所有查询的列,会创建多列复合索引或function索引,避免索引查询回表,但在Postgresql或openGauss系中索引相对Oracle还有两种特殊情况,简单记录一下覆盖索引( Covering Indexes).

# openGauss
函数索引

anbob=# select version();
                                                                       version
------------------------------------------------------------------------------------------------------------------------------------------------------
 (openGauss 5.0.0 build a07d57c3) compiled at 2023-03-29 03:07:56 commit 0 last mr   on x86_64-unknown-linux-gnu, compiled by g++ (GCC) 7.3.0, 64-bit
(1 row)

anbob=# create table testc(id int,name varchar(20),addr  varchar(3000),otype char(1));
CREATE TABLE
anbob=# insert into testc select x,'anbob'||x,rpad('x',2000,'x'),'t' from generate_series(1,10000) as x;
INSERT 0 10000
anbob=# create index idx_testc_uppname on testc(upper(name));
CREATE INDEX
anbob=# vacuum testc;
VACUUM

上面创建了一个函数索引,下面我们检索一下是否可以使用.

anbob=# explain (analyze,buffers)select * from testc where upper(name)='ANBOB2';
                                                         QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------
 [Bypass]
 Index Scan using idx_testc_uppname on testc  (cost=0.00..43.13 rows=50 width=586) (actual time=0.026..0.027 rows=1 loops=1)
   Index Cond: (upper((name)::text) = 'ANBOB2'::text)
   (Buffers: shared hit=3)
 Total runtime: 0.104 ms
(5 rows)

index scan正常使用上了创建的函数索引,下面只查询函数列,确认是否使用index only scan.

anbob=# explain (analyze,buffers)select upper(name) from testc where upper(name)='ANBOB2';
                                                         QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------
 Index Scan using idx_testc_uppname on testc  (cost=0.00..43.25 rows=50 width=58) (actual time=0.047..0.049 rows=1 loops=1)
   Index Cond: (upper((name)::text) = 'ANBOB2'::text)
   (Buffers: shared hit=3)
 Total runtime: 0.149 ms
(4 rows)

anbob=# explain (analyze,buffers)select /*+indexonlyscan(c)*/upper(name) from testc c where upper(name)='ANBOB2';
WARNING:  unused hint: IndexOnlyScan(c)
                                                          QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------
 Index Scan using idx_testc_uppname on testc c  (cost=0.00..43.25 rows=50 width=58) (actual time=0.102..0.104 rows=1 loops=1)
   Index Cond: (upper((name)::text) = 'ANBOB2'::text)
   (Buffers: shared hit=3)
 Total runtime: 0.182 ms
(4 rows)

Note:
注意这里只select upper(name) 列即使增加hint也无法使用indexonlyscan, 在官方Covering Indexes有记录这是PostgreSQL系优化器不太聪明的地方,仅当查询所需的所有列都可以从索引中获取时,它才会认为查询可能通过indexonlyscan执行,也就是它认为还需要name列,没有察觉到name列是函数索引的值,解决方法是创建一个覆盖索引。

覆盖索引

anbob=# create index idx_testc_uppname2 on testc(upper(name)) include(name);
ERROR:  create a index with include columns is only supported in ubtree

anbob=# \d+ testc
                                 Table "public.testc"
 Column |          Type           | Modifiers | Storage  | Stats target | Description
--------+-------------------------+-----------+----------+--------------+-------------
 id     | integer                 |           | plain    |              |
 name   | character varying(20)   |           | extended |              |
 addr   | character varying(3000) |           | extended |              |
 otype  | character(1)            |           | extended |              |
Indexes:
    "idx_testc_uppname" btree (upper(name::text)) TABLESPACE pg_default
    "idx_testc_uppname2" btree (upper(name::text), name) TABLESPACE pg_default
Has OIDs: no
Options: orientation=row, compression=no

Note:
这是openGauss仅在不常用的ustore存储引擎支持PG这个功能,也难怪create index INCLUDE功能是从pg 11版本引入,而OG开始是基于pg 9.2版本。Ustore 是opengauss引入的UNDO的机制,不在本篇的范围,如果不用USTORE,我们可以增加多列索引。

多列复合索引

anbob=# create index idx_testc_uppname2 on testc(upper(name),name);
CREATE INDEX
anbob=# explain (analyze,buffers)select /*+indexonlyscan(c)*/upper(name) from testc c where upper(name)='ANBOB2';
                                                           QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------
 Index Only Scan using idx_testc_uppname2 on testc c  (cost=0.00..1.27 rows=1 width=9) (actual time=0.120..0.121 rows=1 loops=1)
   Index Cond: ((upper((name)::text)) = 'ANBOB2'::text)
   Heap Fetches: 0
   (Buffers: shared hit=1 read=3)
 Total runtime: 0.196 ms
(5 rows)

anbob=# explain (analyze,buffers)select upper(name) from testc c where upper(name)='ANBOB2';
                                                           QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------
 Index Only Scan using idx_testc_uppname2 on testc c  (cost=0.00..1.27 rows=1 width=9) (actual time=0.030..0.032 rows=1 loops=1)
   Index Cond: ((upper((name)::text)) = 'ANBOB2'::text)
   Heap Fetches: 0
   (Buffers: shared hit=3)
 Total runtime: 0.162 ms
(5 rows)

NOTE:
增加多列索引可以使用index only scan. 我们测试一下相同的结构在postgresql中的表现。

# postgresql

anbob=# create table testc(id int,name varchar(20),addr  varchar(3000),otype char(1));
CREATE TABLE
anbob=#
anbob=# insert into testc select x,'anbob'||x,rpad('x',2000,'x'),'t' from generate_series(1,10000) as x;
INSERT 0 10000
anbob=#  create index idx_testc_uppname on testc(upper(name));
CREATE INDEX
anbob=# vacuum testc;
VACUUM

anbob=# explain (analyze,buffers)select upper(name) from testc where upper(name)='ANBOB2'
anbob-# ;
                                                        QUERY PLAN                                                      
--------------------------------------------------------------------------------------------------------------------------
 Index Scan using idx_testc_uppname on testc  (cost=0.29..8.30 rows=1 width=32) (actual time=0.060..0.063 rows=1 loops=1)
   Index Cond: (upper((name)::text) = 'ANBOB2'::text)
   Buffers: shared hit=3
 Planning Time: 0.252 ms
 Execution Time: 0.090 ms
(5 行记录)


anbob=# create index idx_testc_uppname2 on testc(upper(name)) include (name);
CREATE INDEX
anbob=# explain (analyze,buffers)select upper(name) from testc where upper(name)='ANBOB2';
                                                           QUERY PLAN                                                   
--------------------------------------------------------------------------------------------------------------------------------
 Index Only Scan using idx_testc_uppname2 on testc  (cost=0.29..4.31 rows=1 width=32) (actual time=0.676..0.678 rows=1 loops=1)
   Index Cond: ((upper((name)::text)) = 'ANBOB2'::text)
   Heap Fetches: 0
   Buffers: shared hit=1 read=2
 Planning:
   Buffers: shared hit=20 read=1
 Planning Time: 2.882 ms
 Execution Time: 0.707 ms
(8 行记录)


anbob=# select version();
                          version
------------------------------------------------------------
 PostgreSQL 13.1, compiled by Visual C++ build 1914, 64-bit
(1 行记录)
Note:
在postgresql中使用include覆盖索引也可以支持index only scan.

generated always列
在opengauss和postgresql中都支持generated always生成列,类型oracle的虚拟列,但目前仅支持store实际是占用空间。并且直接在列上创建索引

# opengauss
anbob=# alter table testc add column upper_name text generated always as (upper(name)) stored;
ALTER TABLE
anbob=# create index idx_testc_uppname3 on testc(upper_name);
CREATE INDEX
anbob=# vacuum testc;
VACUUM
anbob=# explain (analyze,buffers)select upper_name from testc where upper_name='ANBOB2';
                                                           QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------
 [Bypass]
 Index Only Scan using idx_testc_uppname3 on testc  (cost=0.00..5.12 rows=50 width=32) (actual time=0.138..0.140 rows=1 loops=1)
   Index Cond: (upper_name = 'ANBOB2'::text)
   Heap Fetches: 0
   (Buffers: shared hit=3 read=1)
 Total runtime: 0.221 ms
(6 rows)

# postgresql
anbob=# alter table testc add column upper_name text generated always as (upper(name)) stored;
ALTER TABLE
anbob=# create index idx_testc_uppname3 on testc(upper_name);
CREATE INDEX
anbob=# explain (analyze,buffers)select upper_name from testc where upper_name='ANBOB2';
                                                         QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on testc  (cost=4.67..96.82 rows=50 width=32) (actual time=0.076..0.077 rows=1 loops=1)
   Recheck Cond: (upper_name = 'ANBOB2'::text)
   Heap Blocks: exact=1
   Buffers: shared hit=1 read=2
   ->  Bitmap Index Scan on idx_testc_uppname3  (cost=0.00..4.66 rows=50 width=0) (actual time=0.069..0.070 rows=1 loops=1)
         Index Cond: (upper_name = 'ANBOB2'::text)
         Buffers: shared read=2
 Planning:
   Buffers: shared hit=30 read=3
 Planning Time: 2.617 ms
 Execution Time: 0.142 ms
(11 行记录)

anbob=# vacuum testc;
VACUUM
anbob=# explain (analyze,buffers)select upper_name from testc where upper_name='ANBOB2';
                                                           QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------
 Index Only Scan using idx_testc_uppname3 on testc  (cost=0.29..5.16 rows=50 width=32) (actual time=0.814..0.818 rows=1 loops=1)
   Index Cond: (upper_name = 'ANBOB2'::text)
   Heap Fetches: 0
   Buffers: shared hit=3
 Planning:
   Buffers: shared hit=4
 Planning Time: 1.143 ms
 Execution Time: 0.844 ms
(8 行记录)

多列索引与覆盖索引的区别
可见在postgresql中支持multi-column多索引create index on (X,Y) 或 covering 覆盖索引create index on (x) include (y) 两种方式都可以为做到覆盖索引列使用index only scan,那两者有什么差异呢?

在墨天轮社区有一篇记录相关的文章,here, 已提到使用 pageinspect插件,可以像在oracle中dump index block一样,查看创建的 b-tree 类型的索引内部值,两种索引的KEY value值是一样的都记录了两列值。

anbob=# CREATE TABLE test (id serial PRIMARY KEY, some_rand int4, larger text);
CREATE TABLE
anbob=# INSERT INTO test (some_rand, larger) SELECT random() * 500000, substr(md5(i::text), 1, 10) FROM generate_series(1,10000000) i;
INSERT 0 10000000
anbob=# CREATE INDEX magic_idx ON test (some_rand, id) include (larger);
CREATE INDEX
anbob=# create index large_idx on test (some_rand, id, larger);
CREATE INDEX
anbob=# SELECT pg_size_pretty( pg_relation_size('magic_idx'::regclass));
 pg_size_pretty
----------------
 386 MB
(1 行记录)


anbob=# SELECT pg_size_pretty( pg_relation_size('large_idx'::regclass));
 pg_size_pretty
----------------
 386 MB
(1 行记录)

Note:
存储空间上大小是相同的。

那covering index也有自己的优势如:
1, 使用INCLUDE子句的优点之一是b树中的级别更少。因为它们不包含包含列,所有INCLUDE列都存储在b树索引的双链表中。
2, 不破坏约束,又能包含索引列,如果对表只希望email唯一,并覆盖姓名
使用它
CREATE UNIQUE INDEX emails_idx ON bankdb(email) INCLUDE(first_name,last_name);
而不是它
CREATE UNIQUE INDEX emails_idx ON bankdb(email,first_name,last_name);

当然它也有自己的限制:
Expressions are not supported as included columns since they cannot be used in index-only scans.

— over —

打赏

,

对不起,这篇文章暂时关闭评论。