首页 » PostgreSQL/GaussDB » PostgreSQL/openGauss explain解析(三): Heap Fetches

PostgreSQL/openGauss explain解析(三): Heap Fetches

在上一篇中提到了indexonlyscan, 在它执行计划中可以看到有一Heap Fetches,这篇主要记录一下它的含义。因为Postgresql系的MVCC实现原理,索引中不存在可见性映射(Visibility information),在PostgreSQL中的indexonlyscan 也并不总是scan index only, 简而言之就是如果表(heap)的数据没有对应可见性映射文件(table’s visibility map.)或不是全部完全可见,indexonlyscan的执行计划还是要回表(heap)去检查数据,回表数据记录在heap fetches. 下面简单的记录,经测试在opengauss和postgresql中还有一小差异,下文提到的postgresql默认也表示opengauss,实际测试在opengauss 5.0,如有不同会特殊说明。

背景

在postgresql中支持index scan和indexonlyscan 索引访问路径,  PostgresSQL 仍然需要确保该行在返回它之前是可见的,并且该信息在table heap上,索引不包含可见性信息. 为什么 EXPLAIN 不显示index scan的heap fetches? 是因为index scan从heap中获取每一行,而indexonlyscan不是的。   如果索引类型支持indexonlyscan并且查询列在索引中,则查询所需的所有数据值都可以从索引中获得,因此仅索引扫描在物理上是可能的。 但是,PostgreSQL中的任何表扫描都有一个额外的要求:它必须验证每个检索到的行对查询的MVCC快照是否“可见”,可见性信息不存储在索引条目中,仅存储在堆条目中,似乎每行检索都需要堆访问。有一种方法可以解决此问题,PostgreSQL 跟踪表堆中每个页面,该页面中存储的所有行是否足够旧,以便对所有当前和未来的事务可见。此信息存储在表的可见性地图中的位中。仅索引扫描在找到候选索引条目后,会检查相应堆页的可见性映射位。如果已设置,则该行已知可见,因此无需进一步工作即可返回数据。如果未设置,则必须访问堆条目以了解它是否可见,因此与标准索引扫描相比,不会获得任何性能优势。即使在成功的情况下,这种方法也用可见性映射访问换取堆访问;但是,由于可见性映射比它描述的堆小四个数量级,因此访问它所需的物理 I/O 要少得多。在大多数情况下,可见性地图始终缓存在内存中。

对于大部分应用场景,确实存在仅仅少量数据的更新,这种表上绝大部分都可以设置了可见映射位,索引这种扫描方式较小的heap fetches, 可以从中收益。 并且索引列是尽可能改动少的列。

对于index only scan,Postgres 需要首先读取并评估可见性地图(visibility map )来绕过特定行的heap,该可见性地图记录了每个页面最近是否被更改。如果页面发生了变化, 在index only scan期间,Postgres 必须在表heap中查找的行数,而不是索引。就像我们进行正常的索引扫描一样,事实上,index only scan比index scan稍微糟糕一些,因为我们添加了对可见性地图的额外检查。有时可以通过Vacuum或调整Autovacuum设置来减少heap fetches,以使可见性地图保持最新状态。使用更小的百分比、更宽的表行、更高的基数更希望使用索引从中优于SEQ SCAN,页面中存储的所有行是否足够旧,以便对所有当前和未来的事务可见.

测试 –env openGauss 5.0

创建测试表

anbob=# create table testb as select * from testa where rownum<=50000;
INSERT 0 50000
anbob=# create index idx_testb_name on testb(name);
CREATE INDEX

anbob=# select relkind,relname,reltuples,relpages from pg_class where relname='testb';
 relkind | relname | reltuples | relpages
---------+---------+-----------+----------
 r       | testb   |     50000 |      271
(1 row)

anbob=# explain (analyze,buffers)select name from testb;
                                                QUERY PLAN
----------------------------------------------------------------------------------------------------------
 Seq Scan on testb  (cost=0.00..771.00 rows=50000 width=58) (actual time=0.015..9.187 rows=50000 loops=1)
   (Buffers: shared hit=271)
 Total runtime: 15.831 ms
(3 rows)

对于新创建的表没有做过vacuum或auto vacuum默认没有可见性映射文件(_vm). heap seq scan也就771 cost。

查看文件系统是否有vm文件

anbob=# select 'testb'::regclass::oid;
  oid
-------
 16490
(1 row)

anbob=# select pg_relation_filepath('testb');
 pg_relation_filepath
----------------------
 base/16385/16490
(1 row)

og@oel7db1 16385]$ ls -l 16490*
-rw------- 1 og og 2220032 Jun 10 18:10 16490
-rw------- 1 og og   24576 Jun 10 18:10 16490_fsm
[og@oel7db1 16385]$

注意这个表有free space map文件(_fsm)但是没有visibility map文件(_vm),如果没有可见性映射,PostgreSQL 无法知道该页面中的所有行是否对所有当前事务可见,因此必须访问堆(heap)以获取该信息。

使用index only scan

anbob=# select relkind,relname,reltuples,relpages from pg_class where relname='idx_testb_name';
 relkind |    relname     | reltuples | relpages
---------+----------------+-----------+----------
 i       | idx_testb_name |     50000 |      194
(1 row)

anbob=# explain (analyze,buffers)select /*+indexonlyscan(b idx_testb_name)*/ name from testb b;
                                                                QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------
 [Bypass]
 Index Only Scan using idx_testb_name on testb b  (cost=0.00..2610.25 rows=50000 width=58) (actual time=0.148..49.640 rows=50000 loops=1)
   Heap Fetches: 50000
   (Buffers: shared hit=9726 read=194)
 Total runtime: 56.483 ms
(5 rows)

anbob=# explain (analyze,buffers)select /*+indexonlyscan(b idx_testb_name)*/ name from testb b;
                                                                QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------
 [Bypass]
 Index Only Scan using idx_testb_name on testb b  (cost=0.00..2610.25 rows=50000 width=58) (actual time=0.039..26.424 rows=50000 loops=1)
   Heap Fetches: 50000
   (Buffers: shared hit=9919)
 Total runtime: 32.719 ms
(5 rows)

Note:
注意在使用hint indexonlyscan 后total runtime和COST都远高于seq scan, 并且在两次执行后,buffers命中比较高的前提下。

VACUUM表

anbob=# vacuum testb;
VACUUM

[og@oel7db1 16385]$ ls -l 16490*
-rw------- 1 og og 2220032 Jun 10 18:14 16490
-rw------- 1 og og   24576 Jun 10 18:14 16490_fsm
-rw------- 1 og og    8192 Jun 10 18:14 16490_vm

anbob=# explain (analyze,buffers)select /*+indexonlyscan(b idx_testb_name)*/ name from testb b;
                                                               QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------
 [Bypass]
 Index Only Scan using idx_testb_name on testb b  (cost=0.00..1526.25 rows=50000 width=58) (actual time=0.021..8.973 rows=50000 loops=1)
   Heap Fetches: 0
   (Buffers: shared hit=194)
 Total runtime: 13.771 ms
(5 rows)

note:

在做完Vacuum后, heap fetches有原来的52k降为0. 说明不再做回表查询, 一个真正的index only scan(尽管visibility map总是扫描), cost与runtime也明显降低得到性能提升。

pg_visibility插件

postgres=# create extension pg_visibility;
CREATE EXTENSION
postgres=# select pg_visibility_map('testb'::regclass, 0);
 pg_visibility_map 
-------------------
 (t,f)
(1 row)

NOTE:
pg_visibility插件提供了一种检查表的可见性映射 (VM) 和页面级可见性信息的方法。 它还提供了检查可见性地图完整性并强制重建它的功能。 这第1个“t”,表示true,意味着所有可见。

Vacuum对heap fetche在openGauss和PostgreSQL上的不同

测试均在一个空闲的数据库,只有测试1个session。

# openGAUSS 5

anbob=# \d+ testb
                                Table "public.testb"
 Column |         Type          | Modifiers | Storage  | Stats target | Description
--------+-----------------------+-----------+----------+--------------+-------------
 id     | integer               |           | plain    |              |
 name   | character varying(20) |           | extended |              |
Indexes:
    "idx_testb_name" btree (name) TABLESPACE pg_default
Has OIDs: no
Options: orientation=row, compression=no

anbob=# update testb set id=id+1 where rownum<=1000;
UPDATE 1000

anbob=# explain (analyze,buffers)select /*+indexonlyscan(b idx_testb_name)*/ name from testb b;
                                                                QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------
 [Bypass]
 Index Only Scan using idx_testb_name on testb b  (cost=0.00..1498.37 rows=48141 width=10) (actual time=0.068..12.791 rows=50000 loops=1)
   Heap Fetches: 2028
   (Buffers: shared hit=2194 dirtied=8)
 Total runtime: 18.491 ms
(5 rows)

anbob=# explain (analyze,buffers)select /*+indexonlyscan(b idx_testb_name)*/ name from testb b;
                                                                QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------
 [Bypass]
 Index Only Scan using idx_testb_name on testb b  (cost=0.00..1498.37 rows=48141 width=10) (actual time=0.019..10.592 rows=50000 loops=1)
   Heap Fetches: 2028
   (Buffers: shared hit=2194)
 Total runtime: 15.612 ms
(5 rows)

anbob=# SELECT schemaname,relname,last_autovacuum,last_autoanalyze,autovacuum_count,autoanalyze_count,last_vacuum,vacuum_count FROM pg_stat_all_tables where relname='testb';
 schemaname | relname | last_autovacuum |       last_autoanalyze        | autovacuum_count | autoanalyze_count |          last_vacuum          | vacuum_count
------------+---------+-----------------+-------------------------------+------------------+-------------------+-------------------------------+--------------
 public     | testb   |                 | 2023-06-10 18:19:16.718103+08 |                0 |                 1 | 2023-06-13 09:39:14.157991+08 |           18
(1 row)

Note:
做DML后,index only scan 有2028的heap fetches, 多次查询并不会清理。

anbob=# vacuum verbose testb;   -- 第一次vacuum
INFO:  vacuuming "public.testb"(oel7db1 pid=1848)
INFO:  index "idx_testb_name" now contains 51000 row versions in 194 pages(oel7db1 pid=1848)
DETAIL:  0 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  "testb": found 0 removable, 2028 nonremovable row versions in 11 out of 276 pages(oel7db1 pid=1848)
DETAIL:  1000 dead row versions cannot be removed yet. There were 0 unused item pointers. 0 pages are entirely empty. CPU 0.00s/0.00u sec elapsed 0.00 sec.
VACUUM
anbob=# explain (analyze,buffers)select /*+indexonlyscan(b idx_testb_name)*/ name from testb b;
                                                                QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------
 [Bypass]
 Index Only Scan using idx_testb_name on testb b  (cost=0.00..1539.36 rows=48250 width=10) (actual time=0.119..14.011 rows=50000 loops=1)
   Heap Fetches: 2028
   (Buffers: shared hit=2195 dirtied=49)
 Total runtime: 19.288 ms
(5 rows)

anbob=# explain (analyze,buffers)select /*+indexonlyscan(b idx_testb_name)*/ name from testb b;
                                                                QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------
 [Bypass]
 Index Only Scan using idx_testb_name on testb b  (cost=0.00..1539.36 rows=48250 width=10) (actual time=0.019..10.869 rows=50000 loops=1)
   Heap Fetches: 1028
   (Buffers: shared hit=201)
 Total runtime: 16.209 ms
(5 rows)

anbob=# explain (analyze,buffers)select /*+indexonlyscan(b idx_testb_name)*/ name from testb b;
                                                                QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------
 [Bypass]
 Index Only Scan using idx_testb_name on testb b  (cost=0.00..1539.36 rows=48250 width=10) (actual time=0.020..11.415 rows=50000 loops=1)
   Heap Fetches: 1028
   (Buffers: shared hit=201)
 Total runtime: 18.165 ms
(5 rows)

anbob=# select relkind,relname,reltuples,relpages from pg_class where relname='testb';
 relkind | relname | reltuples | relpages
---------+---------+-----------+----------
 r       | testb   |     48250 |      276
(1 row)

anbob=# vacuum verbose testb;   -- 第二次vacuum
INFO:  vacuuming "public.testb"(oel7db1 pid=1848)
INFO:  scanned index "idx_testb_name" to remove 1000 row versions(oel7db1 pid=1848)
DETAIL:  CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  index "idx_testb_name" now contains 50000 row versions in 194 pages(oel7db1 pid=1848)
DETAIL:  1000 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  "testb": found 0 removable, 1028 nonremovable row versions in 11 out of 276 pages(oel7db1 pid=1848)
DETAIL:  0 dead row versions cannot be removed yet. There were 75 unused item pointers. 0 pages are entirely empty. CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  vacuum 1663/16385/16490, "testb": removed 1000 row versions in 6 pages(oel7db1 pid=1848)
DETAIL:  CPU 0.00s/0.00u sec elapsed 0.00 sec.
VACUUM
anbob=# explain (analyze,buffers)select /*+indexonlyscan(b idx_testb_name)*/ name from testb b;
                                                               QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------
 [Bypass]
 Index Only Scan using idx_testb_name on testb b  (cost=0.00..1508.26 rows=47355 width=10) (actual time=0.034..8.768 rows=50000 loops=1)
   Heap Fetches: 103
   (Buffers: shared hit=196)
 Total runtime: 14.308 ms
(5 rows)

anbob=# explain (analyze,buffers)select /*+indexonlyscan(b idx_testb_name)*/ name from testb b;
                                                                QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------
 [Bypass]
 Index Only Scan using idx_testb_name on testb b  (cost=0.00..1508.26 rows=47355 width=10) (actual time=0.017..10.182 rows=50000 loops=1)
   Heap Fetches: 103
   (Buffers: shared hit=195)
 Total runtime: 16.030 ms
(5 rows)

anbob=# vacuum verbose testb; -- 第三次vacuum
INFO:  vacuuming "public.testb"(oel7db1 pid=1848)
INFO:  index "idx_testb_name" now contains 50000 row versions in 194 pages(oel7db1 pid=1848)
DETAIL:  0 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  "testb": found 0 removable, 103 nonremovable row versions in 6 out of 276 pages(oel7db1 pid=1848)
DETAIL:  0 dead row versions cannot be removed yet. There were 1075 unused item pointers. 0 pages are entirely empty. CPU 0.00s/0.00u sec elapsed 0.00 sec.
VACUUM
anbob=# explain (analyze,buffers)select /*+indexonlyscan(b idx_testb_name)*/ name from testb b;
                                                               QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------
 [Bypass]
 Index Only Scan using idx_testb_name on testb b  (cost=0.00..1472.68 rows=46429 width=10) (actual time=0.027..9.967 rows=50000 loops=1)
   Heap Fetches: 0
   (Buffers: shared hit=194)
 Total runtime: 15.442 ms
(5 rows)

anbob=# SELECT schemaname,relname,last_autovacuum,last_autoanalyze,autovacuum_count,autoanalyze_count,last_vacuum,vacuum_count FROM pg_stat_all_tables where relname='testb';
 schemaname | relname | last_autovacuum |       last_autoanalyze        | autovacuum_count | autoanalyze_count |          last_vacuum          | vacuum_count
------------+---------+-----------------+-------------------------------+------------------+-------------------+-------------------------------+--------------
 public     | testb   |                 | 2023-06-10 18:19:16.718103+08 |                0 |                 1 | 2023-06-13 10:21:10.874028+08 |           21
(1 row)

anbob=#


anbob=# \! sh show vacuum
 autovacuum                             | on                 | Starts the autovacuum subprocess.
 autovacuum_analyze_scale_factor        | 0.1                | Number of tuple inserts, updates, or deletes prior to analyze as a fraction of reltuples.
 autovacuum_analyze_threshold           | 50                 | Minimum number of tuple inserts, updates, or deletes prior to analyze.
 autovacuum_freeze_max_age              | 4000000000         | Age at which to autovacuum a table.
 autovacuum_io_limits                   | -1                 | Sets io_limit for autovacum.
 autovacuum_max_workers                 | 3                  | Sets the maximum number of simultaneously running autovacuum worker processes.
 autovacuum_mode                        | mix                | Sets the behavior of autovacuum
 autovacuum_naptime                     | 10min              | Time to sleep between autovacuum runs.
 autovacuum_vacuum_cost_delay           | 20ms               | Vacuum cost delay in milliseconds, for autovacuum.
 autovacuum_vacuum_cost_limit           | -1                 | Vacuum cost amount available before napping, for autovacuum.
 autovacuum_vacuum_scale_factor         | 0.2                | Number of tuple updates or deletes prior to vacuum as a fraction of reltuples.
 autovacuum_vacuum_threshold            | 50                 | Minimum number of tuple updates or deletes prior to vacuum.
 enable_debug_vacuum                    | off                | This parameter is just used for logging some vacuum info.
 log_autovacuum_min_duration            | -1                 | Sets the minimum execution time above which autovacuum actions will be logged.
 vacuum_cost_delay                      | 0                  | Vacuum cost delay in milliseconds.
 vacuum_cost_limit                      | 200                | Vacuum cost amount available before napping.
 vacuum_cost_page_dirty                 | 20                 | Vacuum cost for a page dirtied by vacuum.
 vacuum_cost_page_hit                   | 1                  | Vacuum cost for a page found in the buffer cache.
 vacuum_cost_page_miss                  | 10                 | Vacuum cost for a page not found in the buffer cache.
 vacuum_defer_cleanup_age               | 0                  | Number of transactions by which VACUUM and HOT cleanup should be deferred, if any.
 vacuum_freeze_min_age                  | 2000000000         | Minimum age at which VACUUM should freeze a table row.
 vacuum_freeze_table_age                | 4000000000         | Age at which VACUUM should scan whole table to freeze tuples.
 vacuum_gtt_defer_check_age             | 10000              | The defer check age of GTT, used to check expired data after vacuum.

Note:
在openGauss中测试是做了3次vacuum才把heap fetches降到0. 暂时未找到原因,后期了解后再更新。(update – 2024/04- 朋友反馈vacuum analyze 1次就可以,你可以试试)

# PostgreSQL 13

anbob=# 
select version();
                          version
------------------------------------------------------------
 PostgreSQL 13.1, compiled by Visual C++ build 1914, 64-bit
(1 行记录)

anbob=# update testc set id=id+1  where id<10000;
UPDATE 19996
anbob=# explain analyze select name from testc;
                                                               QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------
 Index Only Scan using idx_testc_name on testc  (cost=0.42..1769.59 rows=96178 width=10) (actual time=0.864..35.785 rows=110000 loops=1)
   Heap Fetches: 40059
 Planning Time: 1.617 ms
 Execution Time: 38.599 ms
(4 行记录)

anbob=# vacuum verbose testc;
信息:  正在清理 (vacuum)  "public.testc"
信息:  索引"idx_testc_name"在653个页中包含了行版本号108137
描述:  索引行版本0被删除.
0个索引页已经被删除,0当前可重用.
CPU:用户:0.00 s,系统:0.00 s,已用时间:0.00 s.
信息:  "testc": 在超出1340页的1中找到可删除版本号0, 不可删除的版本号19
描述:  0的死亡行版本还不能被移除,最老的xmin: 808
有60个未用的项标识符.
由于缓冲区占用而跳过0个页面,204个冻结页面.
0 个页面完全为空。
CPU:用户:0.00 s,系统:0.00 s,已用时间:0.07 s
信息:  正在清理 (vacuum)  "pg_toast.pg_toast_73976"
信息:  索引"pg_toast_73976_index"在1个页中包含了行版本号0
描述:  索引行版本0被删除.
0个索引页已经被删除,0当前可重用.
CPU:用户:0.00 s,系统:0.00 s,已用时间:0.00 s.
信息:  "pg_toast_73976": 在超出0页的0中找到可删除版本号0, 不可删除的版本号0
描述:  0的死亡行版本还不能被移除,最老的xmin: 808
有0个未用的项标识符.
由于缓冲区占用而跳过0个页面,0个冻结页面.
0 个页面完全为空。
CPU:用户:0.00 s,系统:0.00 s,已用时间:0.00 s
VACUUM

anbob=# explain analyze select name from testc;
                                                                QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------
 Index Only Scan using idx_testc_name on testc  (cost=0.42..1975.97 rows=109937 width=10) (actual time=0.516..13.016 rows=110000 loops=1)
   Heap Fetches: 0
 Planning Time: 80.679 ms
 Execution Time: 15.232 ms
(4 行记录)

Note:
在PostgreSQL中一次vacuum就可以看到随后的index only scan的heap fetches就降为0.

小结:
在Postgresql系数据库中(如openGauss)index only scan并不总是仅索引, 需要扫描可见性映射文件(_vm)判断数据的可见性,如果有数据修改需要回表heap查询数据显示为heap fetches, 如果该值为0表示仅索引(和可见性映射文件)的扫描。 根据该表上发生的修改次数,PostgreSQL 很可能必须多次访问表heap,这当然会减慢SQL查询速度。 对于大多数块是静态的表,indexonlyscan非常好选择。

 

— 如果您发现有不对之处,请反馈给我。

打赏

,

对不起,这篇文章暂时关闭评论。