在上一篇中提到了indexonlyscan, 在它执行计划中可以看到有一行Heap Fetches,这篇主要记录一下它的含义。因为Postgresql系的MVCC实现原理,索引中不存在可见性映射(Visibility information),在PostgreSQL中的indexonlyscan 也并不总是scan index only, 简而言之就是如果表(heap)的数据没有对应可见性映射文件(table’s visibility map.)或不是全部完全可见,indexonlyscan的执行计划还是要回表(heap)去检查数据,回表数据记录在heap fetches. 下面简单的记录,经测试在opengauss和postgresql中还有一小差异,下文提到的postgresql默认也表示opengauss,实际测试在opengauss 5.0,如有不同会特殊说明。
背景
在postgresql中支持index scan和indexonlyscan 索引访问路径, PostgresSQL 仍然需要确保该行在返回它之前是可见的,并且该信息在table heap上,索引不包含可见性信息. 为什么 EXPLAIN 不显示index scan的heap fetches? 是因为index scan从heap中获取每一行,而indexonlyscan不是的。 如果索引类型支持indexonlyscan并且查询列在索引中,则查询所需的所有数据值都可以从索引中获得,因此仅索引扫描在物理上是可能的。 但是,PostgreSQL中的任何表扫描都有一个额外的要求:它必须验证每个检索到的行对查询的MVCC快照是否“可见”,可见性信息不存储在索引条目中,仅存储在堆条目中,似乎每行检索都需要堆访问。有一种方法可以解决此问题,PostgreSQL 跟踪表堆中每个页面,该页面中存储的所有行是否足够旧,以便对所有当前和未来的事务可见。此信息存储在表的可见性地图中的位中。仅索引扫描在找到候选索引条目后,会检查相应堆页的可见性映射位。如果已设置,则该行已知可见,因此无需进一步工作即可返回数据。如果未设置,则必须访问堆条目以了解它是否可见,因此与标准索引扫描相比,不会获得任何性能优势。即使在成功的情况下,这种方法也用可见性映射访问换取堆访问;但是,由于可见性映射比它描述的堆小四个数量级,因此访问它所需的物理 I/O 要少得多。在大多数情况下,可见性地图始终缓存在内存中。
对于大部分应用场景,确实存在仅仅少量数据的更新,这种表上绝大部分都可以设置了可见映射位,索引这种扫描方式较小的heap fetches, 可以从中收益。 并且索引列是尽可能改动少的列。
对于index only scan,Postgres 需要首先读取并评估可见性地图(visibility map )来绕过特定行的heap,该可见性地图记录了每个页面最近是否被更改。如果页面发生了变化, 在index only scan期间,Postgres 必须在表heap中查找的行数,而不是索引。就像我们进行正常的索引扫描一样,事实上,index only scan比index scan稍微糟糕一些,因为我们添加了对可见性地图的额外检查。有时可以通过Vacuum或调整Autovacuum设置来减少heap fetches,以使可见性地图保持最新状态。使用更小的百分比、更宽的表行、更高的基数更希望使用索引从中优于SEQ SCAN,页面中存储的所有行是否足够旧,以便对所有当前和未来的事务可见.
测试 –env openGauss 5.0
创建测试表
anbob=# create table testb as select * from testa where rownum<=50000;
INSERT 0 50000
anbob=# create index idx_testb_name on testb(name);
CREATE INDEX
anbob=# select relkind,relname,reltuples,relpages from pg_class where relname='testb';
relkind | relname | reltuples | relpages
---------+---------+-----------+----------
r | testb | 50000 | 271
(1 row)
anbob=# explain (analyze,buffers)select name from testb;
QUERY PLAN
----------------------------------------------------------------------------------------------------------
Seq Scan on testb (cost=0.00..771.00 rows=50000 width=58) (actual time=0.015..9.187 rows=50000 loops=1)
(Buffers: shared hit=271)
Total runtime: 15.831 ms
(3 rows)
对于新创建的表没有做过vacuum或auto vacuum默认没有可见性映射文件(_vm). heap seq scan也就771 cost。
查看文件系统是否有vm文件
anbob=# select 'testb'::regclass::oid;
oid
-------
16490
(1 row)
anbob=# select pg_relation_filepath('testb');
pg_relation_filepath
----------------------
base/16385/16490
(1 row)
og@oel7db1 16385]$ ls -l 16490*
-rw------- 1 og og 2220032 Jun 10 18:10 16490
-rw------- 1 og og 24576 Jun 10 18:10 16490_fsm
[og@oel7db1 16385]$
注意这个表有free space map文件(_fsm)但是没有visibility map文件(_vm),如果没有可见性映射,PostgreSQL 无法知道该页面中的所有行是否对所有当前事务可见,因此必须访问堆(heap)以获取该信息。
使用index only scan
anbob=# select relkind,relname,reltuples,relpages from pg_class where relname='idx_testb_name';
relkind | relname | reltuples | relpages
---------+----------------+-----------+----------
i | idx_testb_name | 50000 | 194
(1 row)
anbob=# explain (analyze,buffers)select /*+indexonlyscan(b idx_testb_name)*/ name from testb b;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------
[Bypass]
Index Only Scan using idx_testb_name on testb b (cost=0.00..2610.25 rows=50000 width=58) (actual time=0.148..49.640 rows=50000 loops=1)
Heap Fetches: 50000
(Buffers: shared hit=9726 read=194)
Total runtime: 56.483 ms
(5 rows)
anbob=# explain (analyze,buffers)select /*+indexonlyscan(b idx_testb_name)*/ name from testb b;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------
[Bypass]
Index Only Scan using idx_testb_name on testb b (cost=0.00..2610.25 rows=50000 width=58) (actual time=0.039..26.424 rows=50000 loops=1)
Heap Fetches: 50000
(Buffers: shared hit=9919)
Total runtime: 32.719 ms
(5 rows)
Note:
注意在使用hint indexonlyscan 后total runtime和COST都远高于seq scan, 并且在两次执行后,buffers命中比较高的前提下。
VACUUM表
anbob=# vacuum testb;
VACUUM
[og@oel7db1 16385]$ ls -l 16490*
-rw------- 1 og og 2220032 Jun 10 18:14 16490
-rw------- 1 og og 24576 Jun 10 18:14 16490_fsm
-rw------- 1 og og 8192 Jun 10 18:14 16490_vm
anbob=# explain (analyze,buffers)select /*+indexonlyscan(b idx_testb_name)*/ name from testb b;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------
[Bypass]
Index Only Scan using idx_testb_name on testb b (cost=0.00..1526.25 rows=50000 width=58) (actual time=0.021..8.973 rows=50000 loops=1)
Heap Fetches: 0
(Buffers: shared hit=194)
Total runtime: 13.771 ms
(5 rows)
note:
在做完Vacuum后, heap fetches有原来的52k降为0. 说明不再做回表查询, 一个真正的index only scan(尽管visibility map总是扫描), cost与runtime也明显降低得到性能提升。
pg_visibility插件
postgres=# create extension pg_visibility;
CREATE EXTENSION
postgres=# select pg_visibility_map('testb'::regclass, 0);
pg_visibility_map
-------------------
(t,f)
(1 row)
NOTE:
pg_visibility插件提供了一种检查表的可见性映射 (VM) 和页面级可见性信息的方法。 它还提供了检查可见性地图完整性并强制重建它的功能。 这第1个“t”,表示true,意味着所有可见。
Vacuum对heap fetche在openGauss和PostgreSQL上的不同
测试均在一个空闲的数据库,只有测试1个session。
# openGAUSS 5
anbob=# \d+ testb
Table "public.testb"
Column | Type | Modifiers | Storage | Stats target | Description
--------+-----------------------+-----------+----------+--------------+-------------
id | integer | | plain | |
name | character varying(20) | | extended | |
Indexes:
"idx_testb_name" btree (name) TABLESPACE pg_default
Has OIDs: no
Options: orientation=row, compression=no
anbob=# update testb set id=id+1 where rownum<=1000;
UPDATE 1000
anbob=# explain (analyze,buffers)select /*+indexonlyscan(b idx_testb_name)*/ name from testb b;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------
[Bypass]
Index Only Scan using idx_testb_name on testb b (cost=0.00..1498.37 rows=48141 width=10) (actual time=0.068..12.791 rows=50000 loops=1)
Heap Fetches: 2028
(Buffers: shared hit=2194 dirtied=8)
Total runtime: 18.491 ms
(5 rows)
anbob=# explain (analyze,buffers)select /*+indexonlyscan(b idx_testb_name)*/ name from testb b;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------
[Bypass]
Index Only Scan using idx_testb_name on testb b (cost=0.00..1498.37 rows=48141 width=10) (actual time=0.019..10.592 rows=50000 loops=1)
Heap Fetches: 2028
(Buffers: shared hit=2194)
Total runtime: 15.612 ms
(5 rows)
anbob=# SELECT schemaname,relname,last_autovacuum,last_autoanalyze,autovacuum_count,autoanalyze_count,last_vacuum,vacuum_count FROM pg_stat_all_tables where relname='testb';
schemaname | relname | last_autovacuum | last_autoanalyze | autovacuum_count | autoanalyze_count | last_vacuum | vacuum_count
------------+---------+-----------------+-------------------------------+------------------+-------------------+-------------------------------+--------------
public | testb | | 2023-06-10 18:19:16.718103+08 | 0 | 1 | 2023-06-13 09:39:14.157991+08 | 18
(1 row)
Note:
做DML后,index only scan 有2028的heap fetches, 多次查询并不会清理。
anbob=# vacuum verbose testb; -- 第一次vacuum INFO: vacuuming "public.testb"(oel7db1 pid=1848) INFO: index "idx_testb_name" now contains 51000 row versions in 194 pages(oel7db1 pid=1848) DETAIL: 0 index row versions were removed. 0 index pages have been deleted, 0 are currently reusable. CPU 0.00s/0.00u sec elapsed 0.00 sec. INFO: "testb": found 0 removable, 2028 nonremovable row versions in 11 out of 276 pages(oel7db1 pid=1848) DETAIL: 1000 dead row versions cannot be removed yet. There were 0 unused item pointers. 0 pages are entirely empty. CPU 0.00s/0.00u sec elapsed 0.00 sec. VACUUM anbob=# explain (analyze,buffers)select /*+indexonlyscan(b idx_testb_name)*/ name from testb b; QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------------ [Bypass] Index Only Scan using idx_testb_name on testb b (cost=0.00..1539.36 rows=48250 width=10) (actual time=0.119..14.011 rows=50000 loops=1) Heap Fetches: 2028 (Buffers: shared hit=2195 dirtied=49) Total runtime: 19.288 ms (5 rows) anbob=# explain (analyze,buffers)select /*+indexonlyscan(b idx_testb_name)*/ name from testb b; QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------------ [Bypass] Index Only Scan using idx_testb_name on testb b (cost=0.00..1539.36 rows=48250 width=10) (actual time=0.019..10.869 rows=50000 loops=1) Heap Fetches: 1028 (Buffers: shared hit=201) Total runtime: 16.209 ms (5 rows) anbob=# explain (analyze,buffers)select /*+indexonlyscan(b idx_testb_name)*/ name from testb b; QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------------ [Bypass] Index Only Scan using idx_testb_name on testb b (cost=0.00..1539.36 rows=48250 width=10) (actual time=0.020..11.415 rows=50000 loops=1) Heap Fetches: 1028 (Buffers: shared hit=201) Total runtime: 18.165 ms (5 rows) anbob=# select relkind,relname,reltuples,relpages from pg_class where relname='testb'; relkind | relname | reltuples | relpages ---------+---------+-----------+---------- r | testb | 48250 | 276 (1 row) anbob=# vacuum verbose testb; -- 第二次vacuum INFO: vacuuming "public.testb"(oel7db1 pid=1848) INFO: scanned index "idx_testb_name" to remove 1000 row versions(oel7db1 pid=1848) DETAIL: CPU 0.00s/0.00u sec elapsed 0.00 sec. INFO: index "idx_testb_name" now contains 50000 row versions in 194 pages(oel7db1 pid=1848) DETAIL: 1000 index row versions were removed. 0 index pages have been deleted, 0 are currently reusable. CPU 0.00s/0.00u sec elapsed 0.00 sec. INFO: "testb": found 0 removable, 1028 nonremovable row versions in 11 out of 276 pages(oel7db1 pid=1848) DETAIL: 0 dead row versions cannot be removed yet. There were 75 unused item pointers. 0 pages are entirely empty. CPU 0.00s/0.00u sec elapsed 0.00 sec. INFO: vacuum 1663/16385/16490, "testb": removed 1000 row versions in 6 pages(oel7db1 pid=1848) DETAIL: CPU 0.00s/0.00u sec elapsed 0.00 sec. VACUUM anbob=# explain (analyze,buffers)select /*+indexonlyscan(b idx_testb_name)*/ name from testb b; QUERY PLAN ----------------------------------------------------------------------------------------------------------------------------------------- [Bypass] Index Only Scan using idx_testb_name on testb b (cost=0.00..1508.26 rows=47355 width=10) (actual time=0.034..8.768 rows=50000 loops=1) Heap Fetches: 103 (Buffers: shared hit=196) Total runtime: 14.308 ms (5 rows) anbob=# explain (analyze,buffers)select /*+indexonlyscan(b idx_testb_name)*/ name from testb b; QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------------ [Bypass] Index Only Scan using idx_testb_name on testb b (cost=0.00..1508.26 rows=47355 width=10) (actual time=0.017..10.182 rows=50000 loops=1) Heap Fetches: 103 (Buffers: shared hit=195) Total runtime: 16.030 ms (5 rows) anbob=# vacuum verbose testb; -- 第三次vacuum INFO: vacuuming "public.testb"(oel7db1 pid=1848) INFO: index "idx_testb_name" now contains 50000 row versions in 194 pages(oel7db1 pid=1848) DETAIL: 0 index row versions were removed. 0 index pages have been deleted, 0 are currently reusable. CPU 0.00s/0.00u sec elapsed 0.00 sec. INFO: "testb": found 0 removable, 103 nonremovable row versions in 6 out of 276 pages(oel7db1 pid=1848) DETAIL: 0 dead row versions cannot be removed yet. There were 1075 unused item pointers. 0 pages are entirely empty. CPU 0.00s/0.00u sec elapsed 0.00 sec. VACUUM anbob=# explain (analyze,buffers)select /*+indexonlyscan(b idx_testb_name)*/ name from testb b; QUERY PLAN ----------------------------------------------------------------------------------------------------------------------------------------- [Bypass] Index Only Scan using idx_testb_name on testb b (cost=0.00..1472.68 rows=46429 width=10) (actual time=0.027..9.967 rows=50000 loops=1) Heap Fetches: 0 (Buffers: shared hit=194) Total runtime: 15.442 ms (5 rows) anbob=# SELECT schemaname,relname,last_autovacuum,last_autoanalyze,autovacuum_count,autoanalyze_count,last_vacuum,vacuum_count FROM pg_stat_all_tables where relname='testb'; schemaname | relname | last_autovacuum | last_autoanalyze | autovacuum_count | autoanalyze_count | last_vacuum | vacuum_count ------------+---------+-----------------+-------------------------------+------------------+-------------------+-------------------------------+-------------- public | testb | | 2023-06-10 18:19:16.718103+08 | 0 | 1 | 2023-06-13 10:21:10.874028+08 | 21 (1 row) anbob=# anbob=# \! sh show vacuum autovacuum | on | Starts the autovacuum subprocess. autovacuum_analyze_scale_factor | 0.1 | Number of tuple inserts, updates, or deletes prior to analyze as a fraction of reltuples. autovacuum_analyze_threshold | 50 | Minimum number of tuple inserts, updates, or deletes prior to analyze. autovacuum_freeze_max_age | 4000000000 | Age at which to autovacuum a table. autovacuum_io_limits | -1 | Sets io_limit for autovacum. autovacuum_max_workers | 3 | Sets the maximum number of simultaneously running autovacuum worker processes. autovacuum_mode | mix | Sets the behavior of autovacuum autovacuum_naptime | 10min | Time to sleep between autovacuum runs. autovacuum_vacuum_cost_delay | 20ms | Vacuum cost delay in milliseconds, for autovacuum. autovacuum_vacuum_cost_limit | -1 | Vacuum cost amount available before napping, for autovacuum. autovacuum_vacuum_scale_factor | 0.2 | Number of tuple updates or deletes prior to vacuum as a fraction of reltuples. autovacuum_vacuum_threshold | 50 | Minimum number of tuple updates or deletes prior to vacuum. enable_debug_vacuum | off | This parameter is just used for logging some vacuum info. log_autovacuum_min_duration | -1 | Sets the minimum execution time above which autovacuum actions will be logged. vacuum_cost_delay | 0 | Vacuum cost delay in milliseconds. vacuum_cost_limit | 200 | Vacuum cost amount available before napping. vacuum_cost_page_dirty | 20 | Vacuum cost for a page dirtied by vacuum. vacuum_cost_page_hit | 1 | Vacuum cost for a page found in the buffer cache. vacuum_cost_page_miss | 10 | Vacuum cost for a page not found in the buffer cache. vacuum_defer_cleanup_age | 0 | Number of transactions by which VACUUM and HOT cleanup should be deferred, if any. vacuum_freeze_min_age | 2000000000 | Minimum age at which VACUUM should freeze a table row. vacuum_freeze_table_age | 4000000000 | Age at which VACUUM should scan whole table to freeze tuples. vacuum_gtt_defer_check_age | 10000 | The defer check age of GTT, used to check expired data after vacuum.
Note:
在openGauss中测试是做了3次vacuum才把heap fetches降到0. 暂时未找到原因,后期了解后再更新。(update – 2024/04- 朋友反馈vacuum analyze 1次就可以,你可以试试)
# PostgreSQL 13
anbob=#
select version();
version
------------------------------------------------------------
PostgreSQL 13.1, compiled by Visual C++ build 1914, 64-bit
(1 行记录)
anbob=# update testc set id=id+1 where id<10000;
UPDATE 19996
anbob=# explain analyze select name from testc;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------
Index Only Scan using idx_testc_name on testc (cost=0.42..1769.59 rows=96178 width=10) (actual time=0.864..35.785 rows=110000 loops=1)
Heap Fetches: 40059
Planning Time: 1.617 ms
Execution Time: 38.599 ms
(4 行记录)
anbob=# vacuum verbose testc;
信息: 正在清理 (vacuum) "public.testc"
信息: 索引"idx_testc_name"在653个页中包含了行版本号108137
描述: 索引行版本0被删除.
0个索引页已经被删除,0当前可重用.
CPU:用户:0.00 s,系统:0.00 s,已用时间:0.00 s.
信息: "testc": 在超出1340页的1中找到可删除版本号0, 不可删除的版本号19
描述: 0的死亡行版本还不能被移除,最老的xmin: 808
有60个未用的项标识符.
由于缓冲区占用而跳过0个页面,204个冻结页面.
0 个页面完全为空。
CPU:用户:0.00 s,系统:0.00 s,已用时间:0.07 s
信息: 正在清理 (vacuum) "pg_toast.pg_toast_73976"
信息: 索引"pg_toast_73976_index"在1个页中包含了行版本号0
描述: 索引行版本0被删除.
0个索引页已经被删除,0当前可重用.
CPU:用户:0.00 s,系统:0.00 s,已用时间:0.00 s.
信息: "pg_toast_73976": 在超出0页的0中找到可删除版本号0, 不可删除的版本号0
描述: 0的死亡行版本还不能被移除,最老的xmin: 808
有0个未用的项标识符.
由于缓冲区占用而跳过0个页面,0个冻结页面.
0 个页面完全为空。
CPU:用户:0.00 s,系统:0.00 s,已用时间:0.00 s
VACUUM
anbob=# explain analyze select name from testc;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------
Index Only Scan using idx_testc_name on testc (cost=0.42..1975.97 rows=109937 width=10) (actual time=0.516..13.016 rows=110000 loops=1)
Heap Fetches: 0
Planning Time: 80.679 ms
Execution Time: 15.232 ms
(4 行记录)
Note:
在PostgreSQL中一次vacuum就可以看到随后的index only scan的heap fetches就降为0.
小结:
在Postgresql系数据库中(如openGauss)index only scan并不总是仅索引, 需要扫描可见性映射文件(_vm)判断数据的可见性,如果有数据修改需要回表heap查询数据显示为heap fetches, 如果该值为0表示仅索引(和可见性映射文件)的扫描。 根据该表上发生的修改次数,PostgreSQL 很可能必须多次访问表heap,这当然会减慢SQL查询速度。 对于大多数块是静态的表,indexonlyscan非常好选择。
— 如果您发现有不对之处,请反馈给我。