Kingbase\Highgo(PostgreSQL)分区表的“全局”统计信息需要手动收集

基于postgresql的数据库如果依赖autovacuum收集统计信息时,会存在对于分区表,分区的统计信息会收集,但是表上的全局统计信息(global statistics)缺失,那样在查询跨越多个分区时或与其他表关连时,可能因统计信息错误产生错误的执行计划。

像表级统计信息对于总行数容易估算,为所有分区的总和。但是对于列上的distinct 值无法计算,因为无法得值分区间重复的记录。所以列的distinct应该是每个分区的最大不同值计数和所有分区计数的总和之间。

刚好看到Franck Pachot在分享postgreSQL的这个问题,我测试在Kingbase 和 HighgoDB上同样存在该问题,下面简单记录。

构建测试表

create table history (
   year int,
   num serial,
   x   int,
   y   int,
   primary key (year, num)
) partition by list (year)
;

create table history_2024 partition of history for values in (2024);
create table history_2025 partition of history for values in (2025);
create table history_2026 partition of history for values in (2026);
create table history_2027 partition of history for values in (2027);

insert into history select 
 extract(year from ( date '2026-01-02' - interval '1 minute' * num ))::int as year
 ,num             -- NDV ≈ rows (unique key, density ≈ 1 / rows)
 ,(num % 2) as x  -- NDV = 2 per partition (very high density: ~50% per value)  
 ,(num / 2) as y  -- NDV ≈ rows / 2 per partition (moderate density: ~2 rows per distinct value) 
 from generate_series(1,1e6) num
;

真实的数据分布情况:

select count(*), year
  , count(distinct x) as "distinct x"
  , min(x) as "min x"
  , max(x) as "max x"
  , count(*)::float / nullif(count(distinct x), 0) as density_x
  , count(distinct y) as "distinct y"
  , min(y) as "min y"
  , max(y) as "max y"
  , count(*)::float / nullif(count(distinct y), 0) as density_y
 from history group by grouping sets ((),year)
;

kingbase-# ;
  count  | year | distinct x | min x | max x | density_x | distinct y | min y  | max y  | density_y
---------+------+------------+-------+-------+-----------+------------+--------+--------+-----------
  472960 | 2024 |          2 |     0 |     1 |    236480 |     236480 | 263521 | 500000 |         2
  525600 | 2025 |          2 |     0 |     1 |    262800 |     262800 |    721 | 263520 |         2
    1440 | 2026 |          2 |     0 |     1 |       720 |        720 |      1 |    720 |         2
 1000000 |      |          2 |     0 |     1 |    500000 |     500000 |      1 | 500000 |         2
(4 rows)

insert完成后在kingbase 可能会有短暂(3s)没有统计信息,在highgodb发现autovacuum触发的比较频繁。

select relname, relpages, reltuples, relkind, relhassubclass, relispopulated, relispartition  
 from pg_class where relname like 'history%' order by relkind desc, relname
;
kingbase-# ;
      relname      | relpages | reltuples | relkind | relhassubclass | relispopulated | relispartition
-------------------+----------+-----------+---------+----------------+----------------+----------------
 history_2024      |        0 |        -1 | r       | f              | t              | t
 history_2025      |        0 |        -1 | r       | f              | t              | t
 history_2026      |        0 |        -1 | r       | f              | t              | t
 history_2027      |        0 |        -1 | r       | f              | t              | t
 history           |        0 |        -1 | p       | t              | t              | f
 history_2024_pkey |        1 |         0 | i       | f              | t              | t
 history_2025_pkey |        1 |         0 | i       | f              | t              | t
 history_2026_pkey |        1 |         0 | i       | f              | t              | t
 history_2027_pkey |        1 |         0 | i       | f              | t              | t
 history_num_seq   |        1 |         1 | S       | f              | t              | f
 history_pkey      |        0 |         0 | I       | t              | t              | f
(11 rows)


select tablename, attname, n_distinct, null_frac
 from pg_stats where tablename like 'history%'
kingbase-# ;
 tablename | attname | n_distinct | null_frac
-----------+---------+------------+-----------
(0 rows)

过一段时间,autovacuum开始工作,收集了分区级统计信息

kingbase=# select relname, relpages, reltuples, relkind, relhassubclass, relispopulated, relispartition
kingbase-#  from pg_class where relname like 'history%' order by relkind desc, relname
kingbase-# ;
      relname      | relpages | reltuples | relkind | relhassubclass | relispopulated | relispartition
-------------------+----------+-----------+---------+----------------+----------------+----------------
 history_2024      |     2571 |    472960 | r       | f              | t              | t
 history_2025      |     2857 |    525600 | r       | f              | t              | t
 history_2026      |        8 |      1440 | r       | f              | t              | t
 history_2027      |        0 |        -1 | r       | f              | t              | t
 history           |        0 |        -1 | p       | t              | t              | f
 history_2024_pkey |     1307 |    472960 | i       | f              | t              | t
 history_2025_pkey |     1451 |    525600 | i       | f              | t              | t
 history_2026_pkey |        6 |      1440 | i       | f              | t              | t
 history_2027_pkey |        1 |         0 | i       | f              | t              | t
 history_num_seq   |        1 |         1 | S       | f              | t              | f
 history_pkey      |        0 |         0 | I       | t              | t              | f

kingbase=# select tablename, attname, n_distinct, null_frac
kingbase-#  from pg_stats where tablename like 'history%'
kingbase-# ;
  tablename   | attname | n_distinct  | null_frac
--------------+---------+-------------+-----------
 history_2024 | year    |           1 |         0
 history_2024 | num     |          -1 |         0
 history_2024 | x       |           2 |         0
 history_2024 | y       |  -0.5069985 |         0
 history_2025 | year    |           1 |         0
 history_2025 | num     |          -1 |         0
 history_2025 | x       |           2 |         0
 history_2025 | y       | -0.49855024 |         0
 history_2026 | year    |           1 |         0
 history_2026 | num     |          -1 |         0
 history_2026 | x       |           2 |         0
 history_2026 | y       |        -0.5 |         0
(12 rows)

注: 但是没有全局统计信息,列统计信息也仅有分区级,包含了distinct 值数量。

模拟跨分区时的估算

explain (analyze, buffers off, summary off)  select distinct x from history
kingbase-# ;
                                                           QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------
 HashAggregate  (cost=22978.20..22980.20 rows=200 width=4) (actual time=313.883..313.885 rows=2 loops=1)
   Group Key: history_2024.x
   ->  Append  (cost=0.00..20473.60 rows=1001840 width=4) (actual time=0.013..178.975 rows=1000000 loops=1)
         ->  Seq Scan on history_2024  (cost=0.00..7300.60 rows=472960 width=4) (actual time=0.012..56.130 rows=472960 loops=1)
         ->  Seq Scan on history_2025  (cost=0.00..8113.00 rows=525600 width=4) (actual time=0.012..62.377 rows=525600 loops=1)
         ->  Seq Scan on history_2026  (cost=0.00..22.40 rows=1440 width=4) (actual time=0.012..0.179 rows=1440 loops=1)
         ->  Seq Scan on history_2027  (cost=0.00..28.40 rows=1840 width=4) (actual time=0.001..0.002 rows=0 loops=1)
(7 rows)

explain (analyze, buffers off, summary off) select distinct y from history
kingbase-# ;
                                                           QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------
 HashAggregate  (cost=22978.20..22980.20 rows=200 width=4) (actual time=449.150..552.624 rows=500000 loops=1)
   Group Key: history_2024.y
   ->  Append  (cost=0.00..20473.60 rows=1001840 width=4) (actual time=0.011..191.240 rows=1000000 loops=1)
         ->  Seq Scan on history_2024  (cost=0.00..7300.60 rows=472960 width=4) (actual time=0.010..60.868 rows=472960 loops=1)
         ->  Seq Scan on history_2025  (cost=0.00..8113.00 rows=525600 width=4) (actual time=0.015..68.103 rows=525600 loops=1)
         ->  Seq Scan on history_2026  (cost=0.00..22.40 rows=1440 width=4) (actual time=0.014..0.195 rows=1440 loops=1)
         ->  Seq Scan on history_2027  (cost=0.00..28.40 rows=1840 width=4) (actual time=0.003..0.003 rows=0 loops=1)
(7 rows)

Note: 估算的rows 是200,实际x列就2个值。y列估算也是200,实际是500000. 可见Postgresql在没有统计信息时使用默认值200. PG源码中可以找到 define DEFAULT_NUM_DISTINCT 200.

查询单分区时的估算

kingbase=# SELECT * FROM pg_partition_tree('history');
    relid     | parentrelid | isleaf | level
--------------+-------------+--------+-------
 history      |             | f      |     0
 history_2024 | history     | t      |     1
 history_2025 | history     | t      |     1
 history_2026 | history     | t      |     1
 history_2027 | history     | t      |     1
(5 rows)

kingbase=# explain (analyze, buffers off, summary off) select distinct x from history_2024;
                                                        QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------
 HashAggregate  (cost=8483.00..8483.02 rows=2 width=4) (actual time=114.068..114.069 rows=2 loops=1)
   Group Key: x
   ->  Seq Scan on history_2024  (cost=0.00..7300.60 rows=472960 width=4) (actual time=0.012..37.173 rows=472960 loops=1)
(3 rows)

Note: 单分区估算正常.

当多表关连时跨多分区可能会产生错误的执行计划,因为使用rows 200。 Autovacuum 自动分析功能会收集分区统计信息,但不会收集父分区的统计信息。要收集父分区的全局统计信息,您必须手动运行analyze命令。 默认analyze 会递归所有分区, (在Postgresql 16中引入了指定因子n_distinct_inherited,使用analyze only 仅分析父分区(不递归),目前国产数据库基于的PG普遍在12或14还没有该功能。)

kingbase=# ANALYZE history
kingbase-# ;
ANALYZE
kingbase=# select relname, relpages, reltuples, relkind, relhassubclass, relispopulated, relispartition
kingbase-#  from pg_class where relname like 'history%' order by relkind desc, relname
kingbase-# ;
      relname      | relpages | reltuples | relkind | relhassubclass | relispopulated | relispartition
-------------------+----------+-----------+---------+----------------+----------------+----------------
 history_2024      |     2571 |    472960 | r       | f              | t              | t
 history_2025      |     2857 |    525600 | r       | f              | t              | t
 history_2026      |        8 |      1440 | r       | f              | t              | t
 history_2027      |        0 |         0 | r       | f              | t              | t
 history           |     5436 |     1e+06 | p       | t              | t              | f
 history_2024_pkey |     1307 |    472960 | i       | f              | t              | t
 history_2025_pkey |     1451 |    525600 | i       | f              | t              | t
 history_2026_pkey |        6 |      1440 | i       | f              | t              | t
 history_2027_pkey |        1 |         0 | i       | f              | t              | t
 history_num_seq   |        1 |         1 | S       | f              | t              | f
 history_pkey      |     2765 |     1e+06 | I       | t              | t              | f
(11 rows)

select tablename, attname, n_distinct, null_frac
 from pg_stats where tablename like 'history%'
kingbase-# ;
  tablename   | attname | n_distinct  | null_frac
--------------+---------+-------------+-----------
 history      | year    |           3 |         0
 history      | num     |          -1 |         0
 history      | x       |           2 |         0
 history      | y       |   -0.476323 |         0
 history_2024 | year    |           1 |         0
 history_2024 | num     |          -1 |         0
 history_2024 | x       |           2 |         0
 history_2024 | y       | -0.49474797 |         0
 history_2025 | year    |           1 |         0
 history_2025 | num     |          -1 |         0
 history_2025 | x       |           2 |         0
 history_2026 | year    |           1 |         0
 history_2026 | x       |           2 |         0
 history_2025 | y       | -0.49974886 |         0
 history_2026 | num     |          -1 |         0
 history_2026 | y       |        -0.5 |         0
(16 rows)

再次执行同样的查询

kingbase=# explain (analyze, buffers off, summary off) select distinct x from history;
                                                           QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------
 HashAggregate  (cost=22936.01..22936.03 rows=2 width=4) (actual time=313.243..313.245 rows=2 loops=1)
   Group Key: history_2024.x
   ->  Append  (cost=0.00..20436.01 rows=1000001 width=4) (actual time=0.033..177.602 rows=1000000 loops=1)
         ->  Seq Scan on history_2024  (cost=0.00..7300.60 rows=472960 width=4) (actual time=0.032..55.660 rows=472960 loops=1)
         ->  Seq Scan on history_2025  (cost=0.00..8113.00 rows=525600 width=4) (actual time=0.017..61.183 rows=525600 loops=1)
         ->  Seq Scan on history_2026  (cost=0.00..22.40 rows=1440 width=4) (actual time=0.011..0.180 rows=1440 loops=1)
         ->  Seq Scan on history_2027  (cost=0.00..0.00 rows=1 width=4) (actual time=0.003..0.003 rows=0 loops=1)
(7 rows)

kingbase=# explain (analyze, buffers off, summary off) select distinct y from history;
                                                              QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------
 Unique  (cost=133768.45..138768.46 rows=476323 width=4) (actual time=320.937..529.077 rows=500000 loops=1)
   ->  Sort  (cost=133768.45..136268.46 rows=1000001 width=4) (actual time=320.935..417.324 rows=1000000 loops=1)
         Sort Key: history_2024.y
         Sort Method: external merge  Disk: 13800kB
         ->  Append  (cost=0.00..20436.01 rows=1000001 width=4) (actual time=0.011..180.121 rows=1000000 loops=1)
               ->  Seq Scan on history_2024  (cost=0.00..7300.60 rows=472960 width=4) (actual time=0.010..56.977 rows=472960 loops=1)
               ->  Seq Scan on history_2025  (cost=0.00..8113.00 rows=525600 width=4) (actual time=0.013..62.874 rows=525600 loops=1)
               ->  Seq Scan on history_2026  (cost=0.00..22.40 rows=1440 width=4) (actual time=0.012..0.181 rows=1440 loops=1)
               ->  Seq Scan on history_2027  (cost=0.00..0.00 rows=1 width=4) (actual time=0.002..0.002 rows=0 loops=1)
(9 rows)

Note: 现在的查询优化器给给出几乎接近实际的rows。

autovacuum当然也是在达到收集的触发条件时才收集了分区级的统计信息,但是父表是空表所以缺失全局统计信息,所以需要手动的维护统计信息。

— over —

Leave a Comment