MySQL 多表关连时优化器缺陷出现的错误Join order(续)

前几天写过一篇<MySQL 多表关连时优化器缺陷出现的错误Join order>,记录了mysql在多表join时,在清理了其中一个表部分记录后,优化器因为局部的cost估算产生了错误的join order,后来和同事讨论,这里显示MySQL的另一个问题,MySQL的统计信息来自索引,无索引列在MySQL 8.0前或在8.0后没有手动创建列直方图(Histogram),优化器是无法准确估算选择性的。下面我们继续上一篇的测试用例继续。

MySQL有列的统计信息吗?

MySQL 在一定程度上 提供了列(column)上的统计信息,但其详细程度和使用方式取决于所使用的存储引擎(尤其是 InnoDB)以及 MySQL 的版本。

MySQL 5.6 开始,InnoDB 引入了持久化优化器统计信息(Persistent Optimizer Statistics),其中包括表和索引的统计信息。虽然 InnoDB 不直接为每一列维护详细的统计信息(如直方图、唯一值数量等),但它会通过索引来间接提供一些列的分布信息。从 MySQL 8.0 开始,MySQL 原生支持列上的直方图统计信息,用于帮助优化器更准确地估算查询条件的选择性(特别是对非索引列或复杂谓词)。

MySQL 版本列统计信息支持情况
< 5.6几乎没有列级别统计,依赖运行时采样
5.6 ~ 5.7有索引统计,但无列直方图
8.0+✅ 支持通过 ANALYZE TABLE ... UPDATE HISTOGRAM 创建列直方图

统计信息相关视图

可通过 INFORMATION_SCHEMA.STATISTICSmysql.innodb_index_stats 查看索引相关的统计。

SELECT * FROM mysql.innodb_table_stats;

SELECT * FROM mysql.innodb_index_stats;

列上的直方图统计信息

  • 使用 ANALYZE TABLE table_name UPDATE HISTOGRAM ON column_name [WITH N BUCKETS]; 创建直方图。

直方图存储在 数据字典表 information_schema.column_statistics 中,只有在存在直方图时才会返回行。

案例测试

我们现在知道了mysql是基于索引的统计信息,那我们创建join 列的索引,帮助优化器更准确地估算查询条件的选择性

mysql> use anbob;
Database changed
mysql> select count(*) from t1,t2,t3 where t1.id=t2.id and t2.name=t3.name;
+----------+
| count(*) |
+----------+
|     1000 |
+----------+
1 row in set (0.84 sec)

mysql> explain select count(*) from t1,t2,t3 where t1.id=t2.id and t2.name=t3.name;

| -> Aggregate: count(0)  (cost=3.73e+9 rows=1)
    -> Inner hash join (t2.`name` = t3.`name`), (t2.id = t1.id)  (cost=3.7e+9 rows=370e+6)
        -> Table scan on t2  (cost=0.00437 rows=19453)
        -> Hash
            -> Inner hash join (no condition)  (cost=190103 rows=1.9e+6)
                -> Table scan on t3  (cost=0.192 rows=1900)
                -> Hash
                    -> Table scan on t1  (cost=101 rows=1000)
 |

mysql> select * from information_schema.statistics where table_name='T3';
Empty set (0.01 sec)

mysql> select * from mysql.innodb_table_stats where table_name='t3';
+---------------+------------+---------------------+--------+----------------------+--------------------------+
| database_name | table_name | last_update         | n_rows | clustered_index_size | sum_of_other_index_sizes |
+---------------+------------+---------------------+--------+----------------------+--------------------------+
| anbob         | t3         | 2025-12-26 08:55:24 |   1900 |                    7 |                        0 |
+---------------+------------+---------------------+--------+----------------------+--------------------------+
1 row in set (0.00 sec)

mysql> select * from mysql.innodb_index_stats  where table_name='t3';
+---------------+------------+-----------------+---------------------+--------------+------------+-------------+-----------------------------------+
| database_name | table_name | index_name      | last_update         | stat_name    | stat_value | sample_size | stat_description                  |
+---------------+------------+-----------------+---------------------+--------------+------------+-------------+-----------------------------------+
| anbob         | t3         | GEN_CLUST_INDEX | 2025-12-26 08:55:24 | n_diff_pfx01 |       1900 |           6 | DB_ROW_ID                         |
| anbob         | t3         | GEN_CLUST_INDEX | 2025-12-26 08:55:24 | n_leaf_pages |          6 |        NULL | Number of leaf pages in the index |
| anbob         | t3         | GEN_CLUST_INDEX | 2025-12-26 08:55:24 | size         |          7 |        NULL | Number of pages in the index      |
+---------------+------------+-----------------+---------------------+--------------+------------+-------------+-----------------------------------+
3 rows in set (0.00 sec)

mysql> create index idx_t3_name on t3(name);
Query OK, 0 rows affected (0.03 sec)
Records: 0  Duplicates: 0  Warnings: 0

mysql> select * from information_schema.column_statistics where table_name='t3';
Empty set (0.00 sec)

mysql> select * from mysql.innodb_index_stats  where table_name='t3';
+---------------+------------+-----------------+---------------------+--------------+------------+-------------+-----------------------------------+
| database_name | table_name | index_name      | last_update         | stat_name    | stat_value | sample_size | stat_description                  |
+---------------+------------+-----------------+---------------------+--------------+------------+-------------+-----------------------------------+
| anbob         | t3         | GEN_CLUST_INDEX | 2026-01-07 09:09:34 | n_diff_pfx01 |       1900 |           6 | DB_ROW_ID                         |
| anbob         | t3         | GEN_CLUST_INDEX | 2026-01-07 09:09:34 | n_leaf_pages |          6 |        NULL | Number of leaf pages in the index |
| anbob         | t3         | GEN_CLUST_INDEX | 2026-01-07 09:09:34 | size         |          7 |        NULL | Number of pages in the index      |
| anbob         | t3         | idx_t3_name     | 2026-01-07 09:09:34 | n_diff_pfx01 |       1900 |           3 | name                              |
| anbob         | t3         | idx_t3_name     | 2026-01-07 09:09:34 | n_diff_pfx02 |       1900 |           3 | name,DB_ROW_ID                    |
| anbob         | t3         | idx_t3_name     | 2026-01-07 09:09:34 | n_leaf_pages |          3 |        NULL | Number of leaf pages in the index |
| anbob         | t3         | idx_t3_name     | 2026-01-07 09:09:34 | size         |          4 |        NULL | Number of pages in the index      |
+---------------+------------+-----------------+---------------------+--------------+------------+-------------+-----------------------------------+
7 rows in set (0.00 sec)

ysql> select * from information_schema.statistics where table_name='t3';
+---------------+--------------+------------+------------+--------------+-------------+--------------+-------------+-----------+-------------+----------+--------+----------+------------+---------+---------------+------------+------------+
| TABLE_CATALOG | TABLE_SCHEMA | TABLE_NAME | NON_UNIQUE | INDEX_SCHEMA | INDEX_NAME  | SEQ_IN_INDEX | COLUMN_NAME | COLLATION | CARDINALITY | SUB_PART | PACKED | NULLABLE | INDEX_TYPE | COMMENT | INDEX_COMMENT | IS_VISIBLE | EXPRESSION |
+---------------+--------------+------------+------------+--------------+-------------+--------------+-------------+-----------+-------------+----------+--------+----------+------------+---------+---------------+------------+------------+
| def           | anbob        | t3         |          1 | anbob        | idx_t3_name |            1 | name        | A         |        1900 |     NULL |   NULL | YES      | BTREE      |         |               | YES        | NULL       |
+---------------+--------------+------------+------------+--------------+-------------+--------------+-------------+-----------+-------------+----------+--------+----------+------------+---------+---------------+------------+------------+
1 row in set (0.00 sec)

mysql>  select count(*) from t1,t2,t3 where t1.id=t2.id and t2.name=t3.name;
+----------+
| count(*) |
+----------+
|     1000 |
+----------+
1 row in set (0.01 sec)

mysql> explain format=tree select count(*) from t1,t2,t3 where t1.id=t2.id and t2.name=t3.name;

-------+
| -> Aggregate: count(0)  (cost=1.69e+6 rows=1)
    -> Nested loop inner join  (cost=1.67e+6 rows=194530)
        -> Inner hash join (t2.id = t1.id)  (cost=194656 rows=194530)
            -> Filter: (t2.`name` is not null)  (cost=0.219 rows=1945)
                -> Table scan on t2  (cost=0.219 rows=19453)
            -> Hash
                -> Table scan on t1  (cost=101 rows=1000)
        -> Covering index lookup on t3 using idx_t3_name (name=t2.`name`)  (cost=0.75 rows=1)
 |

mysql> explain format=tree select count(*) from t1,t2,t3 ignore index (idx_t3_name) where t1.id=t2.id and t2.name=t3.name;

-------------------------------------+
| -> Aggregate: count(0)  (cost=372e+6 rows=1)
    -> Inner hash join (t3.`name` = t2.`name`)  (cost=372e+6 rows=1.95e+6)
        -> Table scan on t3  (cost=341e-6 rows=1900)
        -> Hash
            -> Inner hash join (t2.id = t1.id)  (cost=1.95e+6 rows=1.95e+6)
                -> Table scan on t2  (cost=0.219 rows=19453)
                -> Hash
                    -> Table scan on t1  (cost=101 rows=1000)
 |

Note: 创建了索引列后,就可以更准确的执行计划。即使不使用创建的索引,索引列的统计信息也可以帮助修正join order .

创建列直方图

删除了T3表name列索引,增加列直方图试试

mysql> alter table t3 drop index idx_t3_name;
Query OK, 0 rows affected (0.01 sec)
Records: 0  Duplicates: 0  Warnings: 0

mysql> analyze table t3 update histogram on name;
+----------+-----------+----------+-------------------------------------------------+
| Table    | Op        | Msg_type | Msg_text                                        |
+----------+-----------+----------+-------------------------------------------------+
| anbob.t3 | histogram | status   | Histogram statistics created for column 'name'. |
+----------+-----------+----------+-------------------------------------------------+

mysql> SELECT schema_name,table_name,column_name,substr(HISTOGRAM,1,50) FROM information_schema.column_statistics where table_name='t3';
+-------------+------------+-------------+----------------------------------------------------+
| SCHEMA_NAME | TABLE_NAME | COLUMN_NAME | substr(HISTOGRAM,1,50)                             |
+-------------+------------+-------------+----------------------------------------------------+
| anbob       | t3         | name        | {"buckets": [["base64:type254:dXNlcjE=", "base64:t |
+-------------+------------+-------------+----------------------------------------------------+

mysql> explain format=tree select count(*) from t1,t2,t3 where t1.id=t2.id and t2.name=t3.name;
| -> Aggregate: count(0)  (cost=3.73e+9 rows=1)
    -> Inner hash join (t2.`name` = t3.`name`), (t2.id = t1.id)  (cost=3.7e+9 rows=370e+6)
        -> Table scan on t2  (cost=0.00437 rows=19453)
        -> Hash
            -> Inner hash join (no condition)  (cost=190103 rows=1.9e+6)
                -> Table scan on t3  (cost=0.192 rows=1900)
                -> Hash
                    -> Table scan on t1  (cost=101 rows=1000)
 |

mysql> select version();
+-----------+
| version() |
+-----------+
| 8.0.36    |
+-----------+

Note: 创建列的直方图对我们这个join 顺序并没有帮助。

— over —

Leave a Comment