前几天写过一篇<MySQL 多表关连时优化器缺陷出现的错误Join order>,记录了mysql在多表join时,在清理了其中一个表部分记录后,优化器因为局部的cost估算产生了错误的join order,后来和同事讨论,这里显示MySQL的另一个问题,MySQL的统计信息来自索引,无索引列在MySQL 8.0前或在8.0后没有手动创建列直方图(Histogram),优化器是无法准确估算选择性的。下面我们继续上一篇的测试用例继续。
MySQL有列的统计信息吗?
MySQL 在一定程度上 提供了列(column)上的统计信息,但其详细程度和使用方式取决于所使用的存储引擎(尤其是 InnoDB)以及 MySQL 的版本。
从 MySQL 5.6 开始,InnoDB 引入了持久化优化器统计信息(Persistent Optimizer Statistics),其中包括表和索引的统计信息。虽然 InnoDB 不直接为每一列维护详细的统计信息(如直方图、唯一值数量等),但它会通过索引来间接提供一些列的分布信息。从 MySQL 8.0 开始,MySQL 原生支持列上的直方图统计信息,用于帮助优化器更准确地估算查询条件的选择性(特别是对非索引列或复杂谓词)。
| MySQL 版本 | 列统计信息支持情况 |
|---|---|
| < 5.6 | 几乎没有列级别统计,依赖运行时采样 |
| 5.6 ~ 5.7 | 有索引统计,但无列直方图 |
| 8.0+ | ✅ 支持通过 ANALYZE TABLE ... UPDATE HISTOGRAM 创建列直方图 |
统计信息相关视图
可通过 INFORMATION_SCHEMA.STATISTICS 和 mysql.innodb_index_stats 查看索引相关的统计。
SELECT * FROM mysql.innodb_table_stats;
SELECT * FROM mysql.innodb_index_stats;
列上的直方图统计信息
- 使用
ANALYZE TABLE table_name UPDATE HISTOGRAM ON column_name [WITH N BUCKETS];创建直方图。
直方图存储在 数据字典表 information_schema.column_statistics 中,只有在存在直方图时才会返回行。
案例测试
我们现在知道了mysql是基于索引的统计信息,那我们创建join 列的索引,帮助优化器更准确地估算查询条件的选择性
mysql> use anbob;
Database changed
mysql> select count(*) from t1,t2,t3 where t1.id=t2.id and t2.name=t3.name;
+----------+
| count(*) |
+----------+
| 1000 |
+----------+
1 row in set (0.84 sec)
mysql> explain select count(*) from t1,t2,t3 where t1.id=t2.id and t2.name=t3.name;
| -> Aggregate: count(0) (cost=3.73e+9 rows=1)
-> Inner hash join (t2.`name` = t3.`name`), (t2.id = t1.id) (cost=3.7e+9 rows=370e+6)
-> Table scan on t2 (cost=0.00437 rows=19453)
-> Hash
-> Inner hash join (no condition) (cost=190103 rows=1.9e+6)
-> Table scan on t3 (cost=0.192 rows=1900)
-> Hash
-> Table scan on t1 (cost=101 rows=1000)
|
mysql> select * from information_schema.statistics where table_name='T3';
Empty set (0.01 sec)
mysql> select * from mysql.innodb_table_stats where table_name='t3';
+---------------+------------+---------------------+--------+----------------------+--------------------------+
| database_name | table_name | last_update | n_rows | clustered_index_size | sum_of_other_index_sizes |
+---------------+------------+---------------------+--------+----------------------+--------------------------+
| anbob | t3 | 2025-12-26 08:55:24 | 1900 | 7 | 0 |
+---------------+------------+---------------------+--------+----------------------+--------------------------+
1 row in set (0.00 sec)
mysql> select * from mysql.innodb_index_stats where table_name='t3';
+---------------+------------+-----------------+---------------------+--------------+------------+-------------+-----------------------------------+
| database_name | table_name | index_name | last_update | stat_name | stat_value | sample_size | stat_description |
+---------------+------------+-----------------+---------------------+--------------+------------+-------------+-----------------------------------+
| anbob | t3 | GEN_CLUST_INDEX | 2025-12-26 08:55:24 | n_diff_pfx01 | 1900 | 6 | DB_ROW_ID |
| anbob | t3 | GEN_CLUST_INDEX | 2025-12-26 08:55:24 | n_leaf_pages | 6 | NULL | Number of leaf pages in the index |
| anbob | t3 | GEN_CLUST_INDEX | 2025-12-26 08:55:24 | size | 7 | NULL | Number of pages in the index |
+---------------+------------+-----------------+---------------------+--------------+------------+-------------+-----------------------------------+
3 rows in set (0.00 sec)
mysql> create index idx_t3_name on t3(name);
Query OK, 0 rows affected (0.03 sec)
Records: 0 Duplicates: 0 Warnings: 0
mysql> select * from information_schema.column_statistics where table_name='t3';
Empty set (0.00 sec)
mysql> select * from mysql.innodb_index_stats where table_name='t3';
+---------------+------------+-----------------+---------------------+--------------+------------+-------------+-----------------------------------+
| database_name | table_name | index_name | last_update | stat_name | stat_value | sample_size | stat_description |
+---------------+------------+-----------------+---------------------+--------------+------------+-------------+-----------------------------------+
| anbob | t3 | GEN_CLUST_INDEX | 2026-01-07 09:09:34 | n_diff_pfx01 | 1900 | 6 | DB_ROW_ID |
| anbob | t3 | GEN_CLUST_INDEX | 2026-01-07 09:09:34 | n_leaf_pages | 6 | NULL | Number of leaf pages in the index |
| anbob | t3 | GEN_CLUST_INDEX | 2026-01-07 09:09:34 | size | 7 | NULL | Number of pages in the index |
| anbob | t3 | idx_t3_name | 2026-01-07 09:09:34 | n_diff_pfx01 | 1900 | 3 | name |
| anbob | t3 | idx_t3_name | 2026-01-07 09:09:34 | n_diff_pfx02 | 1900 | 3 | name,DB_ROW_ID |
| anbob | t3 | idx_t3_name | 2026-01-07 09:09:34 | n_leaf_pages | 3 | NULL | Number of leaf pages in the index |
| anbob | t3 | idx_t3_name | 2026-01-07 09:09:34 | size | 4 | NULL | Number of pages in the index |
+---------------+------------+-----------------+---------------------+--------------+------------+-------------+-----------------------------------+
7 rows in set (0.00 sec)
ysql> select * from information_schema.statistics where table_name='t3';
+---------------+--------------+------------+------------+--------------+-------------+--------------+-------------+-----------+-------------+----------+--------+----------+------------+---------+---------------+------------+------------+
| TABLE_CATALOG | TABLE_SCHEMA | TABLE_NAME | NON_UNIQUE | INDEX_SCHEMA | INDEX_NAME | SEQ_IN_INDEX | COLUMN_NAME | COLLATION | CARDINALITY | SUB_PART | PACKED | NULLABLE | INDEX_TYPE | COMMENT | INDEX_COMMENT | IS_VISIBLE | EXPRESSION |
+---------------+--------------+------------+------------+--------------+-------------+--------------+-------------+-----------+-------------+----------+--------+----------+------------+---------+---------------+------------+------------+
| def | anbob | t3 | 1 | anbob | idx_t3_name | 1 | name | A | 1900 | NULL | NULL | YES | BTREE | | | YES | NULL |
+---------------+--------------+------------+------------+--------------+-------------+--------------+-------------+-----------+-------------+----------+--------+----------+------------+---------+---------------+------------+------------+
1 row in set (0.00 sec)
mysql> select count(*) from t1,t2,t3 where t1.id=t2.id and t2.name=t3.name;
+----------+
| count(*) |
+----------+
| 1000 |
+----------+
1 row in set (0.01 sec)
mysql> explain format=tree select count(*) from t1,t2,t3 where t1.id=t2.id and t2.name=t3.name;
-------+
| -> Aggregate: count(0) (cost=1.69e+6 rows=1)
-> Nested loop inner join (cost=1.67e+6 rows=194530)
-> Inner hash join (t2.id = t1.id) (cost=194656 rows=194530)
-> Filter: (t2.`name` is not null) (cost=0.219 rows=1945)
-> Table scan on t2 (cost=0.219 rows=19453)
-> Hash
-> Table scan on t1 (cost=101 rows=1000)
-> Covering index lookup on t3 using idx_t3_name (name=t2.`name`) (cost=0.75 rows=1)
|
mysql> explain format=tree select count(*) from t1,t2,t3 ignore index (idx_t3_name) where t1.id=t2.id and t2.name=t3.name;
-------------------------------------+
| -> Aggregate: count(0) (cost=372e+6 rows=1)
-> Inner hash join (t3.`name` = t2.`name`) (cost=372e+6 rows=1.95e+6)
-> Table scan on t3 (cost=341e-6 rows=1900)
-> Hash
-> Inner hash join (t2.id = t1.id) (cost=1.95e+6 rows=1.95e+6)
-> Table scan on t2 (cost=0.219 rows=19453)
-> Hash
-> Table scan on t1 (cost=101 rows=1000)
|
Note: 创建了索引列后,就可以更准确的执行计划。即使不使用创建的索引,索引列的统计信息也可以帮助修正join order .
创建列直方图
删除了T3表name列索引,增加列直方图试试
mysql> alter table t3 drop index idx_t3_name;
Query OK, 0 rows affected (0.01 sec)
Records: 0 Duplicates: 0 Warnings: 0
mysql> analyze table t3 update histogram on name;
+----------+-----------+----------+-------------------------------------------------+
| Table | Op | Msg_type | Msg_text |
+----------+-----------+----------+-------------------------------------------------+
| anbob.t3 | histogram | status | Histogram statistics created for column 'name'. |
+----------+-----------+----------+-------------------------------------------------+
mysql> SELECT schema_name,table_name,column_name,substr(HISTOGRAM,1,50) FROM information_schema.column_statistics where table_name='t3';
+-------------+------------+-------------+----------------------------------------------------+
| SCHEMA_NAME | TABLE_NAME | COLUMN_NAME | substr(HISTOGRAM,1,50) |
+-------------+------------+-------------+----------------------------------------------------+
| anbob | t3 | name | {"buckets": [["base64:type254:dXNlcjE=", "base64:t |
+-------------+------------+-------------+----------------------------------------------------+
mysql> explain format=tree select count(*) from t1,t2,t3 where t1.id=t2.id and t2.name=t3.name;
| -> Aggregate: count(0) (cost=3.73e+9 rows=1)
-> Inner hash join (t2.`name` = t3.`name`), (t2.id = t1.id) (cost=3.7e+9 rows=370e+6)
-> Table scan on t2 (cost=0.00437 rows=19453)
-> Hash
-> Inner hash join (no condition) (cost=190103 rows=1.9e+6)
-> Table scan on t3 (cost=0.192 rows=1900)
-> Hash
-> Table scan on t1 (cost=101 rows=1000)
|
mysql> select version();
+-----------+
| version() |
+-----------+
| 8.0.36 |
+-----------+
Note: 创建列的直方图对我们这个join 顺序并没有帮助。
— over —