MySQL 多表关连时优化器缺陷出现的错误Join order

众所周知,在多表关连SQL上MySQL 优化器相对其他如oracle,postgreSQL有一些缺陷,今天看到了一个现象开始Join order是正常的在某张表做了delete后,产生了错误的cost, 改变了join 顺序,而产生了笛卡尔积,导致SQL性能下降,简单记录。

创建表三线测试表

mysql> create table t1(id int,name varchar(10));
Query OK, 0 rows affected (0.01 sec)

mysql> create table t2(id int,name varchar(10));
Query OK, 0 rows affected (0.01 sec)

mysql> create table t3(id int,name varchar(10));
Query OK, 0 rows affected (0.01 sec)

批量加载一些数据

这里我使用了java,你也可以写plsql 或其他方法

import java.sql.*;
import java.util.Random;
import java.time.LocalDateTime;

public class loaddata {
    
    private static final String URL = "jdbc:mysql://localhost:3306/anbob";
    private static final String USER = "root";
    private static final String PASSWORD = "root";
    
    public static void main(String[] args) {
        Connection conn = null;
        PreparedStatement pstmt = null;
        
        try {
            // 1. 加载驱动
            Class.forName("com.mysql.cj.jdbc.Driver");
            
            // 2. 建立连接
            conn = DriverManager.getConnection(URL, USER, PASSWORD);
            
            // 3. 关闭自动提交
            conn.setAutoCommit(false);
            
            // 4. 准备SQL
            String sql = "INSERT INTO t3 (id,name) VALUES (?, ?)";
            pstmt = conn.prepareStatement(sql);
            
            // 5. 设置参数
            Random random = new Random();
            LocalDateTime now = LocalDateTime.now();
            
            for (int i = 1; i <= 3000; i++) {
                pstmt.setInt(1,i);
                pstmt.setString(2, "user" + i);
                  
                // 添加到批处理
                pstmt.addBatch();
                
                // 每100条执行一次批处理,避免内存溢出
                if (i % 100 == 0) {
                    pstmt.executeBatch();
                    conn.commit();
                    System.out.println("已插入 " + i + " 条记录");
                }
            }
            
            // 执行剩余批处理
            pstmt.executeBatch();
            conn.commit();
            
            System.out.println("批量插入完成,总共插入1000条记录");
            
        } catch (Exception e) {
            e.printStackTrace();
            try {
                if (conn != null) {
                    conn.rollback();
                }
            } catch (SQLException ex) {
                ex.printStackTrace();
            }
        } finally {
            // 关闭资源
            try {
                if (pstmt != null) pstmt.close();
                if (conn != null) conn.close();
            } catch (SQLException e) {
                e.printStackTrace();
            }
        }
    }
}


D:\code\formysql>javac loaddata.java

D:\code\formysql>java loaddata
java.lang.ClassNotFoundException: com.mysql.cj.jdbc.Driver
        at java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:641)
        at java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:188)
        at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:521)
        at java.base/java.lang.Class.forName0(Native Method)
        at java.base/java.lang.Class.forName(Class.java:391)
        at java.base/java.lang.Class.forName(Class.java:382)
        at loaddata.main(loaddata.java:17)

D:\code\formysql>SET CLASSPATH=.;D:\mysql-connector-j-9.5.0\mysql-connector-j-9.5.0.jar

D:\code\formysql>java loaddata
已插入 100 条记录
已插入 200 条记录
...

验证多表Join

mysql> select count(*) from t1;
+----------+
| count(*) |
+----------+
|     1000 |
+----------+
1 row in set (0.00 sec)

mysql> select count(*) from t2;
+----------+
| count(*) |
+----------+
|    20000 |
+----------+
1 row in set (0.00 sec)

mysql> select count(*) from t3;
+----------+
| count(*) |
+----------+
|     3000 |
+----------+
1 row in set (0.00 sec)



mysql> select count(*) from t1,t2,t3 where t1.id=t2.id and t2.name=t3.name;
+----------+
| count(*) |
+----------+
|     1000 |
+----------+
1 row in set (0.01 sec)

mysql> explain format=tree select count(*) from t1,t2,t3 where t1.id=t2.id and t2.name=t3.name;
+------------------------------------------------------------------------------------------------------+
| EXPLAIN                                                                                              |
+------------------------------------------------------------------------------------------------------+
| -> Aggregate: count(0)  (cost=644e+6 rows=1)                                                     
    -> Inner hash join (t3.`name` = t2.`name`)  (cost=586e+6 rows=584e+6)                          
        -> Table scan on t3  (cost=503e-6 rows=3000)                                               
        -> Hash                                                                                    
            -> Inner hash join (t2.id = t1.id)  (cost=1.95e+6 rows=1.95e+6)                        
                -> Table scan on t2  (cost=0.219 rows=19453)                                       
                -> Hash                                                                            
                    -> Table scan on t1  (cost=101 rows=1000)                                      
 |                                                                                                 
+------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)   

Note: 使用使用的hash join, 顺序 t2 -> t1 -> t3. 算是理想状态,用cost/rows可以算出每行cost为0.1.

下面清掉T3近2/3数据

mysql> delete from t3 where t3.id>1900;
Query OK, 1100 rows affected (0.01 sec)

mysql> commit;                                                                                                                        
Query OK, 0 rows affected (0.00 sec)       


mysql> select count(*) from t1,t2,t3 where t1.id=t2.id and t2.name=t3.name;                                 
+----------+                                                                                                
| count(*) |                                                                                                                          
+----------+                                                                                                                          
|     1000 |                                                                                                                          
+----------+                                                                                                                          
1 row in set (0.80 sec)                                                                                                
																											
mysql> explain format=tree select count(*) from t1,t2,t3 where t1.id=t2.id and t2.name=t3.name;             
+-----------------------------------------------------------------------------------------------------------------+
| EXPLAIN                                                                                                         |
+-----------------------------------------------------------------------------------------------------------------+
| -> Aggregate: count(0)  (cost=3.73e+9 rows=1)                                                             
    -> Inner hash join (t2.`name` = t3.`name`), (t2.id = t1.id)  (cost=3.7e+9 rows=370e+6)                  
        -> Table scan on t2  (cost=0.00437 rows=19453)                                                      
        -> Hash                                                                                             
            -> Inner hash join (no condition)  (cost=190103 rows=1.9e+6)                                    
                -> Table scan on t3  (cost=0.192 rows=1900)                                                 
                -> Hash                                                                                     
                    -> Table scan on t1  (cost=101 rows=1000)                                               
 |                                                                                                          
+-----------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec) 

Note: 这里join order改变了,还是hash join, t3-> t1 -> t2, 而且t3和t1 使用笛卡尔积。

收集统计信息

mysql> analyze table t3;
+----------+---------+----------+----------+
| Table    | Op      | Msg_type | Msg_text |
+----------+---------+----------+----------+
| anbob.t3 | analyze | status   | OK       |
+----------+---------+----------+----------+
1 row in set (0.00 sec)

mysql> show table status like "t3";
+------+--------+---------+------------+------+----------------+-------------+-----------------+--------------+-----------+----------------+---------------------+---------------------+------------+--------------------+----------+----------------+---------+
| Name | Engine | Version | Row_format | Rows | Avg_row_length | Data_length | Max_data_length | Index_length | Data_free | Auto_increment | Create_time         | Update_time         | Check_time | Collation          | Checksum | Create_options | Comment |
+------+--------+---------+------------+------+----------------+-------------+-----------------+--------------+-----------+----------------+---------------------+---------------------+------------+--------------------+----------+----------------+---------+
| t3   | InnoDB |      10 | Dynamic    | 1900 |             60 |      114688 |               0 |            0 |         0 |           NULL | 2025-12-26 08:08:35 | 2025-12-26 08:48:20 | NULL       | utf8mb4_0900_ai_ci |     NULL |                |         |
+------+--------+---------+------------+------+----------------+-------------+-----------------+--------------+-----------+----------------+---------------------+---------------------+------------+--------------------+----------+----------------+---------+
1 row in set (0.00 sec)

Note: 分析后执行计划并示改变。

使用hint join_order 强制修改

mysql> select /*+join_order(t1,t2,t3)*/ count(*) from t1,t2,t3 where t1.id=t2.id and t2.name=t3.name;                                 
+----------+                                                                                                                          
| count(*) |                                                                                                                          
+----------+                                                                                                                          
|     1000 |                                                                                                                          
+----------+                                                                                                                          
1 row in set (0.01 sec)                                                                                                               
																																	  
mysql> explain format=tree select /*+join_order(t1,t2,t3)*/ count(*) from t1,t2,t3 where t1.id=t2.id and t2.name=t3.name;             
+----------------------------------------------------------------------------------------------------------------------------------------------+
| EXPLAIN                                                                                                                                      |
+----------------------------------------------------------------------------------------------------------------------------------------------+
| -> Aggregate: count(0)  (cost=409e+6 rows=1)                                                                                        
    -> Inner hash join (t3.`name` = t2.`name`)  (cost=372e+6 rows=370e+6)                                                             
        -> Table scan on t3  (cost=351e-6 rows=1900)                                                                                  
        -> Hash                                                                                                                       
            -> Inner hash join (t2.id = t1.id)  (cost=1.95e+6 rows=1.95e+6)                                                           
                -> Table scan on t2  (cost=0.219 rows=19453)                                                                          
                -> Hash                                                                                                               
                    -> Table scan on t1  (cost=101 rows=1000)                                                                         
 |                                                                                                                                    
+----------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)     

Note: 改为t2 >t1 >t3的join order , 响应时间为0.01s 比 t3-> t1 -> t2响应时间0.08快了数倍, 可以看到cost 两表行数想成*0.1换算 t2 join t1 是195w, 而 t3 join t1 是19w, 所以用了t3 join t1. 但这仅是这一步的,并不是最终成本cost. 最终成本是 t2 -> t1 -> t3好于t3-> t1 -> t2。 猜测MySQL优化器的规则是基于局部成本最优,而不是全路径成本。

尝试禁用hash Join

mysql> show variables like '%optimizer_switch%' \G
*************************** 1. row ***************************
Variable_name: optimizer_switch
        Value: index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,engine_condition_pushdown=on,index_condition_pushdown=on,mrr=on,mrr_cost_based=on,block_nested_loop=on,batched_key_access=off,materialization=on,semijoin=on,loosescan=on,firstmatch=on,duplicateweedout=on,subquery_materialization_cost_based=on,use_index_extensions=on,condition_fanout_filter=on,derived_merge=on,use_invisible_indexes=off,skip_scan=on,hash_join=on,subquery_to_derived=off,prefer_ordering_index=on,hypergraph_optimizer=off,derived_condition_pushdown=on
1 row in set, 1 warning (0.01 sec)

mysql> SET optimizer_switch='hash_join=off';
Query OK, 0 rows affected (0.00 sec)

mysql> show variables like '%optimizer_switch%' \G
*************************** 1. row ***************************
Variable_name: optimizer_switch
        Value: index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,engine_condition_pushdown=on,index_condition_pushdown=on,mrr=on,mrr_cost_based=on,block_nested_loop=on,batched_key_access=off,materialization=on,semijoin=on,loosescan=on,firstmatch=on,duplicateweedout=on,subquery_materialization_cost_based=on,use_index_extensions=on,condition_fanout_filter=on,derived_merge=on,use_invisible_indexes=off,skip_scan=on,hash_join=off,subquery_to_derived=off,prefer_ordering_index=on,hypergraph_optimizer=off,derived_condition_pushdown=on
1 row in set, 1 warning (0.00 sec)

mysql> explain format=tree select count(*) from t1,t2,t3 where t1.id=t2.id and t2.name=t3.name;        
+-------------------------------------------------------------------------------------------------------------+
| EXPLAIN                                                                                                     |
+-------------------------------------------------------------------------------------------------------------+
| -> Aggregate: count(0)  (cost=3.73e+9 rows=1)                                                        
    -> Inner hash join (t2.`name` = t3.`name`), (t2.id = t1.id)  (cost=3.7e+9 rows=370e+6)             
        -> Table scan on t2  (cost=0.00437 rows=19453)                                                 
        -> Hash                                                                                        
            -> Inner hash join (no condition)  (cost=190103 rows=1.9e+6)                               
                -> Table scan on t3  (cost=0.192 rows=1900)                                            
                -> Hash                                                                                
                    -> Table scan on t1  (cost=101 rows=1000)                                          
 |                                                                                                     
+-------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)                                                                                
																					   


mysql> SET SESSION optimizer_switch = 'hash_join=off';
Query OK, 0 rows affected (0.00 sec)


mysql> SELECT @@optimizer_switch LIKE '%hash_join=on%' AS hash_join_enabled;
+-------------------+
| hash_join_enabled |
+-------------------+
|                 0 |
+-------------------+
1 row in set (0.00 sec)                                                                                                           
																															
mysql> explain format=tree select count(*) from t1,t2,t3 where t1.id=t2.id and t2.name=t3.name;                                   
+--------------------------------------------------------------------------------------------------------------------------------------+
| EXPLAIN                                                                                                                              |
+--------------------------------------------------------------------------------------------------------------------------------------+
| -> Aggregate: count(0)  (cost=3.73e+9 rows=1)                                                                                   
    -> Inner hash join (t2.`name` = t3.`name`), (t2.id = t1.id)  (cost=3.7e+9 rows=370e+6)                                        
        -> Table scan on t2  (cost=0.00437 rows=19453)                                                                            
        -> Hash                                                                                                                   
            -> Inner hash join (no condition)  (cost=190103 rows=1.9e+6)                                                          
                -> Table scan on t3  (cost=0.192 rows=1900)                                                                       
                -> Hash                                                                                                           
                    -> Table scan on t1  (cost=101 rows=1000)                                                                     
 |                                                                                                                                
+--------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)                                                                                                           
																															
																															
mysql> explain format=tree select /*+no_hash_join(t1,t2)*/ count(*) from t1,t2,t3 where t1.id=t2.id and t2.name=t3.name;          
+--------------------------------------------------------------------------------------------------------------------------------------+
| EXPLAIN                                                                                                                              |
+--------------------------------------------------------------------------------------------------------------------------------------+
| -> Aggregate: count(0)  (cost=3.73e+9 rows=1)                                                                                   
    -> Inner hash join (t2.`name` = t3.`name`), (t2.id = t1.id)  (cost=3.7e+9 rows=370e+6)                                        
        -> Table scan on t2  (cost=0.00437 rows=19453)                                                                            
        -> Hash                                                                                                                   
            -> Inner hash join (no condition)  (cost=190103 rows=1.9e+6)                                                          
                -> Table scan on t3  (cost=0.192 rows=1900)                                                                       
                -> Hash                                                                                                           
                    -> Table scan on t1  (cost=101 rows=1000)                                                                     
 |                                                                                                                                
+--------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)                                                                                                           
																															
mysql> select version();                                                                                                          
+-----------+                                                                                                                     
| version() |                                                                                                                     
+-----------+                                                                                                                     
| 8.0.36    |                                                                                                                     
+-----------+                                                                                                                     
1 row in set (0.00 sec)          

Note: 注意按之前的权限禁用hash join发现并不生效了。当前版本8.0.36.

block_nested_loop替代了hash_join

https://dev.mysql.com/doc/refman/8.4/en/hash-joins.html

By default, MySQL 8.0.18 and later employs hash joins whenever possible. It is possible to control whether hash joins are employed using one of the BNL and NO_BNL optimizer hints.

(MySQL 8.0.18 supported hash_join=on or hash_join=off as part of the setting for the optimizer_switch server system variable as well as the optimizer hints HASH_JOIN or NO_HASH_JOIN. In MySQL 8.0.19 and later, these no longer have any effect.)

Beginning with MySQL 8.0.18, MySQL employs a hash join for any query for which each join has an equi-join condition, and in which there are no indexes that can be applied to any join conditions.

A hash join is usually faster than and is intended to be used in such cases instead of the block nested loop algorithm (see Block Nested-Loop Join Algorithm) employed in previous versions of MySQL. Beginning with MySQL 8.0.20, support for block nested loop is removed, and the server employs a hash join wherever a block nested loop would have been used previously.

https://dev.mysql.com/doc/refman/8.0/en/switchable-optimizations.html

Hash Join Flags

hash_join (default on)

Controls hash joins in MySQL 8.0.18 only, and has no effect in any subsequent version. In MySQL 8.0.19 and later, to control hash join usage, use the block_nested_loop flag, instead.

mysql> SET SESSION optimizer_switch = 'hash_join=off,block_nested_loop=off';
Query OK, 0 rows affected (0.00 sec)



mysql> explain format=tree select count(*) from t1,t2,t3 where t1.id=t2.id and t2.name=t3.name;         
+-------------------------------------------------------------------------------------------------------------------+
| EXPLAIN                                                                                                           |
+-------------------------------------------------------------------------------------------------------------------+
| -> Aggregate: count(0)  (cost=3.78e+9 rows=1)                                                         
    -> Nested loop inner join  (cost=3.74e+9 rows=370e+6)                                               
        -> Nested loop inner join  (cost=191851 rows=1.9e+6)                                            
            -> Table scan on t1  (cost=101 rows=1000)                                                   
            -> Table scan on t3  (cost=1.94 rows=1900)                                                  
        -> Filter: ((t2.`name` = t3.`name`) and (t2.id = t1.id))  (cost=24.3 rows=195)                  
            -> Table scan on t2  (cost=24.3 rows=19453)                                                 
 |                                                                                                      
+-------------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)                                                                                 
																										

mysql> explain  select /*+no_bnl(t1,t3)*/ count(*) from t1,t2,t3 where t1.id=t2.id and t2.name=t3.name;
+----+-------------+-------+------------+------+---------------+------+---------+------+-------+----------+--------------------------------------------+
| id | select_type | table | partitions | type | possible_keys | key  | key_len | ref  | rows  | filtered | Extra                                      |
+----+-------------+-------+------------+------+---------------+------+---------+------+-------+----------+--------------------------------------------+
|  1 | SIMPLE      | t1    | NULL       | ALL  | NULL          | NULL | NULL    | NULL |  1000 |   100.00 | NULL                                       |
|  1 | SIMPLE      | t3    | NULL       | ALL  | NULL          | NULL | NULL    | NULL |  1900 |   100.00 | NULL                                       |
|  1 | SIMPLE      | t2    | NULL       | ALL  | NULL          | NULL | NULL    | NULL | 19453 |     1.00 | Using where; Using join buffer (hash join) |
+----+-------------+-------+------------+------+---------------+------+---------+------+-------+----------+--------------------------------------------+
3 rows in set, 1 warning (0.00 sec)

Note: 可见在当前的版本需要使用block_nested_loop=off参数,或no_bnl 禁用hash join (总感觉这名字起反了)。注意这里nest loop join 同样是错误的关连顺序 t1 -》 t3-》 t2。

小结:

MySQL因为错误的局部代价估算,产生了错误的join order ,导致 执行计划变慢,在hash join和nl join都有该问题。

Leave a Comment