首页 » OceanBase, ORACLE 9i-23ai, PostgreSQL/GaussDB, 达梦 » Oracle、Oceanbase、Kingbase、GaussDB、达梦数据库比较系列(二十七):子查询中的函数投影裁剪

Oracle、Oceanbase、Kingbase、GaussDB、达梦数据库比较系列(二十七):子查询中的函数投影裁剪

在开发场景中如查询分页计算多少页或统计查询,有些开发是习惯基于明细的查询SQL,而外层直接加1层汇聚查询,如select count(*) from (select ….), 但子查询中可能有一些函数或主查询根本不需要的列, 在oracle中的查询转换中如select-project-join或select list pruning, 或VIEW merge SPJ,CVM 都是为了不影响SQL结果一致性,而优化低效的SQL. 但是从oracle迁移到其它数据库中,因为CBO的差异,导致SQL性能大量衰减,需要手动改写SQL, 最近从oracle迁移到pg系国产库,发现一个view中使用了function, 而外部查询根本不关心该列值,就是一个无效的查询列,无意义的函数调用, 在后期国产库迁移需要注意。

Oracle

SQL>CREATE OR REPLACE FUNCTION TEST_SLEEP( TIME_  IN  NUMBER)
   RETURN INTEGER IS
    BEGIN
   DBMS_LOCK.sleep(seconds => TIME_);
   RETURN 1;
    EXCEPTION
      WHEN OTHERS THEN
      RAISE;
      RETURN 1;
   END TEST_SLEEP;
   /

SQL> set timing on

SQL> select min(object_id),TEST_SLEEP(10) from (select object_id,owner from anbob.tobj);

MIN(OBJECT_ID) TEST_SLEEP(10)
-------------- --------------
             2              1

Elapsed: 00:00:10.08
SQL> select min(object_id) from (select object_id,TEST_SLEEP(100),owner from anbob.tobj);

MIN(OBJECT_ID)
--------------
             2
Elapsed: 00:00:00.01

10053 trace

SPJ - select-project-join
SLP - select list pruning
CVM:   Merging SPJ view SEL$2 (#0) into SEL$1 (#0)

Registered qb: SEL$1 0x7fd7bc45d0a8 (PARSER)
---------------------
QUERY BLOCK SIGNATURE
---------------------
  signature (): qb_name=SEL$1 nbfros=1 flg=0
    fro(0): flg=5 objn=0 hint_alias="from$_subquery$_001"@"SEL$1"

Registered qb: SEL$2 0x7fd7bc45b7b8 (PARSER)
---------------------
QUERY BLOCK SIGNATURE
---------------------
  signature (): qb_name=SEL$2 nbfros=1 flg=0
    fro(0): flg=4 objn=181330 hint_alias="TOBJ"@"SEL$2"

****************
QUERY BLOCK TEXT
****************
select min(object_id) from (select object_id,lower(object_name),owner from anbob.tobj)

GPUA: kkqgpuaCheckValidity :******* UNPARSED QUERY IS *******
SELECT MIN("from$_subquery$_001"."OBJECT_ID") "MIN(OBJECT_ID)" FROM  (SELECT "TOBJ"."OBJECT_ID" "OBJECT_ID",LOWER("TOBJ"."OBJECT_NAME") "LOWER(OBJECT_NAME)","TOBJ"."OWNER" "OWNER" FROM "ANBOB"."TOBJ" "TOBJ") "from$_subquery$_001"
--

**************************
Query transformations (QT)
**************************

===================================
Final query after transformations:******* UNPARSED QUERY IS *******
SELECT MIN("TOBJ"."OBJECT_ID") "MIN(OBJECT_ID)" FROM "ANBOB"."TOBJ" "TOBJ"
kkoqbc: optimizing query block SEL$F5BB74E1 (#0)

Note:
oracle做了查询转换view merge和select list pruning ,子查询中的function被column投影裁剪掉了。

Kingbase

[kingbase@kingbase1 ~]$ ksql -U xxxx....
Password:
ksql (V8.0)
Type "help" for help.

kingbase=# create table t(id int,name varchar2(20));
CREATE TABLE
kingbase=# insert into t select rownum,'anbob'||rownum from dual connect by rownum<=10;
INSERT 0 10
kingbase=# \timing on
Timing is on.
kingbase=# select count(*) from (select id,name,pg_sleep(2) from t);
 count
-------
    10
(1 row)

Time: 20015.006 ms (00:20.015)

Note:
postgresql中是每行调用了函数。

GaussDB

anbob=# select version();
                                                      version
-------------------------------------------------------------------------------------------------------------------
 gaussdb (GaussDB Kernel 505.2.0 build 82d715e8) compiled at 2024-09-20 00:15:22 commit 9967 last mr 19883 release
(1 row)

Time: 0.568 ms
anbob=# explain analyze select id from (select id,name,pg_sleep(5)  from t limit 3);
                                                        QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------
 Subquery Scan on __unnamed_subquery__  (cost=0.00..0.37 rows=3 width=4) (actual time=5001.069..15003.188 rows=3 loops=1)
   ->  Limit  (cost=0.00..0.34 rows=3 width=680) (actual time=5001.065..15003.174 rows=3 loops=1)
         ->  Seq Scan on t  (cost=0.00..344.50 rows=3 width=680) (actual time=5001.062..15003.164 rows=3 loops=1)
 Total runtime: 15003.372 ms
(4 rows)

Time: 15004.628 ms
anbob=# show debug_print_rewritten;
 debug_print_rewritten
-----------------------
 off
(1 row)

Time: 0.473 ms
anbob=# set debug_print_rewritten=on;
ERROR:  parameter "debug_print_rewritten" cannot be changed now
Time: 0.953 ms

Note:
GaussDB同kingbase,意料之内同根PostGreSQL。 debug_print_rewritten参数不能配置,也没有10053, 像查看一下rewrite规则。

PostgreSQL

debug_print_parse: When you enable this parameter, the parse tree of incoming queries is printed to the server log. This can be useful for understanding the structure of a query and the values of any bound parameters.

debug_print_rewritten: When you enable this parameter, the rewritten forms of incoming queries are printed to the server log. This can be useful for understanding how the query planner interprets a query and the values of any bound parameters.

[postgres@anbob1 data]$ psql -c"show all"|grep parse
 debug_pretty_print                          | on                              | Indents parse and plan tree displays.
 debug_print_parse                           | off                             | Logs each query's parse tree.
 debug_print_rewritten                       | off                             | Logs each query's rewritten parse tree.
 log_parser_stats                            | off                             | Writes parser performance statistics to the server log.

postgres=# set debug_print_parse=on;
SET
postgres=# select name from (select name,num,pg_sleep(2) from tg limit 2);
2024-11-23 19:39:21.866 CST [18505] LOG:  parse tree:
2024-11-23 19:39:21.866 CST [18505] DETAIL:     {QUERY
           :commandType 1
           :querySource 0
           :canSetTag true
           :utilityStmt <>
           :resultRelation 0
           :hasAggs false
           :hasWindowFuncs false
           :hasTargetSRFs false
           :hasSubLinks false
           :hasDistinctOn false
           :hasRecursive false
           :hasModifyingCTE false
           :hasForUpdate false
           :hasRowSecurity false
           :isReturn false
           :cteList <>
           :rtable (
              {RANGETBLENTRY
              :alias <>
              :eref
                 {ALIAS
                 :aliasname unnamed_subquery
                 :colnames ("name" "num" "pg_sleep")
                 }
              :rtekind 1
              :subquery
                 {QUERY
                 :commandType 1
                 :querySource 0
                 :canSetTag true
                 :utilityStmt <>
                 :resultRelation 0
                 :hasAggs false
                 :hasWindowFuncs false
                 :hasTargetSRFs false
                 :hasSubLinks false
                 :hasDistinctOn false
                 :hasRecursive false
                 :hasModifyingCTE false
                 :hasForUpdate false
                 :hasRowSecurity false
                 :isReturn false
                 :cteList <>
                 :rtable (
                    {RANGETBLENTRY
                    :alias <>
                    :eref
                       {ALIAS
                       :aliasname tg
                       :colnames ("name" "num")
                       }
                    :rtekind 0
                    :relid 17101
                    :inh true
                    :relkind r
                    :rellockmode 1
                    :perminfoindex 1
                    :tablesample <>
                    :lateral false
                    :inFromCl true
                    :securityQuals <>
                    }
                 )
     ...
           }

2024-11-23 19:39:21.866 CST [18505] STATEMENT:  select name from (select name,num,pg_sleep(2) from tg limit 2);

达梦V8

SQL> select * from v$version;

LINEID     BANNER
---------- ---------------------------------
1          DM Database Server 64 V8
2          DB Version: 0x7000c
...

SQL> create table t(id int,name varchar2(20));
executed successfully
used time: 111.517(ms). Execute id is 14641109.

SQL> insert into t select rownum,'anbob'||rownum from dual connect by rownum<=10; 
affect rows 10 used time: 83.469(ms). Execute id is 14641110. 

SQL> select count(*) from (select id,name,TEST_SLEEP(2) from t);
LINEID     COUNT(*)
---------- --------------------
1          10

used time: 17.317(ms). Execute id is 14641111.

Note:
达梦形为同oracle, 子查询中的sleep函数并未调用。

达梦10053

SQL> ALTER SESSION 0 SET EVENTS '10053 trace name context forever';
executed successfully
used time: 69.052(ms). Execute id is 14641112.
SQL> select count(*) from (select id,name,TEST_SLEEP(10) from t);
LINEID     COUNT(*)
---------- --------------------
1          10

used time: 20.122(ms). Execute id is 14641113.
SQL> ALTER SESSION 0 SET EVENTS '10053 trace name context off';
executed successfully
used time: 1.378(ms). Execute id is 14641114.
SQL> select para_name,para_value from v$dm_ini where para_name like 'TRACE_PATH';

LINEID     PARA_NAME  PARA_VALUE
---------- ---------- ----------------------------------
1          TRACE_PATH /home/dmdba/dmdsc/data/DSC01/trace

used time: 75.690(ms). Execute id is 14641115.

SQL> explain select count(*) from (select id,name,TEST_SLEEP(10) from t);

1   #NSET2: [1, 1, 52]
2     #PRJT2: [1, 1, 52]; exp_num(1), is_atom(FALSE)
3       #FAGR2: [1, 1, 52]; sfun_num(1)

used time: 1.322(ms). Execute id is 0.

SQL> host

[dmdba@ora19c1 ~]$ cd /home/dmdba/dmdsc/data/DSC01/trace
[dmdba@ora19c1 trace]$ ls
DSC01_1124_0856_139920505114648.trc
[dmdba@ora19c1 trace]$ vi DSC01_1124_0856_139920505114648.trc
[dmdba@ora19c1 trace]$ exit
exit

*** Plan before optimized:
 project[0x7f41c80408e8] n_exp(1)
   group[0x7f41c8040280]
     base table[0x7f41c803fc48] (T, FULL SEARCH)

---------------- single table access path probe for T ----------------
*** path 1: INDEX33708453 (FULL search), cost: 0.04566

>>> best access path: INDEX33708453 (FULL search), cost: 0.04566

*** BEST PLAN FOR THIS STATEMENT ***
 project[0x7f41c80522c0] n_exp(1) (cost: 0.04566, rows: 1)
   group[0x7f41c8052928]  (cost: 0.04566, rows: 1)
     base table[0x7f41c8052f90] (T, INDEX33708453, FULL SEARCH) (cost: 0.04566, rows: 10)

-------------------------- END --------------------------

Note:
达梦虽然支持10053 trace,但trace中的内容似乎没有太多,相比Oracle还是差的多。

Oceanbase

obclient [SYS]> select * from v$version;
+---------------------------------------------------------------------------------------------------------------+--------+
| BANNER                                                                                                        | CON_ID |
+---------------------------------------------------------------------------------------------------------------+--------+
| OceanBase 4.2.1.2 (r102010012023120119-130bf91ba413a00bb696fe8853906fde1f29f83d) (Built Dec  1 2023 20:00:25) |      0 |
+---------------------------------------------------------------------------------------------------------------+--------+
1 row in set (0.154 sec)

obclient [SYS]> CREATE OR REPLACE FUNCTION TEST_SLEEP( TIME_  IN  NUMBER)
    ->    RETURN INTEGER IS
    ->     BEGIN
    ->    DBMS_LOCK.sleep(seconds => TIME_);
    ->    RETURN 1;
    ->     EXCEPTION
    ->       WHEN OTHERS THEN
    ->       RAISE;
    ->       RETURN 1;
    ->    END TEST_SLEEP;
    ->    /
Query OK, 0 rows affected (3.212 sec)

obclient [SYS]> create table t as select rownum id,'anbob'||rownum name from dual connect by rownum<=10; 
Query OK, 10 rows affected (6.127 sec) 

obclient [SYS]> select count(*) from (select id,name,TEST_SLEEP(5) from t);
+----------+
| COUNT(*) |
+----------+
|       10 |
+----------+
1 row in set (0.001 sec)

obclient [SYS]> select count(*),TEST_SLEEP(5) from (select id,name  from t);
+----------+---------------+
| COUNT(*) | TEST_SLEEP(5) |
+----------+---------------+
|       10 |             1 |
+----------+---------------+
1 row in set (5.131 sec)

obclient [SYS]> explain select count(*) from (select id,name,TEST_SLEEP(5) from t);
+-------------------------------------------------------------------------+
| Query Plan                                                              |
+-------------------------------------------------------------------------+
| =============================================================           |
| |ID|OPERATOR                     |NAME|EST.ROWS|EST.TIME(us)|           |
| -------------------------------------------------------------           |
| |0 |SCALAR GROUP BY              |    |1       |5           |           |
| |1 |└─DISTRIBUTED TABLE FULL SCAN|T   |10      |4           |           |
| =============================================================           |
| Outputs & filters:                                                      |
| -------------------------------------                                   |
|   0 - output([T_FUN_COUNT_SUM(T_FUN_COUNT(*))]), filter(nil), rowset=16 |
|       group(nil), agg_func([T_FUN_COUNT_SUM(T_FUN_COUNT(*))])           |
|   1 - output([T_FUN_COUNT(*)]), filter(nil), rowset=16                  |
|       access(nil), partitions(p0)                                       |
|       is_index_back=false, is_global_index=false,                       |
|       range_key([T.__pk_increment]), range(MIN ; MAX)always true        |
+-------------------------------------------------------------------------+
14 rows in set (0.116 sec)

Note:
Oceanbase的形为和Oracle一样,也消除了子查询中的函数调用。

修正PG系的裁剪行为

看上去基于posetgresql的GaussDB, Kingbase,Highgo,OpenGauss系都存在该问题,实际上也是和function函数的Volatility属性有关,之前写过一篇《Postgresql 、openGauss系function函数Volatile 属性索引无法使用》,函数的volatile稳定性影响,修改PG系的函数Volatility属性,再次测试.

-- Kingbase
kingbase=# \df+ pg_sleep
                                                                                            List of functions
   Schema   |   Name   | Result data type | Argument data types | Type | Volatility | Parallel | Owner  | Security | Access privileges | Language | Source code |               Description
------------+----------+------------------+---------------------+------+------------+----------+--------+----------+-------------------+----------+-------------+-----------------------------------------
 pg_catalog | pg_sleep | void             | double precision    | func | volatile   | safe     | system | invoker  |                   | internal | pg_sleep    | sleep for the specified time in seconds
(1 row)

kingbase=# alter function pg_sleep immutable;
ALTER FUNCTION

kingbase=# explain (analyze,verbose) select count(*) from (select id,name,pg_sleep(2) from t);
                                                 QUERY PLAN
------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=21.25..21.26 rows=1 width=8) (actual time=0.031..0.032 rows=1 loops=1)
   Output: count(*)
   ->  Seq Scan on public.t  (cost=0.00..19.00 rows=900 width=0) (actual time=0.022..0.023 rows=10 loops=1)
         Output: t.id, t.name
 Planning Time: 0.175 ms
 Execution Time: 0.072 ms
(6 rows)

-- GaussDB
anbob=# select version();
                                                      version
-------------------------------------------------------------------------------------------------------------------
 gaussdb (GaussDB Kernel 505.2.0 build 82d715e8) compiled at 2024-09-20 00:15:22 commit 9967 last mr 19883 release
(1 row)

anbob=# alter function pg_sleep immutable;
ERROR:  syntax error at or near "immutable"

anbob=#  alter function pg_sleep(double precision) immutable;
ERROR:  function "pg_sleep" is a builtin function,it can not be altered

anbob=# CREATE OR REPLACE FUNCTION pg_sleep1(double precision)
anbob-#  RETURNS void
anbob-#  LANGUAGE internal
anbob-# AS $function$pg_sleep$function$;
CREATE FUNCTION

anbob=# \df+ pg_sleep*
                                                                                          List of functions
   Schema   |   Name    | Result data type | Argument data types |  Type  | Volatility | Owner | Language | Source code |               Description               | fencedmode | propackage | prokind
------------+-----------+------------------+---------------------+--------+------------+-------+----------+-------------+-----------------------------------------+------------+------------+---------
 pg_catalog | pg_sleep  | void             | double precision    | normal | volatile   | gauss | internal | pg_sleep    | sleep for the specified time in seconds | f          | f          | f
 public     | pg_sleep1 | void             | double precision    | normal | volatile   | gauss | internal | pg_sleep    |                                         | f          | f          | f
(2 rows)

anbob=# explain (verbose,analyze) select id from (select id,name,pg_sleep1(5)  from t limit 3);
                                                        QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------
 Subquery Scan on __unnamed_subquery__  (cost=0.00..0.37 rows=3 width=4) (actual time=5000.555..15001.879 rows=3 loops=1)
   Output: __unnamed_subquery__.id
   ->  Limit  (cost=0.00..0.34 rows=3 width=680) (actual time=5000.552..15001.866 rows=3 loops=1)
         Output: t.id, t.name, (pg_sleep1(5::double precision))
         ->  Seq Scan on public.t  (cost=0.00..344.50 rows=3 width=680) (actual time=5000.548..15001.855 rows=3 loops=1)
               Output: t.id, t.name, pg_sleep1(5::double precision)
 Total runtime: 15002.061 ms
(7 rows)

anbob=# alter function pg_sleep1(double precision) immutable;
ALTER FUNCTION

anbob=# explain (verbose,analyze) select id from (select id,name,pg_sleep1(5)  from t limit 3);
                                                    QUERY PLAN
-------------------------------------------------------------------------------------------------------------------
 Subquery Scan on __unnamed_subquery__  (cost=0.00..0.37 rows=3 width=4) (actual time=0.037..0.039 rows=3 loops=1)
   Output: __unnamed_subquery__.id
   ->  Limit  (cost=0.00..0.34 rows=3 width=680) (actual time=0.036..0.038 rows=3 loops=1)
         Output: t.id, t.name, (''::void)
         ->  Seq Scan on public.t  (cost=0.00..337.00 rows=3 width=680) (actual time=0.034..0.035 rows=3 loops=1)
               Output: t.id, t.name, ''::void
 Total runtime: 0.152 ms
(7 rows)


-- PostgreSQL
postgres=# \df+ pg_sleep
                                                                                              List of functions
   Schema   |   Name   | Result data type | Argument data types | Type | Volatility | Parallel |  Owner   | Security | Access privileges | Language | Internal name |               Description
------------+----------+------------------+---------------------+------+------------+----------+----------+----------+-------------------+----------+---------------+-----------------------------------------
 pg_catalog | pg_sleep | void             | double precision    | func | volatile   | safe     | postgres | invoker  |                   | internal | pg_sleep      | sleep for the specified time in seconds
(1 row)

postgres=# alter function pg_sleep immutable;
ALTER FUNCTION
postgres=# :eabv select name from (select name,num,pg_sleep(2) from tg limit 2);
                                                     QUERY PLAN
--------------------------------------------------------------------------------------------------------------------
 Subquery Scan on unnamed_subquery  (cost=0.00..0.05 rows=2 width=14) (actual time=0.012..0.014 rows=2 loops=1)
   Output: unnamed_subquery.name
   Buffers: shared hit=1
   ->  Limit  (cost=0.00..0.03 rows=2 width=22) (actual time=0.011..0.012 rows=2 loops=1)
         Output: tg.name, NULL::integer, NULL::void
         Buffers: shared hit=1
         ->  Seq Scan on public.tg  (cost=0.00..27.70 rows=1770 width=22) (actual time=0.010..0.010 rows=2 loops=1)
               Output: tg.name, NULL::integer, NULL::void
               Buffers: shared hit=1
 Planning Time: 0.107 ms
 Execution Time: 0.042 ms
(11 rows)

— OVER —

 

还有更多异构数据迁移问题隐藏在项目中,如果您有国产化改造咨询需求,可以考虑联系我们(www.anbob.com首页的联系方式)

打赏

,

目前这篇文章还没有评论(Rss)

我要评论