在开发场景中如查询分页计算多少页或统计查询,有些开发是习惯基于明细的查询SQL,而外层直接加1层汇聚查询,如select count(*) from (select ….), 但子查询中可能有一些函数或主查询根本不需要的列, 在oracle中的查询转换中如select-project-join或select list pruning, 或VIEW merge SPJ,CVM 都是为了不影响SQL结果一致性,而优化低效的SQL. 但是从oracle迁移到其它数据库中,因为CBO的差异,导致SQL性能大量衰减,需要手动改写SQL, 最近从oracle迁移到pg系国产库,发现一个view中使用了function, 而外部查询根本不关心该列值,就是一个无效的查询列,无意义的函数调用, 在后期国产库迁移需要注意。
Oracle
SQL>CREATE OR REPLACE FUNCTION TEST_SLEEP( TIME_ IN NUMBER)
RETURN INTEGER IS
BEGIN
DBMS_LOCK.sleep(seconds => TIME_);
RETURN 1;
EXCEPTION
WHEN OTHERS THEN
RAISE;
RETURN 1;
END TEST_SLEEP;
/
SQL> set timing on
SQL> select min(object_id),TEST_SLEEP(10) from (select object_id,owner from anbob.tobj);
MIN(OBJECT_ID) TEST_SLEEP(10)
-------------- --------------
2 1
Elapsed: 00:00:10.08
SQL> select min(object_id) from (select object_id,TEST_SLEEP(100),owner from anbob.tobj);
MIN(OBJECT_ID)
--------------
2
Elapsed: 00:00:00.01
10053 trace
SPJ - select-project-join
SLP - select list pruning
CVM: Merging SPJ view SEL$2 (#0) into SEL$1 (#0)
Registered qb: SEL$1 0x7fd7bc45d0a8 (PARSER)
---------------------
QUERY BLOCK SIGNATURE
---------------------
signature (): qb_name=SEL$1 nbfros=1 flg=0
fro(0): flg=5 objn=0 hint_alias="from$_subquery$_001"@"SEL$1"
Registered qb: SEL$2 0x7fd7bc45b7b8 (PARSER)
---------------------
QUERY BLOCK SIGNATURE
---------------------
signature (): qb_name=SEL$2 nbfros=1 flg=0
fro(0): flg=4 objn=181330 hint_alias="TOBJ"@"SEL$2"
****************
QUERY BLOCK TEXT
****************
select min(object_id) from (select object_id,lower(object_name),owner from anbob.tobj)
GPUA: kkqgpuaCheckValidity :******* UNPARSED QUERY IS *******
SELECT MIN("from$_subquery$_001"."OBJECT_ID") "MIN(OBJECT_ID)" FROM (SELECT "TOBJ"."OBJECT_ID" "OBJECT_ID",LOWER("TOBJ"."OBJECT_NAME") "LOWER(OBJECT_NAME)","TOBJ"."OWNER" "OWNER" FROM "ANBOB"."TOBJ" "TOBJ") "from$_subquery$_001"
--
**************************
Query transformations (QT)
**************************
===================================
Final query after transformations:******* UNPARSED QUERY IS *******
SELECT MIN("TOBJ"."OBJECT_ID") "MIN(OBJECT_ID)" FROM "ANBOB"."TOBJ" "TOBJ"
kkoqbc: optimizing query block SEL$F5BB74E1 (#0)
Note:
oracle做了查询转换view merge和select list pruning ,子查询中的function被column投影裁剪掉了。
Kingbase
[kingbase@kingbase1 ~]$ ksql -U xxxx.... Password: ksql (V8.0) Type "help" for help. kingbase=# create table t(id int,name varchar2(20)); CREATE TABLE kingbase=# insert into t select rownum,'anbob'||rownum from dual connect by rownum<=10; INSERT 0 10 kingbase=# \timing on Timing is on. kingbase=# select count(*) from (select id,name,pg_sleep(2) from t); count ------- 10 (1 row) Time: 20015.006 ms (00:20.015)
Note:
postgresql中是每行调用了函数。
GaussDB
anbob=# select version();
version
-------------------------------------------------------------------------------------------------------------------
gaussdb (GaussDB Kernel 505.2.0 build 82d715e8) compiled at 2024-09-20 00:15:22 commit 9967 last mr 19883 release
(1 row)
Time: 0.568 ms
anbob=# explain analyze select id from (select id,name,pg_sleep(5) from t limit 3);
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------
Subquery Scan on __unnamed_subquery__ (cost=0.00..0.37 rows=3 width=4) (actual time=5001.069..15003.188 rows=3 loops=1)
-> Limit (cost=0.00..0.34 rows=3 width=680) (actual time=5001.065..15003.174 rows=3 loops=1)
-> Seq Scan on t (cost=0.00..344.50 rows=3 width=680) (actual time=5001.062..15003.164 rows=3 loops=1)
Total runtime: 15003.372 ms
(4 rows)
Time: 15004.628 ms
anbob=# show debug_print_rewritten;
debug_print_rewritten
-----------------------
off
(1 row)
Time: 0.473 ms
anbob=# set debug_print_rewritten=on;
ERROR: parameter "debug_print_rewritten" cannot be changed now
Time: 0.953 ms
Note:
GaussDB同kingbase,意料之内同根PostGreSQL。 debug_print_rewritten参数不能配置,也没有10053, 像查看一下rewrite规则。
PostgreSQL
debug_print_parse: When you enable this parameter, the parse tree of incoming queries is printed to the server log. This can be useful for understanding the structure of a query and the values of any bound parameters.
debug_print_rewritten: When you enable this parameter, the rewritten forms of incoming queries are printed to the server log. This can be useful for understanding how the query planner interprets a query and the values of any bound parameters.
[postgres@anbob1 data]$ psql -c"show all"|grep parse
debug_pretty_print | on | Indents parse and plan tree displays.
debug_print_parse | off | Logs each query's parse tree.
debug_print_rewritten | off | Logs each query's rewritten parse tree.
log_parser_stats | off | Writes parser performance statistics to the server log.
postgres=# set debug_print_parse=on;
SET
postgres=# select name from (select name,num,pg_sleep(2) from tg limit 2);
2024-11-23 19:39:21.866 CST [18505] LOG: parse tree:
2024-11-23 19:39:21.866 CST [18505] DETAIL: {QUERY
:commandType 1
:querySource 0
:canSetTag true
:utilityStmt <>
:resultRelation 0
:hasAggs false
:hasWindowFuncs false
:hasTargetSRFs false
:hasSubLinks false
:hasDistinctOn false
:hasRecursive false
:hasModifyingCTE false
:hasForUpdate false
:hasRowSecurity false
:isReturn false
:cteList <>
:rtable (
{RANGETBLENTRY
:alias <>
:eref
{ALIAS
:aliasname unnamed_subquery
:colnames ("name" "num" "pg_sleep")
}
:rtekind 1
:subquery
{QUERY
:commandType 1
:querySource 0
:canSetTag true
:utilityStmt <>
:resultRelation 0
:hasAggs false
:hasWindowFuncs false
:hasTargetSRFs false
:hasSubLinks false
:hasDistinctOn false
:hasRecursive false
:hasModifyingCTE false
:hasForUpdate false
:hasRowSecurity false
:isReturn false
:cteList <>
:rtable (
{RANGETBLENTRY
:alias <>
:eref
{ALIAS
:aliasname tg
:colnames ("name" "num")
}
:rtekind 0
:relid 17101
:inh true
:relkind r
:rellockmode 1
:perminfoindex 1
:tablesample <>
:lateral false
:inFromCl true
:securityQuals <>
}
)
...
}
2024-11-23 19:39:21.866 CST [18505] STATEMENT: select name from (select name,num,pg_sleep(2) from tg limit 2);
达梦V8
SQL> select * from v$version; LINEID BANNER ---------- --------------------------------- 1 DM Database Server 64 V8 2 DB Version: 0x7000c ... SQL> create table t(id int,name varchar2(20)); executed successfully used time: 111.517(ms). Execute id is 14641109. SQL> insert into t select rownum,'anbob'||rownum from dual connect by rownum<=10; affect rows 10 used time: 83.469(ms). Execute id is 14641110. SQL> select count(*) from (select id,name,TEST_SLEEP(2) from t); LINEID COUNT(*) ---------- -------------------- 1 10 used time: 17.317(ms). Execute id is 14641111.
Note:
达梦形为同oracle, 子查询中的sleep函数并未调用。
达梦10053
SQL> ALTER SESSION 0 SET EVENTS '10053 trace name context forever';
executed successfully
used time: 69.052(ms). Execute id is 14641112.
SQL> select count(*) from (select id,name,TEST_SLEEP(10) from t);
LINEID COUNT(*)
---------- --------------------
1 10
used time: 20.122(ms). Execute id is 14641113.
SQL> ALTER SESSION 0 SET EVENTS '10053 trace name context off';
executed successfully
used time: 1.378(ms). Execute id is 14641114.
SQL> select para_name,para_value from v$dm_ini where para_name like 'TRACE_PATH';
LINEID PARA_NAME PARA_VALUE
---------- ---------- ----------------------------------
1 TRACE_PATH /home/dmdba/dmdsc/data/DSC01/trace
used time: 75.690(ms). Execute id is 14641115.
SQL> explain select count(*) from (select id,name,TEST_SLEEP(10) from t);
1 #NSET2: [1, 1, 52]
2 #PRJT2: [1, 1, 52]; exp_num(1), is_atom(FALSE)
3 #FAGR2: [1, 1, 52]; sfun_num(1)
used time: 1.322(ms). Execute id is 0.
SQL> host
[dmdba@ora19c1 ~]$ cd /home/dmdba/dmdsc/data/DSC01/trace
[dmdba@ora19c1 trace]$ ls
DSC01_1124_0856_139920505114648.trc
[dmdba@ora19c1 trace]$ vi DSC01_1124_0856_139920505114648.trc
[dmdba@ora19c1 trace]$ exit
exit
*** Plan before optimized:
project[0x7f41c80408e8] n_exp(1)
group[0x7f41c8040280]
base table[0x7f41c803fc48] (T, FULL SEARCH)
---------------- single table access path probe for T ----------------
*** path 1: INDEX33708453 (FULL search), cost: 0.04566
>>> best access path: INDEX33708453 (FULL search), cost: 0.04566
*** BEST PLAN FOR THIS STATEMENT ***
project[0x7f41c80522c0] n_exp(1) (cost: 0.04566, rows: 1)
group[0x7f41c8052928] (cost: 0.04566, rows: 1)
base table[0x7f41c8052f90] (T, INDEX33708453, FULL SEARCH) (cost: 0.04566, rows: 10)
-------------------------- END --------------------------
Note:
达梦虽然支持10053 trace,但trace中的内容似乎没有太多,相比Oracle还是差的多。
Oceanbase
obclient [SYS]> select * from v$version;
+---------------------------------------------------------------------------------------------------------------+--------+
| BANNER | CON_ID |
+---------------------------------------------------------------------------------------------------------------+--------+
| OceanBase 4.2.1.2 (r102010012023120119-130bf91ba413a00bb696fe8853906fde1f29f83d) (Built Dec 1 2023 20:00:25) | 0 |
+---------------------------------------------------------------------------------------------------------------+--------+
1 row in set (0.154 sec)
obclient [SYS]> CREATE OR REPLACE FUNCTION TEST_SLEEP( TIME_ IN NUMBER)
-> RETURN INTEGER IS
-> BEGIN
-> DBMS_LOCK.sleep(seconds => TIME_);
-> RETURN 1;
-> EXCEPTION
-> WHEN OTHERS THEN
-> RAISE;
-> RETURN 1;
-> END TEST_SLEEP;
-> /
Query OK, 0 rows affected (3.212 sec)
obclient [SYS]> create table t as select rownum id,'anbob'||rownum name from dual connect by rownum<=10;
Query OK, 10 rows affected (6.127 sec)
obclient [SYS]> select count(*) from (select id,name,TEST_SLEEP(5) from t);
+----------+
| COUNT(*) |
+----------+
| 10 |
+----------+
1 row in set (0.001 sec)
obclient [SYS]> select count(*),TEST_SLEEP(5) from (select id,name from t);
+----------+---------------+
| COUNT(*) | TEST_SLEEP(5) |
+----------+---------------+
| 10 | 1 |
+----------+---------------+
1 row in set (5.131 sec)
obclient [SYS]> explain select count(*) from (select id,name,TEST_SLEEP(5) from t);
+-------------------------------------------------------------------------+
| Query Plan |
+-------------------------------------------------------------------------+
| ============================================================= |
| |ID|OPERATOR |NAME|EST.ROWS|EST.TIME(us)| |
| ------------------------------------------------------------- |
| |0 |SCALAR GROUP BY | |1 |5 | |
| |1 |└─DISTRIBUTED TABLE FULL SCAN|T |10 |4 | |
| ============================================================= |
| Outputs & filters: |
| ------------------------------------- |
| 0 - output([T_FUN_COUNT_SUM(T_FUN_COUNT(*))]), filter(nil), rowset=16 |
| group(nil), agg_func([T_FUN_COUNT_SUM(T_FUN_COUNT(*))]) |
| 1 - output([T_FUN_COUNT(*)]), filter(nil), rowset=16 |
| access(nil), partitions(p0) |
| is_index_back=false, is_global_index=false, |
| range_key([T.__pk_increment]), range(MIN ; MAX)always true |
+-------------------------------------------------------------------------+
14 rows in set (0.116 sec)
Note:
Oceanbase的形为和Oracle一样,也消除了子查询中的函数调用。
修正PG系的裁剪行为
看上去基于posetgresql的GaussDB, Kingbase,Highgo,OpenGauss系都存在该问题,实际上也是和function函数的Volatility属性有关,之前写过一篇《Postgresql 、openGauss系function函数Volatile 属性索引无法使用》,函数的volatile稳定性影响,修改PG系的函数Volatility属性,再次测试.
-- Kingbase
kingbase=# \df+ pg_sleep
List of functions
Schema | Name | Result data type | Argument data types | Type | Volatility | Parallel | Owner | Security | Access privileges | Language | Source code | Description
------------+----------+------------------+---------------------+------+------------+----------+--------+----------+-------------------+----------+-------------+-----------------------------------------
pg_catalog | pg_sleep | void | double precision | func | volatile | safe | system | invoker | | internal | pg_sleep | sleep for the specified time in seconds
(1 row)
kingbase=# alter function pg_sleep immutable;
ALTER FUNCTION
kingbase=# explain (analyze,verbose) select count(*) from (select id,name,pg_sleep(2) from t);
QUERY PLAN
------------------------------------------------------------------------------------------------------------
Aggregate (cost=21.25..21.26 rows=1 width=8) (actual time=0.031..0.032 rows=1 loops=1)
Output: count(*)
-> Seq Scan on public.t (cost=0.00..19.00 rows=900 width=0) (actual time=0.022..0.023 rows=10 loops=1)
Output: t.id, t.name
Planning Time: 0.175 ms
Execution Time: 0.072 ms
(6 rows)
-- GaussDB
anbob=# select version();
version
-------------------------------------------------------------------------------------------------------------------
gaussdb (GaussDB Kernel 505.2.0 build 82d715e8) compiled at 2024-09-20 00:15:22 commit 9967 last mr 19883 release
(1 row)
anbob=# alter function pg_sleep immutable;
ERROR: syntax error at or near "immutable"
anbob=# alter function pg_sleep(double precision) immutable;
ERROR: function "pg_sleep" is a builtin function,it can not be altered
anbob=# CREATE OR REPLACE FUNCTION pg_sleep1(double precision)
anbob-# RETURNS void
anbob-# LANGUAGE internal
anbob-# AS $function$pg_sleep$function$;
CREATE FUNCTION
anbob=# \df+ pg_sleep*
List of functions
Schema | Name | Result data type | Argument data types | Type | Volatility | Owner | Language | Source code | Description | fencedmode | propackage | prokind
------------+-----------+------------------+---------------------+--------+------------+-------+----------+-------------+-----------------------------------------+------------+------------+---------
pg_catalog | pg_sleep | void | double precision | normal | volatile | gauss | internal | pg_sleep | sleep for the specified time in seconds | f | f | f
public | pg_sleep1 | void | double precision | normal | volatile | gauss | internal | pg_sleep | | f | f | f
(2 rows)
anbob=# explain (verbose,analyze) select id from (select id,name,pg_sleep1(5) from t limit 3);
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------
Subquery Scan on __unnamed_subquery__ (cost=0.00..0.37 rows=3 width=4) (actual time=5000.555..15001.879 rows=3 loops=1)
Output: __unnamed_subquery__.id
-> Limit (cost=0.00..0.34 rows=3 width=680) (actual time=5000.552..15001.866 rows=3 loops=1)
Output: t.id, t.name, (pg_sleep1(5::double precision))
-> Seq Scan on public.t (cost=0.00..344.50 rows=3 width=680) (actual time=5000.548..15001.855 rows=3 loops=1)
Output: t.id, t.name, pg_sleep1(5::double precision)
Total runtime: 15002.061 ms
(7 rows)
anbob=# alter function pg_sleep1(double precision) immutable;
ALTER FUNCTION
anbob=# explain (verbose,analyze) select id from (select id,name,pg_sleep1(5) from t limit 3);
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------
Subquery Scan on __unnamed_subquery__ (cost=0.00..0.37 rows=3 width=4) (actual time=0.037..0.039 rows=3 loops=1)
Output: __unnamed_subquery__.id
-> Limit (cost=0.00..0.34 rows=3 width=680) (actual time=0.036..0.038 rows=3 loops=1)
Output: t.id, t.name, (''::void)
-> Seq Scan on public.t (cost=0.00..337.00 rows=3 width=680) (actual time=0.034..0.035 rows=3 loops=1)
Output: t.id, t.name, ''::void
Total runtime: 0.152 ms
(7 rows)
-- PostgreSQL
postgres=# \df+ pg_sleep
List of functions
Schema | Name | Result data type | Argument data types | Type | Volatility | Parallel | Owner | Security | Access privileges | Language | Internal name | Description
------------+----------+------------------+---------------------+------+------------+----------+----------+----------+-------------------+----------+---------------+-----------------------------------------
pg_catalog | pg_sleep | void | double precision | func | volatile | safe | postgres | invoker | | internal | pg_sleep | sleep for the specified time in seconds
(1 row)
postgres=# alter function pg_sleep immutable;
ALTER FUNCTION
postgres=# :eabv select name from (select name,num,pg_sleep(2) from tg limit 2);
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------
Subquery Scan on unnamed_subquery (cost=0.00..0.05 rows=2 width=14) (actual time=0.012..0.014 rows=2 loops=1)
Output: unnamed_subquery.name
Buffers: shared hit=1
-> Limit (cost=0.00..0.03 rows=2 width=22) (actual time=0.011..0.012 rows=2 loops=1)
Output: tg.name, NULL::integer, NULL::void
Buffers: shared hit=1
-> Seq Scan on public.tg (cost=0.00..27.70 rows=1770 width=22) (actual time=0.010..0.010 rows=2 loops=1)
Output: tg.name, NULL::integer, NULL::void
Buffers: shared hit=1
Planning Time: 0.107 ms
Execution Time: 0.042 ms
(11 rows)
— OVER —
还有更多异构数据迁移问题隐藏在项目中,如果您有国产化改造咨询需求,可以考虑联系我们(www.anbob.com首页的联系方式)