性能诊断: PostgreSQL系中数字类型隐式转换导致无法使用索引

在oracle数据库同样也会出现因为隐式转换导致的索引无法使用,但是在PostgreSQL系的数据库中如kingbase, highgoDB, GaussDB, openGauss等,对于对于常用的“数字”对应多个datatype,增加了转换的概率,之前写过一篇<HighGoDB 对于number数据类型在PostgreSQL模式时无法使用索引>, 这次我们看另一种情况。近期在一套oracle迁移到某国产postgresql系数据库后,之前一个正常高频执行的SQL把数据库CPU瞬间拉到了70%, 如 numval>=power(10,19) 未在索引列使用索引,下面记录这一问题。

在重温PostgreSQL的数字类型

NameStorage SizeDescriptionRange
smallint2 bytessmall-range integer-32768 to +32767
integer4 bytestypical choice for integer-2147483648 to +2147483647
bigint8 byteslarge-range integer-9223372036854775808 to +9223372036854775807
decimalvariableuser-specified precision, exactup to 131072 digits before the decimal point; up to 16383 digits after the decimal point
numericvariableuser-specified precision, exactup to 131072 digits before the decimal point; up to 16383 digits after the decimal point
real4 bytesvariable-precision, inexact6 decimal digits precision
double precision8 bytesvariable-precision, inexact15 decimal digits precision
smallserial2 bytessmall autoincrementing integer1 to 32767
serial4 bytesautoincrementing integer1 to 2147483647
bigserial8 byteslarge autoincrementing integer1 to 9223372036854775807

创建测试用例(这里环境使用kingbase)

kingbase=# create table test1 (id int, name varchar(100));
CREATE TABLE

kingbase=# insert into test1 select x,'anbob'||x from generate_series(1,100000) as x;
INSERT 0 100000

kingbase=# create index idx_test1_id on test1(id);
CREATE INDEX

kingbase=# analyze  test1;
ANALYZE

kingbase=# explain analyze select * from test1 where id=100;
                                                     QUERY PLAN
---------------------------------------------------------------------------------------------------------------------
 Index Scan using idx_test1_id on test1  (cost=0.29..8.31 rows=1 width=14) (actual time=0.024..0.025 rows=1 loops=1)
   Index Cond: (id = 100)
 Planning Time: 0.211 ms
 Execution Time: 0.047 ms
(4 rows)

kingbase=# explain analyze select * from test1 where id=power(10,2);
                                              QUERY PLAN
------------------------------------------------------------------------------------------------------
 Seq Scan on test1  (cost=0.00..2044.00 rows=500 width=14) (actual time=0.025..12.347 rows=1 loops=1)
   Filter: ((id)::double precision = '100'::double precision)
   Rows Removed by Filter: 99999
 Planning Time: 0.099 ms
 Execution Time: 12.366 ms
(5 rows)

Note: (id)::double precision 表明,即使 id 是 integer 类型,数据库也必须将其转换为浮点数才能与右边比较。这是索引失效的原因。

kingbase=# select pg_typeof(power(10,2)),power(10,2) ;
    pg_typeof     | power
------------------+-------
 double precision |   100
(1 row)

kingbase=# \df power
                                 List of functions
   Schema   | Name  | Result data type |        Argument data types         | Type
------------+-------+------------------+------------------------------------+------
 pg_catalog | power | double precision | double precision, double precision | func
 pg_catalog | power | numeric          | numeric, numeric                   | func
(2 rows)

Note: power函数 返回的是numeric或double precision数据类型,而表定义我们使用的是Int.

kingbase=# explain analyze select * from test1 where id=power(10::numeric,2::numeric);
                                              QUERY PLAN
------------------------------------------------------------------------------------------------------
 Seq Scan on test1  (cost=0.00..2044.00 rows=500 width=14) (actual time=0.038..20.705 rows=1 loops=1)
   Filter: ((id)::numeric = 100.0000000000000000)
   Rows Removed by Filter: 99999
 Planning Time: 0.081 ms
 Execution Time: 20.726 ms
(5 rows)

kingbase=# explain analyze select * from test1 where id=(10^2);
                                              QUERY PLAN
------------------------------------------------------------------------------------------------------
 Seq Scan on test1  (cost=0.00..2044.00 rows=500 width=14) (actual time=0.045..15.270 rows=1 loops=1)
   Filter: ((id)::double precision = '100'::double precision)
   Rows Removed by Filter: 99999
 Planning Time: 0.122 ms
 Execution Time: 15.295 ms
(5 rows)


kingbase=# explain analyze select * from test1 where id=power(10,2)::int;
                                                     QUERY PLAN
---------------------------------------------------------------------------------------------------------------------
 Index Scan using idx_test1_id on test1  (cost=0.29..8.31 rows=1 width=14) (actual time=0.036..0.038 rows=1 loops=1)
   Index Cond: (id = 100)
 Planning Time: 0.110 ms
 Execution Time: 0.056 ms
(4 rows)

kingbase=# explain analyze select * from test1 where id=power(10::numeric,2::numeric);
                                              QUERY PLAN
------------------------------------------------------------------------------------------------------
 Seq Scan on test1  (cost=0.00..2044.00 rows=500 width=14) (actual time=0.038..20.705 rows=1 loops=1)
   Filter: ((id)::numeric = 100.0000000000000000)
   Rows Removed by Filter: 99999
 Planning Time: 0.081 ms
 Execution Time: 20.726 ms
(5 rows)

kingbase=# explain analyze select * from test1 where id=(10^2);
                                              QUERY PLAN
------------------------------------------------------------------------------------------------------
 Seq Scan on test1  (cost=0.00..2044.00 rows=500 width=14) (actual time=0.045..15.270 rows=1 loops=1)
   Filter: ((id)::double precision = '100'::double precision)
   Rows Removed by Filter: 99999
 Planning Time: 0.122 ms
 Execution Time: 15.295 ms
(5 rows)


kingbase=# explain analyze select * from test1 where id=power(10,2)::int;
                                                     QUERY PLAN
---------------------------------------------------------------------------------------------------------------------
 Index Scan using idx_test1_id on test1  (cost=0.29..8.31 rows=1 width=14) (actual time=0.036..0.038 rows=1 loops=1)
   Index Cond: (id = 100)
 Planning Time: 0.110 ms
 Execution Time: 0.056 ms
(4 rows)

kingbase=# explain analyze select * from test1 where id=(10^2)::int;
                                                     QUERY PLAN
---------------------------------------------------------------------------------------------------------------------
 Index Scan using idx_test1_id on test1  (cost=0.29..8.31 rows=1 width=14) (actual time=0.022..0.024 rows=1 loops=1)
   Index Cond: (id = 100)
 Planning Time: 0.103 ms
 Execution Time: 0.062 ms
(4 rows)

Note: 这种强制转换的方法也可以使用索引.

kingbase=# alter table test1 alter column id type numeric(20,0);
ALTER TABLE
kingbase=# \d test1
                       Table "public.test1"
 Column |          Type          | Collation | Nullable | Default
--------+------------------------+-----------+----------+---------
 id     | numeric(20,0)          |           |          |
 name   | character varying(100) |           |          |
Indexes:
    "idx_test1_id" btree (id)

kingbase=# explain analyze select * from test1 where id>power(10::numeric,9);
                                                     QUERY PLAN
---------------------------------------------------------------------------------------------------------------------
 Index Scan using idx_test1_id on test1  (cost=0.29..4.31 rows=1 width=16) (actual time=0.010..0.010 rows=0 loops=1)
   Index Cond: (id > 1000000000.0000000000000000)
 Planning Time: 0.224 ms
 Execution Time: 0.031 ms
(4 rows)

Note: 修改表与SQL条件类型一致,也可以避免隐式转换。 但是修改表类型有int到double,会因为数据类型长度占用更多的空间,还有可能会导致其他SQL再次转换。

kingbase=# explain analyze select * from test1 where id=power(10,19)::int;
ERROR:  integer out of range

kingbase=# explain analyze select * from test1 where id=power(10,19)::bigint;
ERROR:  bigint out of range

Note: 但是应用中的power(10,19) 超出了int的范围, 实际中数据也没有这么大的值,所以提醒研发在使用pg后开发不要太任性,同时oracle的number类型转移pg对应什么类型建议和应用统筹考虑,国产化改造并不是简单的数据迁移

— over —

Leave a Comment