Oracle迁移到PostgreSQL注意事项: to_date

首页 » ORACLE 9i-23ai, PostgreSQL/GaussDB » Oracle迁移到PostgreSQL注意事项: to_date

Oracle迁移到PostgreSQL注意事项: to_date

2023/01/05
ORACLE 9i-23ai, PostgreSQL/GaussDB
307 views
Oracle迁移到PostgreSQL注意事项: to_date已关闭评论

近年来国家xc的推广，现阶段国产或开源库迁移逐渐列上日程，目前看已大势所趋，在选择部分国产(基于)或开源数据库里安全健康的开源协议需要考虑， PostgreSQL是功能强大最接近Oracle的并有全球客户经过长期磨炼过生态健全的关系型数据库，尤其是在俄乌冲突后，西方部分国产在对俄国的技术遏制后，并没有俄国的本土企业PostgresqlPro的数据库业务造成影响，表明了PG开源社区的安全，当然现在国内一些政客领导在XC时对数据库选型一刀切，国外开源统统不用，而选择一个套皮，个人感觉对后期持续迭代表示怀疑。 Oracle的应用向PG的改造对于语法兼容但语议变化可能会导致一些问题。

最近有一个项目在原有一套Oracle的应用数据迁移到PG后，在两个PG环境中双数据源双写数据时，其中一套PG的数据库入库存在问题，下面简单记录。

现象是每5分钟自动调一批insert进程，一批次有1张表写入约14万记录，两套PG库测试库A，准生产B，执行简单的SQL伪代码 insert into table1(c1,c2,c3) values(‘2023-1-2 01:05:00’, ‘123’, now()); c1和c2是组合唯一索引，数据来自本地静态文件， c3是入库时间取自当前时间， c1是批次时间来自文件， c2是id列。如果按C1列group by统计查询A库数据符合预期，每批14万记录，但是有意思的现象是B库每天的第1批次数据正常，但从第2批次开始后每批次仅为2000条，期间人为把该当天的分区truncate后，随即当天后来的数据又都可以正常入库每批次14万。但第二天该现象又重复出现。期间为了排除B库是唯一索引导致，把B库的唯一索引改成普通索引后问题并未解决。table定义是完全一样。这种问题你会怎么分析？应用程序是同一套应用，只是双写其中的一套库有问题。

分析应用日志
发现B库的应用日志中有在insert如下报错，每批次有个报错，但如果应用是大事务批量提交，其中事务中有一个报错，在oracle和postgresql中形为是不一样的，oracle中是同一事务中之前insert未报错的，最后commit后是有效的，而在postgresql同一事务中遇到一条报错之前insert未报错的，最后commit时也会全事务rollback.
ERROR: duplicate key value violates unique constraint "idx_xxx"

当然上面有提到为了排除是唯一索引引起，我们把唯一索引已经改提了，后台没有报该误后数据依旧提未错误。这时我们从应用了解到应用入库的逻辑中是先做一次select后如果不存在再做insert.

分析如何判断数据已存在
一种方法是看应用代码，当然应用认为肯定自己没问题，问题出在数据库端，此时也没有了耐心再多次反复测试，另一种也不麻烦在数据库开启SQL跟踪，记录数据库所有前台sql到日志。我们选择了后者。相关的有2个参数log_min_duration_statement和log_statement：
log_min_duration_statement —–Setting this to 0 prints all statement durations.
log_statement —— Controls which SQL statements are logged. Valid values are none (off), ddl, mod, and all (all statements).

alter system set log_statement= 'all';
SELECT pg_reload_conf();

grep 日志中所有和表名相关的select SQL, 这个业务逻辑并不复杂，很快能过滤出那条select 如下：
伪代码

select c2 from table1 where c2 in (xxx,xxx,xxx) and  c1=to_date('2022-12-30 14:15:00','yyyy-mm-dd hh24:mi:ss')

Note:
逻辑就是看一个id表示列和时间(当前批次的时间)。如果用当前的SQL去A库和B库查，确实时返回不同的结果，如果你是PG DBA可能已经想到了原因，但是Oracle DBA可能还想不通，

然后我们去两个库查询一下TO_DATE看看返回结果。

A库
anbob=# select to_date('2023-1-1 12:01:01','yyyy-mm-dd hh24:mi:ss') from dual;
       to_date
---------------------
 2023-01-01 12:01:01

[local]:5432 postgres@anbob=# \df to_date
                        List of functions
 Schema |  Name   | Result data type | Argument data types | Type
--------+---------+------------------+---------------------+------
 oracle | to_date | date             | text                | func
 oracle | to_date | date             | text, text          | func
(2 rows)

[local]:5432 postgres@postgres=# \dx
                                                       List of installed extensions
        Name        | Version |   Schema   |                                          Description
--------------------+---------+------------+-----------------------------------------------------------------------------------------------
 orafce             | 4.0     | public     | Functions and operators that emulate a subset of functions and packages from the Oracle RDBMS
 pg_stat_statements | 1.8     | public     | track planning and execution statistics of all SQL statements executed
 plpgsql            | 1.0     | pg_catalog | PL/pgSQL procedural language
(3 rows)


B库
[local]:5432 postgres@anbob=# select to_date('2023-1-1 12:01:01','yyyy-mm-dd hh24:mi:ss') from dual;
  to_date
------------
 2023-01-01

[local]:5432 postgres@anbob=# \df to_date
                                List of functions
   Schema   |  Name   |      Result data type       | Argument data types | Type
------------+---------+-----------------------------+---------------------+------
 pg_catalog | to_date | timestamp without time zone | str text            | func
 pg_catalog | to_date | date                        | text, text          | func
(2 rows)

Note:
这就是原因，A库有安装orafce所以to_date是兼容了oracle的to_date函数，范围的是带时分秒，而Postgresql自带的to_date返回的没有时分秒，date在oracle中相于当postgresql中的date+time。当然也是因为两个库的安装没有规范化才产生的差异。回头看业务入库问题，B库正因为每天的第1批相当于00:00:00数据时可以入库，但是从第二批次开始，虽然时间是带了时分秒，但是SQL的to_date只能精确到00:00:00第1批次的数据，所以会判断数据存在导致大部分数据未insert, 而truncate后就能入库原因是因为删除了当天的第1批数据。

下面看一下Openguass 中的to_date.

anbob=# \df to_date
                                                   List of functions
   Schema   |  Name   |      Result data type       | Argument data types |  Type  | fencedmode | propackage | prokind
------------+---------+-----------------------------+---------------------+--------+------------+------------+---------
 pg_catalog | to_date | timestamp without time zone | text                | normal | f          | f          | f
 pg_catalog | to_date | timestamp without time zone | text, text          | normal | f          | f          | f


SELECT n.nspname AS schema_name
      ,p.proname AS function_name
      ,pg_get_functiondef(p.oid) AS func_def
      ,pg_get_function_arguments(p.oid) AS args
      ,pg_get_function_result(p.oid) AS result
FROM   pg_proc p
JOIN   pg_namespace n ON n.oid = p.pronamespace
anbob-# WHERE  p.proname ILIKE 'to_date%';
 schema_name | function_name |                           func_def                            |    args    |           result
-------------+---------------+---------------------------------------------------------------+------------+-----------------------------
 pg_catalog  | to_date       | (4,"CREATE OR REPLACE FUNCTION pg_catalog.to_date(text, text)+| text, text | timestamp without time zone
             |               |  RETURNS timestamp without time zone                         +|            |
             |               |  LANGUAGE internal                                           +|            |
             |               |  STABLE STRICT NOT FENCED SHIPPABLE                          +|            |
             |               | AS $function$to_timestamp$function$;                         +|            |
             |               | ")                                                            |            |
 pg_catalog  | to_date       | (4,"CREATE OR REPLACE FUNCTION pg_catalog.to_date(text)      +| text       | timestamp without time zone
             |               |  RETURNS timestamp without time zone                         +|            |
             |               |  LANGUAGE internal                                           +|            |
             |               |  STABLE STRICT NOT FENCED SHIPPABLE                          +|            |
             |               | AS $function$to_date_default_format$function$;               +|            |
             |               | ")                                                            |            |
(2 rows)

-- or --

anbob=# \sf to_date(text,text)
CREATE OR REPLACE FUNCTION pg_catalog.to_date(text, text)
 RETURNS timestamp without time zone
 LANGUAGE internal
 STABLE STRICT NOT FENCED SHIPPABLE
AS $function$to_timestamp$function$;
anbob=#

Note:
当前是兼容orale模式，没有引入oracle schema，而是重写了pg_catalog schema中的to_date, 返回的是timestamp without time zone。 orafce中to_date返回的是date类型为什么是带十分秒呢？下面研究一下orafce的to-date.

orafce中的to_date

Orafce安装

1， orafce源码：https://github.com/orafce/orafce
2，编译安装：
解压后进入源码目录执行 make & make install
3， 进入postgresql执行 create extension orafce
-- 注意需要更改 时区，

当前的to_date 还没有oracle schema,需要配置search_path
[local]:5432 postgres@anbob=# \df to_date
                                List of functions
   Schema   |  Name   |      Result data type       | Argument data types | Type
------------+---------+-----------------------------+---------------------+------
 pg_catalog | to_date | timestamp without time zone | str text            | func
 pg_catalog | to_date | date                        | text, text          | func
(2 rows)

[local]:5432 postgres@anbob=# select to_date('2023-1-1 12:01:01','yyyy-mm-dd hh24:mi:ss') from dual;
  to_date
------------
 2023-01-01

[local]:5432 postgres@postgres=# set search_path="$user", oracle, public;
SET
[local]:5432 postgres@postgres=# \df to_date
                          List of functions
   Schema   |  Name   | Result data type | Argument data types | Type
------------+---------+------------------+---------------------+------
 oracle     | to_date | oracle.date      | text                | func
 pg_catalog | to_date | date             | text, text          | func
(2 rows)

[local]:5432 postgres@postgres=# select to_date('2023-1-1 12:01:01','yyyy-mm-dd hh24:mi:ss') from dual;
  to_date
------------
 2023-01-01
(1 row)

Note:
在pg中的函数唯一性是函数名（参数），所以只加oracle到search_path, to_date(text,text) 还是pg 自带的pg_catalog schema中的。需要显示指定search_path， oracle优先pg_catalog

[local]:5432 postgres@postgres=# set search_path="$user", oracle, public, pg_catalog;
SET
[local]:5432 postgres@postgres=# \df to_date
                        List of functions
 Schema |  Name   | Result data type | Argument data types | Type
--------+---------+------------------+---------------------+------
 oracle | to_date | date             | text                | func
 oracle | to_date | date             | text, text          | func
(2 rows)

[local]:5432 postgres@postgres=# \sf to_date(text,text)
CREATE OR REPLACE FUNCTION oracle.to_date(text, text)
 RETURNS date
 LANGUAGE sql
 IMMUTABLE STRICT
AS $function$ SELECT TO_TIMESTAMP($1,$2)::oracle.date; $function$

Note:
orafce中oracle的to_date返回的不是自带的date，而是oracle.date数据类型。

[local]:5432 postgres@postgres=# select t.oid basetype_oid,ns.nspname,ns.oid,typname,typbasetype from pg_type t,pg_namespace ns where t.typnamespace=ns.oid and  (t.typname='date' or t.oid=1114) ;
 basetype_oid |  nspname   |  oid  |  typname  | typbasetype
--------------+------------+-------+-----------+-------------
         1082 | pg_catalog |    11 | date      |           0
         1114 | pg_catalog |    11 | timestamp |           0
        26796 | oracle     | 26732 | date      |        1114
(3 rows)

Note:
可见oracle.date实际是pg_catalog中的timestamp类型，所以可以带时分秒。

Summary:
总结一下这个案例，2个库有1个库无法入库，是因为有1个库没有安装orafce，如果都没安装，这个应用再没有唯一主键，如果不统计业务数据，日志中没有任何报错，这个坑可能会藏的比较深可能都无察觉，后期发现时数据已丢失。而根本原因是应用入库前有个SQL要判断数据是否存在，使用的to_date函数精确到时分秒，而pg中的to_date并不像oracle的to_date带时分秒，pg to_date调用时也不报错，而只是精确到了天，导致数据以为存在。 orafce的to_date需要配置search_path, 返回的oracle.date数据类型实际是pg_catalog的timestamp数据类型。

打赏

对不起，这篇文章暂时关闭评论。

上一篇：如何恢复Truncate sys.IDL_UB1$？

下一篇： Troubleshooting ORA-20 on ASM instance after ‘ntpdate’ change OS date

ANBOB™

Oracle迁移到PostgreSQL注意事项: to_date

对不起，这篇文章暂时关闭评论。

最新文章

标签云集

文章索引

MySql Link

ORACLE Link

Others Link

国内好友

管理功能

微信公众号/Wechat