当创建PostgreSQL数据库时,可以根据业务需要配置不同数据库的collation , collation影响包括字符串排序顺序、字符分类方法、数值格式、日期和时间格式以及货币格式。 此外,您可能还需要配置 LC_COLLATE 和 LC_CTYPE 环境变量。 在创建库、创建表、创建索引、查询时也可以指定该collation,不同的collation可能会产生不同的排序,甚至导致无法使用索引。
postgresql支持的字符集
https://www.postgresql.org/docs/15/multibyte.html Server列为YES支持服务端,否则仅支持客户端。
当前字符库支持的LC_COLLATE, LC_CTYPE
LC_COLLATE, LC_CTYPE当不指定时默认的,但要与当煎数据库的字符集兼容, 您可以执行以下 SQL 语句从 pg_collation 系统表中查询字符集支持的 LC_COLLATE 和 LC_CTYPE 设置:
[local]:5432 postgres@db1=# select distinct pg_encoding_to_char(collencoding) from pg_collation;
pg_encoding_to_char
---------------------
EUC_CN
LATIN7
ISO_8859_7
WIN1251
ISO_8859_6
LATIN3
LATIN6
LATIN5
ISO_8859_8
LATIN9
LATIN1
LATIN2
EUC_TW
UTF8
EUC_KR
KOI8U
KOI8R
ISO_8859_5
WIN1255
LATIN8
EUC_JP
(22 rows)
[local]:5432 postgres@db1=# select pg_encoding_to_char(collencoding) as encoding,collname,collcollate,collctype, collencoding from pg_collation where pg_encoding_to_char(collencoding) like '%EUC_CN%';
encoding | collname | collcollate | collctype | collencoding
----------+--------------+--------------+--------------+--------------
EUC_CN | zh_CN | zh_CN | zh_CN | 2
EUC_CN | zh_CN.gb2312 | zh_CN.gb2312 | zh_CN.gb2312 | 2
EUC_CN | zh_SG | zh_SG | zh_SG | 2
EUC_CN | zh_SG.gb2312 | zh_SG.gb2312 | zh_SG.gb2312 | 2
(4 rows)
[local]:5432 postgres@db1=# select pg_encoding_to_char(collencoding) as encoding,collname,collcollate,collctype from pg_collation ;
encoding | collname | collcollate | collctype
------------+-----------------------+-----------------------+-----------------------
| default | |
| C | C | C
| POSIX | POSIX | POSIX
UTF8 | ucs_basic | C | C
LATIN1 | aa_DJ | aa_DJ | aa_DJ
LATIN1 | aa_DJ.iso88591 | aa_DJ.iso88591 | aa_DJ.iso88591
UTF8 | aa_DJ.utf8 | aa_DJ.utf8 | aa_DJ.utf8
UTF8 | aa_ER | aa_ER | aa_ER
UTF8 | aa_ER.utf8 | aa_ER.utf8 | aa_ER.utf8
...
Note:
ENCODING是null 表示所有字符集都支持.
创建数据库时指定COLLATE
CREATE DATABASE hrdb WITH ENCODING 'UTF8' LC_COLLATE='C.UTF-8' LC_CTYPE='C.UTF-8' TEMPLATE = template0 CONNECTION LIMIT = 200 --IS_TEMPLATE = TRUE;
note:
一但创建不支持alter database修改。建议 LC_COLLATE使用“C”,性能上有些提升.
创建表时指定COLLATE
CREATE TABLE test1 (
a text COLLATE "de_DE",
b text COLLATE "es_ES",
...
);
note:
后期可以修改列 collate,如 alter table a alter c1 type text COLLATE “zh_CN”;
创建索引时指定COLLATE
CREATE INDEX testc_name_idx3 ON testc(name COLLATE "C")
Note:
可以在同一列创建不同collate索引。
查询时指定collate
SELECT * FROM t WHERE name >= 'z' COLLATE "C"; create index idxa on a(c1 collate "zh_CN"); explain select * from a order by c1 collate "zh_CN";
Note:
如果查询使用的collate与索引collate不同,可能无法使用索引.
当前locale
[postgres@oel7db1 ~]$ env|grep LAN LANG=en_US.UTF-8 [postgres@oel7db1 ~]$ locale LANG=en_US.UTF-8 LC_CTYPE="en_US.UTF-8" LC_NUMERIC="en_US.UTF-8" LC_TIME="en_US.UTF-8" LC_COLLATE="en_US.UTF-8" LC_MONETARY="en_US.UTF-8" LC_MESSAGES="en_US.UTF-8" LC_PAPER="en_US.UTF-8" LC_NAME="en_US.UTF-8" LC_ADDRESS="en_US.UTF-8" LC_TELEPHONE="en_US.UTF-8" LC_MEASUREMENT="en_US.UTF-8" LC_IDENTIFICATION="en_US.UTF-8" LC_ALL=
查看数据库的locale
[local]:5432 postgres@db1=# \l
List of databases
Name | Owner | Encoding | Collate | Ctype | Access privileges
-----------+----------+-----------+------------+------------+-----------------------
anbob | postgres | SQL_ASCII | C | C |
db | postgres | SQL_ASCII | C | C |
db1 | postgres | UTF8 | en_US.UTF8 | en_US.UTF8 |
postgres | postgres | SQL_ASCII | C | C |
template0 | postgres | SQL_ASCII | C | C | =c/postgres +
| | | | | postgres=CTc/postgres
template1 | postgres | SQL_ASCII | C | C | =c/postgres +
| | | | | postgres=CTc/postgres
(6 rows)
collate对排序的影响
[local]:5432 postgres@db1=# select * from (values ('Alice'),('Tom'),('anbob')) as a(c1) order by c1 ;
c1
-------
Alice
anbob
Tom
(3 rows)
[local]:5432 postgres@db1=# select * from (values ('Alice'),('Tom'),('anbob')) as a(c1) order by c1 collate "C";
c1
-------
Alice
Tom
anbob
(3 rows)
[local]:5432 postgres@db1=# select * from (values ('Alice'),('Tom'),('anbob')) as a(c1) order by c1 collate "zh_CN";
c1
-------
Alice
anbob
Tom
(3 rows)
collate对索引的影响
[local]:5432 postgres@postgres=# CREATE DATABASE "example_db"
postgres-# WITH OWNER "postgres"
postgres-# ENCODING 'UTF8'
postgres-# LC_COLLATE = 'en_US.UTF8'
postgres-# LC_CTYPE = 'en_US.UTF8'
postgres-# TEMPLATE template0;
CREATE DATABASE
create table testc(id int,name varchar(20),addr varchar(3000),otype char(1));
insert into testc select x,x||'anbob',rpad('x',2000,'x'),'t' from generate_series(1,10000) as x;
[local]:5432 postgres@example_db=# create index on testc(name);
CREATE INDEX
[local]:5432 postgres@example_db=# \d testc
Table "public.testc"
Column | Type | Collation | Nullable | Default
--------+-------------------------+-----------+----------+---------
id | integer | | |
name | character varying(20) | | |
addr | character varying(3000) | | |
otype | character(1) | | |
Indexes:
"testc_name_idx" btree (name)
[local]:5432 postgres@example_db=# analyze testc;
ANALYZE
[local]:5432 postgres@example_db=# explain select * from testc where name like '1a%';
QUERY PLAN
--------------------------------------------------------
Seq Scan on testc (cost=0.00..229.00 rows=1 width=50)
Filter: ((name)::text ~~ '1a%'::text)
(2 rows)
NOTE:
使用全表扫,即使选择率很好,可以创建xx_pattern_ops索引.
创建xx_pattern_ops索引
[local]:5432 postgres@example_db=# CREATE INDEX testc_name_idx2 ON testc(name varchar_pattern_ops);
CREATE INDEX
[local]:5432 postgres@example_db=# \l testc
List of databases
Name | Owner | Encoding | Collate | Ctype | Access privileges
------+-------+----------+---------+-------+-------------------
(0 rows)
[local]:5432 postgres@example_db=# \d testc
Table "public.testc"
Column | Type | Collation | Nullable | Default
--------+-------------------------+-----------+----------+---------
id | integer | | |
name | character varying(20) | | |
addr | character varying(3000) | | |
otype | character(1) | | |
Indexes:
"testc_name_idx" btree (name)
"testc_name_idx2" btree (name varchar_pattern_ops)
[local]:5432 postgres@example_db=# explain select * from testc where name like '1a%';
QUERY PLAN
----------------------------------------------------------------------------------
Index Scan using testc_name_idx2 on testc (cost=0.29..8.31 rows=1 width=50)
Index Cond: (((name)::text ~>=~ '1a'::text) AND ((name)::text ~<~ '1b'::text))
Filter: ((name)::text ~~ '1a%'::text)
(3 rows)
也可以创建COLLATE “C”索引
drop index testc_name_idx2;
CREATE INDEX testc_name_idx3 ON testc(name COLLATE "C")
[local]:5432 postgres@example_db=# explain select * from testc where name like '1a%';
QUERY PLAN
------------------------------------------------------------------------------
Index Scan using testc_name_idx3 on testc (cost=0.29..8.31 rows=1 width=50)
Index Cond: (((name)::text >= '1a'::text) AND ((name)::text < '1b'::text))
Filter: ((name)::text ~~ '1a%'::text)
(3 rows)
xx_pattern_ops and COLLATE “C”不同?
CREATE INDEX ON t(name text/varchar_pattern_ops) 和CREATE INDEX ON t(name COLLATE “C”)都可以使用索引,请注意,如果您希望涉及普通 <、<=、> 或 >= 比较的查询使用索引,您还应该使用默认运算符类创建索引。 此类查询不能使用 xxx_pattern_ops 运算符类。 (但是,普通的相等比较可以使用这些运算符类。)可以使用不同的运算符类在同一列上创建多个索引。 如果您确实使用 C 语言环境,则不需要 xxx_pattern_ops 运算符类,因为具有默认运算符类的索引可用于 C 语言环境中的模式匹配查询。
使用Collate=C的DB
[local]:5432 postgres@example_db=# \l
List of databases
Name | Owner | Encoding | Collate | Ctype | Access privileges
------------+----------+-----------+------------+------------+-----------------------
anbob | postgres | SQL_ASCII | C | C |
db | postgres | SQL_ASCII | C | C |
example_db | postgres | UTF8 | en_US.UTF8 | en_US.UTF8 |
postgres | postgres | SQL_ASCII | C | C |
template0 | postgres | SQL_ASCII | C | C | =c/postgres +
| | | | | postgres=CTc/postgres
template1 | postgres | SQL_ASCII | C | C | =c/postgres +
| | | | | postgres=CTc/postgres
(6 rows)
[local]:5432 postgres@example_db=# \c anbob
You are now connected to database "anbob" as user "postgres".
[local]:5432 postgres@anbob=# create table testc(id int,name varchar(20),addr varchar(3000),otype char(1));
CREATE TABLE
[local]:5432 postgres@anbob=# insert into testc select x,x||'anbob',rpad('x',2000,'x'),'t' from generate_series(1,10000) as x;
INSERT 0 10000
[local]:5432 postgres@anbob=# create index on testc(name);
CREATE INDEX
[local]:5432 postgres@anbob=# \d testc
Table "public.testc"
Column | Type | Collation | Nullable | Default
--------+-------------------------+-----------+----------+---------
id | integer | | |
name | character varying(20) | | |
addr | character varying(3000) | | |
otype | character(1) | | |
Indexes:
"testc_name_idx" btree (name)
[local]:5432 postgres@anbob=# explain select * from testc where name like '1a%';
QUERY PLAN
------------------------------------------------------------------------------
Index Scan using testc_name_idx on testc (cost=0.29..9.66 rows=1 width=50)
Index Cond: (((name)::text >= '1a'::text) AND ((name)::text < '1b'::text))
Filter: ((name)::text ~~ '1a%'::text)
(3 rows)
Note:
默认不需要指定collate和创建xxx_pattern_ops索引 。
— over —