PostgreSQL不区分大小写的GIN索引?

huangapple go评论52阅读模式
英文:

PostgreSQL case-insensitive GIN indexing?

问题

当我们有一个单个字符串列时,我们可以使用 upper() 创建一个基于函数的索引 create index on tab(upper(col)) 来建立索引,当查询是 select * from tab where upper(col) = ? 时,它将使用该索引。好处是数据以原始混合大小写的方式检索,但查找是不区分大小写的。

当列 col 的类型是 text[] 时,我们可以使用 GIN 创建一个反向索引 create index tab(col) using GIN(col),然后 select * from tab where col && ? 将查看参数数组与列数组是否有重叠。

但如果我也希望这是不区分大小写的呢?当然,我可以将列规范化为全部大写,但那样我就失去了原始数据。或者如果我将列复制两次,一个是规范化的,一个是原始的,那也可以工作。

我有一个想法,怎么样:

mydb=> create table tab(col text[]);
CREATE TABLE
mydb=> insert into tab values ('{Ay,bee,Cee}'::text[]);
INSERT 0 1
mydb=> select * from tab;
     col
--------------
 {Ay,bee,Cee}
(1 row)

mydb => select *, (select array_agg(upper(n)) from unnest(col) as n) from tab;
     col      |  array_agg
--------------+--------------
 {Ay,bee,Cee} | {AY,BEE,CEE}
(1 row)

如果这能够正常工作,我可能可以在 norm 列上创建 GIN 索引。但遗憾的是,它没有。有其他的解决方案吗?

英文:

When we have a single string column, we can build a function-based index using upper() create index on tab(upper(col)) to build the index and when the query is select * from tab where upper(col) = ? it will use that index. The benefit is that the data stays in original mixed case as it is retrieved but the lookup is done case insensitive.

When col is of type text[] we can have an inverted index with GIN create index tab(col) using GIN(col) and then select * from tab where col && ? will see if there is an overlap of the parameter array with the col array.

But what if I want that to be case insensitive too? I could of course normalize the col into all upper case, but then I lose the original data. Or if I have the column twice, as normalized and as original, it can work.

I have an idea, how about:

mydb=> create table tab(col text[]);
CREATE TABLE
mydb=> insert into tab values ('{Ay,bee,Cee}'::text[]);
INSERT 0 1
mydb=> select * from tab;
     col
--------------
 {Ay,bee,Cee}
(1 row)

mydb => select *, (select array_agg(upper(n)) from unnest(col) as n) from tab;
     col      |  array_agg
--------------+--------------
 {Ay,bee,Cee} | {AY,BEE,CEE}
(1 row)

mydb=> alter table tab add column norm text[] generated always as (select array_agg(upper(n)) from unnest(col) as n);
ERROR:  syntax error at or near "select"
LINE 1: ...e tab add column norm text[] generated always as (select arr...
                                                             ^

If that had worked, I could probably have created the GIN index on the norm column. But alas, it didn't. Any other solution?

答案1

得分: 3

需要将子查询包装成一个辅助函数:

create or replace function upper_array(text[]) returns text[] language sql as $$
   select array_agg(upper(unnest)) from unnest($1)
$$ immutable parallel safe;

然后,而不是添加占用额外空间的生成列,我可能会直接在函数上定义索引:

create index on tab using gin (upper_array(col));
英文:

You need to wrap your subquery into a helper function:

create or replace function upper_array(text[]) returns text[] language sql as $$
   select array_agg(upper(unnest)) from unnest($1)
$$ immutable parallel safe;

Then rather than add a generated column which takes extra space, I'd probably just define the index over the function directly:

create index on tab using gin (upper_array(col));

huangapple
  • 本文由 发表于 2023年5月30日 07:51:10
  • 转载请务必保留本文链接:https://go.coder-hub.com/76360870.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定