如何在Postgres中为涉及TEXT列和DATE列的表达式创建索引,这些列已连接?

huangapple go评论57阅读模式
英文:

How to create an index in Postgres on an expression involving TEXT columns and a DATE column, concatenated?

问题

我想能够搜索与我在数据库中拥有的内容“相似”的人(例如,在他们的姓名或出生日期中有错别字的人)。我想要以以下方式使用levenshtein函数:

create temp table person (first_name text, last_name text, dob date);
select * from person where levenshtein('johndoe2000-01-01', lower(first_name || last_name || dob)) < 4;

(我知道这可能可以通过使用YYYYMMDD而不是YYYY-MM-DD来进行优化,但这与我的问题无关。)

我认为可以通过索引加速搜索。然而,我不知道如何创建一个能够工作的索引:

create index fuzzy on person ((lower(first_name || last_name || dob)));

结果显示ERROR: functions in index expression must be marked IMMUTABLE。我尝试了一些方法:使用concat代替||,显式或使用to_chardob强制转换为text,但都无济于事。我做错了什么?问题似乎源于混合使用textdate - 我可以创建只包含text列或只包含date列的索引而没有问题。

英文:

I want to be able to search for people "similar" to what I have in the database (e.g. a person with a typo in their name or date of birth). I want to use the levenshtein function for that in the following way:

create temp table person (first_name text, last_name text, dob date);
select * from person where levenshtein(&#39;johndoe2000-01-01&#39;, lower(first_name || last_name || dob)) &lt; 4;

(I am aware this can be probably optimized a bit by e.g. using YYYYMMDD instead of YYYY-MM-DD, but that is irrelevant to my question.)

I thought that I could speed up the search with an index. However, I don't know how to create one that would work:

create index fuzzy on person ((lower(first_name || last_name || dob)));

results in ERROR: functions in index expression must be marked IMMUTABLE. I tried a few things: using concat instead of ||, casting dob to text explicitly or with to_char, but to no avail. What am I doing wrong? It seems that the problem results from mixing text and date – I can create an index with only text columns or only the date column without a problem.

答案1

得分: 1

这个练习没有意义:你永远无法使用索引来加速使用 levenshtein() 的查询。

英文:

The exercise is pointless: you can never use an index to speed up a query that uses levenshtein().

答案2

得分: 0

将日期转换为字符串取决于可能会更改的数据库设置,因此它不是不可变的。

select '2021-04-03'::date::text;
2021-04-03
set datestyle TO 'german';
select '2021-04-03'::date::text;
03.04.2021

您可以以多种方式解决这个问题。但一旦解决,它仍然无法作为Levenshtein距离通常无法建立索引。

英文:

Casting dates to strings depends on database setting that might change, and so it is not immutable.

select &#39;2021-04-03&#39;::date::text;
 2021-04-03
set datestyle TO &#39;german&#39;;
select &#39;2021-04-03&#39;::date::text;
 03.04.2021

You could solve that problem in a variety of ways. But once you do, it still wouldn't work as levenshtein distance is generally not indexable.

huangapple
  • 本文由 发表于 2023年6月27日 17:54:56
  • 转载请务必保留本文链接:https://go.coder-hub.com/76563673.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定