What's the best way to index a text array with large text?

Question

I have some arbitrarily-sized text in a text array, and I would like to index it for contains lookups.

It looks like the text in the array is too large for a traditional index.

This is also quite a large table - it has a few billion rows.

Using a standard GIN index gives me the error:

ERROR:  index row size 6648 exceeds maximum 2712 for index "index"

After looking it up, it seems that GIN stores its keys in an internal B-tree, so every individual array element still has to fit within the index row size limit. That is probably not the right fit for values this large.

What's a good alternative index I can use without resorting to complex type conversions? (A simple one would do.)

Answer 1

Score: 1

I would not index the strings themselves, but their hashes. For that, create a function like this:

CREATE FUNCTION hashtestarray(text[]) RETURNS integer[]
IMMUTABLE PARALLEL SAFE
BEGIN ATOMIC
   SELECT array_agg(hashtext(t)) FROM unnest($1) AS a(t);
END;

That applies the text hash function to each element of the array.
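To see what the function builds, the same per-element hashing can be run directly on a made-up sample array. This is a sketch: hashtext() is PostgreSQL's built-in hash support function for text and returns a 32-bit integer, so even a 10,000-character element collapses into a single 4-byte key.

```sql
-- Made-up sample: one short element and one 10,000-character element.
-- Each element is replaced by its hashtext() value, a plain 32-bit integer;
-- this is the same expression hashtestarray() applies to every element.
SELECT array_agg(hashtext(t)) AS hashes
FROM unnest(ARRAY['short', repeat('x', 10000)]) AS a(t);
```

The integers themselves carry no meaning. What matters is that equal strings always produce equal hashes, while different strings can occasionally collide.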

Then index like

CREATE INDEX ON tab USING gin (hashtestarray(arraycol));

and search like

SELECT ... FROM tab
WHERE hashtestarray(arraycol) @> ARRAY[hashtext('searchstring')]
  AND arraycol @> ARRAY['searchstring'];

The first condition can use the index, and the second condition removes false positives caused by hash collisions.
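Putting the pieces together, here is a self-contained sketch for a scratch database on PostgreSQL 14 or later (the BEGIN ATOMIC function body requires 14+). The table definition, the index name tab_hash_gin and the sample rows are assumptions made for illustration; the function is the one above, repeated so the script runs on its own.

```sql
-- Hypothetical table: "tab" and "arraycol" mirror the names used in the answer.
CREATE TABLE tab (
    id       bigint PRIMARY KEY,
    arraycol text[]
);

-- Same function as above: hash every element of the array.
CREATE FUNCTION hashtestarray(text[]) RETURNS integer[]
IMMUTABLE PARALLEL SAFE
BEGIN ATOMIC
   SELECT array_agg(hashtext(t)) FROM unnest($1) AS a(t);
END;

-- Each GIN key is now a 4-byte integer, so element length no longer matters.
CREATE INDEX tab_hash_gin ON tab USING gin (hashtestarray(arraycol));

-- Made-up sample rows: the second element (9,600 characters) is far larger
-- than the 2712-byte limit from the error message.
INSERT INTO tab
SELECT g, ARRAY['row ' || g, repeat(md5(g::text), 300)]
FROM generate_series(1, 1000) AS g;

ANALYZE tab;

-- Check whether the planner uses tab_hash_gin for the hash condition.
EXPLAIN
SELECT id FROM tab
WHERE hashtestarray(arraycol) @> ARRAY[hashtext('row 42')]
  AND arraycol @> ARRAY['row 42'];

-- The hash condition narrows the candidates; the exact condition
-- discards rows that only matched because of a hash collision.
SELECT id FROM tab
WHERE hashtestarray(arraycol) @> ARRAY[hashtext('row 42')]
  AND arraycol @> ARRAY['row 42'];
```

On a small scratch table the planner may still choose a sequential scan, so the EXPLAIN output can vary. The result is the same either way, because the exact arraycol @> condition is always applied.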

huangapple
  • Posted on 2023-05-13 13:26:38
  • Source: https://go.coder-hub.com/76241217.html