对于每个数组元素,计算包含它的数组列的行数。

huangapple go评论56阅读模式
英文:

For each array element, count rows whose array column contains it

问题

在表格 users 中,有一个名为 user_ids 的列,其数据类型为 varchar[],并且有一个 GIN 索引。每个行中 user_ids 列中的 user_id 都是唯一的。

我想要计算包含输入列表中每个 user_id 的行数。

我尝试使用以下查询,但是 EXPLAIN 显示它没有使用 GIN 索引:

SELECT user_id, count(*) + 1 COUNT
FROM (
	SELECT unnest(user_ids) user_id
	FROM users 
	WHERE user_ids && ARRAY['1', '2', '3']::varchar[]
) sub
WHERE user_id = ANY(ARRAY['1', '2', '3']::varchar[])
GROUP BY user_id;

有没有办法在这里利用 GIN 索引?

英文:

In table users, there is a column user_ids varchar[] with GIN index. Each user_id of column user_ids in each row is unique.

I want to count the number of rows that contain user_id for each user_id in an input list.

I try using this query but EXPLAIN shows it doesn't use GIN index:

SELECT user_id, count(*) + 1 COUNT
FROM (
	SELECT unnest(user_ids) user_id
	FROM users 
	WHERE user_ids && ARRAY['1', '2', '3']::varchar[]
) sub
WHERE user_id = ANY(ARRAY['1', '2', '3']::varchar[])
GROUP BY user_id;

Is there any way to make use of GIN index here?

答案1

得分: 1

以下是翻译好的部分:

对于小型输入数组和行之间的有限重叠,这个查询应该尽可能快速:

SELECT i.user_id, u.*
FROM   unnest('{1, 2, 3}'::varchar[]) i(user_id)  -- your input here
CROSS  JOIN LATERAL (
   SELECT count(*) AS count
   FROM   users u
   WHERE  u.user_ids @> ARRAY[i.user_id]  -- !
   ) sub;

显著的差异:我的查询包括具有0匹配的输入元素,您的原始查询没有。

array操作符@>期望左右操作数都是数组类型(即使在这种情况下右侧只有一个元素)。

数组操作符可以在(user_ids)上使用GIN索引。您原始查询中的子查询也符合索引的使用条件!

请参阅:

为什么我的查询计划没有使用GIN索引?

也许Postgres决定选择不同的查询计划,因为过滤条件不够具有选择性,或者列统计信息已过时?

为了确定索引是否适用,使用以下本地设置运行测试(仅在您的会话中!):

SET enable_seqscan = off;

EXPLAIN ....

这会在可能的情况下强制执行无顺序扫描的计划。

英文:

For small input arrays and limited overlap between rows, this query should be as fast as it gets:

SELECT i.user_id, u.*
FROM   unnest('{1, 2, 3}'::varchar[]) i(user_id)  -- your input here
CROSS  JOIN LATERAL (
   SELECT count(*) AS count
   FROM   users u
   WHERE  u.user_ids @> ARRAY[i.user_id]  -- !
   ) sub;

Notable difference: my query includes elements of the input with 0 matches, your original does not.

The array operator @> expects array types as left and right operand (even if it's just a single element to the right like in this case).

Array operators can use a GIN index on (user_ids). The subquery in your original query also qualifies for index usage!

See:

Why does my query plan not use the GIN index?

Maybe Postgres decides for a different query plan because the filter is not selective enough, or column statistics are outdated?

To determine whether the index is applicable at all, run a test with this local setting (only in your session!):

SET enable_seqscan = off;

EXPLAIN ....

This forces a plan without sequential scan if at all possible.

huangapple
  • 本文由 发表于 2023年5月11日 19:11:05
  • 转载请务必保留本文链接:https://go.coder-hub.com/76226976.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定