英文:
For each array element, count rows whose array column contains it
问题
在表格 users
中,有一个名为 user_ids
的列,其数据类型为 varchar[]
,并且有一个 GIN 索引。每个行中 user_ids
列中的 user_id
都是唯一的。
我想要计算包含输入列表中每个 user_id
的行数。
我尝试使用以下查询,但是 EXPLAIN
显示它没有使用 GIN 索引:
SELECT user_id, count(*) + 1 COUNT
FROM (
SELECT unnest(user_ids) user_id
FROM users
WHERE user_ids && ARRAY['1', '2', '3']::varchar[]
) sub
WHERE user_id = ANY(ARRAY['1', '2', '3']::varchar[])
GROUP BY user_id;
有没有办法在这里利用 GIN 索引?
英文:
In table users
, there is a column user_ids varchar[]
with GIN index. Each user_id
of column user_ids
in each row is unique.
I want to count the number of rows that contain user_id
for each user_id
in an input list.
I try using this query but EXPLAIN
shows it doesn't use GIN index:
SELECT user_id, count(*) + 1 COUNT
FROM (
SELECT unnest(user_ids) user_id
FROM users
WHERE user_ids && ARRAY['1', '2', '3']::varchar[]
) sub
WHERE user_id = ANY(ARRAY['1', '2', '3']::varchar[])
GROUP BY user_id;
Is there any way to make use of GIN index here?
答案1
得分: 1
以下是翻译好的部分:
对于小型输入数组和行之间的有限重叠,这个查询应该尽可能快速:
SELECT i.user_id, u.*
FROM unnest('{1, 2, 3}'::varchar[]) i(user_id) -- your input here
CROSS JOIN LATERAL (
SELECT count(*) AS count
FROM users u
WHERE u.user_ids @> ARRAY[i.user_id] -- !
) sub;
显著的差异:我的查询包括具有0匹配的输入元素,您的原始查询没有。
array操作符@>
期望左右操作数都是数组类型(即使在这种情况下右侧只有一个元素)。
数组操作符可以在(user_ids)
上使用GIN索引。您原始查询中的子查询也符合索引的使用条件!
请参阅:
- https://stackoverflow.com/questions/11231544/check-if-value-exists-in-postgres-array/11231965#11231965
- https://stackoverflow.com/questions/4058731/can-postgresql-index-array-columns/29245753#29245753
为什么我的查询计划没有使用GIN索引?
也许Postgres决定选择不同的查询计划,因为过滤条件不够具有选择性,或者列统计信息已过时?
为了确定索引是否适用,使用以下本地设置运行测试(仅在您的会话中!):
SET enable_seqscan = off;
EXPLAIN ....
这会在可能的情况下强制执行无顺序扫描的计划。
英文:
For small input arrays and limited overlap between rows, this query should be as fast as it gets:
SELECT i.user_id, u.*
FROM unnest('{1, 2, 3}'::varchar[]) i(user_id) -- your input here
CROSS JOIN LATERAL (
SELECT count(*) AS count
FROM users u
WHERE u.user_ids @> ARRAY[i.user_id] -- !
) sub;
Notable difference: my query includes elements of the input with 0 matches, your original does not.
The array operator @>
expects array types as left and right operand (even if it's just a single element to the right like in this case).
Array operators can use a GIN index on (user_ids)
. The subquery in your original query also qualifies for index usage!
See:
- https://stackoverflow.com/questions/11231544/check-if-value-exists-in-postgres-array/11231965#11231965
- https://stackoverflow.com/questions/4058731/can-postgresql-index-array-columns/29245753#29245753
Why does my query plan not use the GIN index?
Maybe Postgres decides for a different query plan because the filter is not selective enough, or column statistics are outdated?
To determine whether the index is applicable at all, run a test with this local setting (only in your session!):
SET enable_seqscan = off;
EXPLAIN ....
This forces a plan without sequential scan if at all possible.
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论