2023年5月11日 19:11:05go评论97阅读模式

英文:

For each array element, count rows whose array column contains it

问题

在表格 users 中，有一个名为 user_ids 的列，其数据类型为 varchar[]，并且有一个 GIN 索引。每个行中 user_ids 列中的 user_id 都是唯一的。

我想要计算包含输入列表中每个 user_id 的行数。

我尝试使用以下查询，但是 EXPLAIN 显示它没有使用 GIN 索引：

SELECT user_id, count(*) + 1 COUNT
FROM (
	SELECT unnest(user_ids) user_id
	FROM users 
	WHERE user_ids && ARRAY['1', '2', '3']::varchar[]
) sub
WHERE user_id = ANY(ARRAY['1', '2', '3']::varchar[])
GROUP BY user_id;

有没有办法在这里利用 GIN 索引？

英文:

In table users, there is a column user_ids varchar[] with GIN index. Each user_id of column user_ids in each row is unique.

I want to count the number of rows that contain user_id for each user_id in an input list.

I try using this query but EXPLAIN shows it doesn't use GIN index:

SELECT user_id, count(*) + 1 COUNT
FROM (
	SELECT unnest(user_ids) user_id
	FROM users 
	WHERE user_ids &amp;&amp; ARRAY[&#39;1&#39;, &#39;2&#39;, &#39;3&#39;]::varchar[]
) sub
WHERE user_id = ANY(ARRAY[&#39;1&#39;, &#39;2&#39;, &#39;3&#39;]::varchar[])
GROUP BY user_id;

Is there any way to make use of GIN index here?

答案1

得分: 1

以下是翻译好的部分：

对于小型输入数组和行之间的有限重叠，这个查询应该尽可能快速：

SELECT i.user_id, u.*
FROM   unnest('{1, 2, 3}'::varchar[]) i(user_id)  -- your input here
CROSS  JOIN LATERAL (
   SELECT count(*) AS count
   FROM   users u
   WHERE  u.user_ids @> ARRAY[i.user_id]  -- !
   ) sub;

显著的差异：我的查询包括具有0匹配的输入元素，您的原始查询没有。

array操作符@>期望左右操作数都是数组类型（即使在这种情况下右侧只有一个元素）。

数组操作符可以在(user_ids)上使用GIN索引。您原始查询中的子查询也符合索引的使用条件！

请参阅：

为什么我的查询计划没有使用GIN索引？

也许Postgres决定选择不同的查询计划，因为过滤条件不够具有选择性，或者列统计信息已过时？

为了确定索引是否适用，使用以下本地设置运行测试（仅在您的会话中！）：

SET enable_seqscan = off;
EXPLAIN ....

这会在可能的情况下强制执行无顺序扫描的计划。

英文:

For small input arrays and limited overlap between rows, this query should be as fast as it gets:

SELECT i.user_id, u.*
FROM   unnest(&#39;{1, 2, 3}&#39;::varchar[]) i(user_id)  -- your input here
CROSS  JOIN LATERAL (
   SELECT count(*) AS count
   FROM   users u
   WHERE  u.user_ids @&gt; ARRAY[i.user_id]  -- !
   ) sub;

Notable difference: my query includes elements of the input with 0 matches, your original does not.

The array operator @> expects array types as left and right operand (even if it's just a single element to the right like in this case).

Array operators can use a GIN index on (user_ids). The subquery in your original query also qualifies for index usage!

See:

Why does my query plan not use the GIN index?

Maybe Postgres decides for a different query plan because the filter is not selective enough, or column statistics are outdated?

To determine whether the index is applicable at all, run a test with this local setting (only in your session!):

SET enable_seqscan = off;
EXPLAIN ....

This forces a plan without sequential scan if at all possible.

通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库，让每个人都能够通过互相帮助和分享经验来进步。

对于每个数组元素，计算包含它的数组列的行数。

问题

答案1

为什么我的查询计划没有使用GIN索引？

Why does my query plan not use the GIN index?

填补空白并分配最近可用的值到MySQL。

从不同的表中选择包含单词列表的记录。

扩展日期并插入到另一个表格中。

Python API makes a db call SQL Query Efficiency between = vs IN operator for multiple values

如何在Playwright视觉比较中屏蔽多个定位器？

在C++中，可以使用可变模板参数来检索类型的内部类型。

selenium.common.exceptions.StaleElementReferenceException: Message: stale element reference: stale element not found

Creating and opening a URL to log in to Website via Basic Auth with Robot Framework/Selenium (Python)

AG Grid 在上下文菜单中以大文本形式打开

What's the correct way to type hint an empty list as a literal in python?

如何在Highcharts Gantt中更改本地化的星期名称

如何在同一个流中使用多个过滤器和映射函数？

如何使用Map/Set来将代码优化到O(n)？

.NET MAUI Android在GitHub Actions上构建失败，错误代码为1。