2023年2月24日 03:47:00go评论140阅读模式

英文:

Calculus on Postgres jsonb array of objects

问题

我有一个包含 jsonb 类型列的Postgres表，看起来像这样：

id   | array_obj
---    ---
id_0 | [ {"Key": "k1", "Value": v1 }, {"Key": "k2", "Value": v2 }, ... ]

我的目标是计算类似于 "与键k1关联的值+与键k3关联的值" 这样的内容。我设法通过展开数组 jsonb_to_recordset 然后使用分组来总结所需的内容，但我觉得这不是很令人满意，因为我认为不应该需要展开/汇总模式（毕竟，我只想 "按行" 计算一些内容）。当前查询如下所示：

SELECT
  id,
  sum(val) filter (where k in ('k1', 'k3')) as result,
  sum(case when k in ('k1', 'k2') then 1 else 0 end) as countkeys
FROM 
  my_table, jsonb_to_recordset(array_obj) as expanded(k text, val int)
GROUP BY
  id

（countkeys 用于处理存在'k1'但不存在'k2'的情况，因此总和不是预期值，因此我检查countkeys是否等于我对键进行求和的数量）。

执行计划如下（带有LIMIT 50）：

Limit  (cost=0.56..256.99 rows=50 width=104) (actual time=0.217..4.965 rows=50 loops=1)
  Buffers: shared hit=166
  ->  GroupAggregate  (cost=0.56..17512338.31 rows=3414645 width=104) (actual time=0.216..4.959 rows=50 loops=1)
        Group Key: my_table.id
        Buffers: shared hit=166
        ->  Nested Loop  (cost=0.56..11485489.89 rows=341464500 width=72) (actual time=0.104..3.396 rows=5301 loops=1)
              Buffers: shared hit=166
              ->  Index Scan using xxx on my_table (cost=0.56..4656199.88 rows=3414645 width=498) (actual time=0.011..0.159 rows=51 loops=1)
                    Filter: (col = 0)
                    Rows Removed by Filter: 64
                    Buffers: shared hit=119
              ->  Function Scan on jsonb_to_recordset expanded  (cost=0.00..1.00 rows=100 width=40) (actual time=0.042..0.047 rows=104 loops=51)
                    Buffers: shared hit=47
Planning:
  Buffers: shared hit=198
Planning Time: 0.552 ms
Execution Time: 5.099 ms

所以基本上我只是尝试构建一种 "运算符"，我可以使用它来 "查找与键k相关联的值"，使用我拥有的数据类型。我已经尝试过文档中列出的运算符，但尚未找到解决方法...

请注意，这只是第一步，我的最终目标是计算多个聚合值（比如 result_1 = value(key_1) + value(key_3)，result_2 = value(key_7) + value(key_124)等），所以我正在寻找一个不独立计算结果的解决方案。

响应 @SelVazi 的评论，这是一个典型的行：

id | [{"Key":"af_m1","Value":9772},{"Key":"af_m2","Value":7413},{"Key":"af_m3","Value":2359}]

假设我想计算 af_m1 + af_m3 = 9772 + 2359 = 12131。请注意，某些行可能没有具有键 af_m1 或 af_m3 的对象，在这种情况下，我们应该跳过该行（但我认为我可以在了解如何计算我想要的内容后再处理这个问题）。

英文:

I have a Postgres table that has a column of type jsonb and that looks like:

id   | array_obj
---    ---
id_0 | [ {&quot;Key&quot;: &quot;k1&quot;, &quot;Value&quot;: v1 }, {&quot;Key&quot;: &quot;k2&quot;, &quot;Value&quot;: v2 }, ... ]

My goal is to compute something like "value associated to key k1 + value associated to key k3" for instance. I managed to do this by expanding the array jsonb_to_recordset and then using a group-by to sum what I needed but this is not very satisfying as I feel like there should be no need for this expansion / aggregation pattern (after all I just want to compute something "row-wise"). Currently the query looks like:

SELECT
  id,
  sum(val) filter (where k in (&#39;k1&#39;, &#39;k3&#39;)) as result,
  sum(case when k in (&#39;k1&#39;, &#39;k2&#39;) then 1 else 0 end) as countkeys
FROM 
  my_table, jsonb_to_recordset(array_obj) as expanded(k text, val int)
GROUP BY
  id

(countkeys is here to address cases when there is 'k1' but no 'k2', hence the sum is not really what's expected, so I then chekc that countkeys equals the number of keys I sum on).

The execution plan is the following (with LIMIT 50):

Limit  (cost=0.56..256.99 rows=50 width=104) (actual time=0.217..4.965 rows=50 loops=1)
  Buffers: shared hit=166
  -&gt;  GroupAggregate  (cost=0.56..17512338.31 rows=3414645 width=104) (actual time=0.216..4.959 rows=50 loops=1)
        Group Key: my_table.id
        Buffers: shared hit=166
        -&gt;  Nested Loop  (cost=0.56..11485489.89 rows=341464500 width=72) (actual time=0.104..3.396 rows=5301 loops=1)
              Buffers: shared hit=166
              -&gt;  Index Scan using xxx on my_table (cost=0.56..4656199.88 rows=3414645 width=498) (actual time=0.011..0.159 rows=51 loops=1)
                    Filter: (col = 0)
                    Rows Removed by Filter: 64
                    Buffers: shared hit=119
              -&gt;  Function Scan on jsonb_to_recordset expanded  (cost=0.00..1.00 rows=100 width=40) (actual time=0.042..0.047 rows=104 loops=51)
                    Buffers: shared hit=47
Planning:
  Buffers: shared hit=198
Planning Time: 0.552 ms
Execution Time: 5.099 ms

So basically I'm just trying to build some kind of "operator" that I could use for "find the value associated to key k" with the kind of data I have. I've tried with the operators listed in the documentation but haven't figured it out yet...

Note that this is a first step, my end goal is to compute several aggregates (say result_1 = value(key_1) + value(key_3), result_2 = value(key_7) + value(key_124), etc.) so I'm looking for a solution that does not compute results independantly.

In response to the comment of @SelVazi, here is a typical line:
> id | [{"Key":"af_m1","Value":9772},{"Key":"af_m2","Value":7413},{"Key":"af_m3","Value":2359}]

And say I'd like to compute af_m1 + af_m3 = 9772 + 2359 = 12131. Note that some line may not have any object with key af_m1 or af_m3 in that case we should skip the line (but that's OK I think I can handle it in a second time when I know how to compute what I want first).

答案1

得分: 1

I can save some time, about 30%, by doing the summation in a subquery. That way it doesn't need to be broken apart, mingled, and then resegregated with a GROUP BY. The original query uses parallel query which my variant cannot do, so the original is even using more CPU while it is underperforming.

explain (analyze, buffers)
SELECT
id,
(select sum("Value") filter (where "Key" in ('k1', 'k3')) as result from
jsonb_to_recordset(array_obj) as expanded("Key" text, "Value" double precision)
) as result
from my_table;

It would be much faster yet though to just store the data in a sensible format, either up-front or in a materialized view or just a transformed copy.

英文:

explain (analyze, buffers)
SELECT
  id, 
  (select sum(&quot;Value&quot;) filter (where &quot;Key&quot; in (&#39;k1&#39;, &#39;k3&#39;)) as result from 
     jsonb_to_recordset(array_obj) as expanded(&quot;Key&quot; text, &quot;Value&quot; double precision)
  ) as result
from my_table;

It would be much faster yet though to just store the data in a sensible format, either up-front or in a materialized view or just a transformed copy.

通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库，让每个人都能够通过互相帮助和分享经验来进步。

Postgres jsonb数组对象上的微积分

问题

答案1

How to build a sql query to get 100 rows belonging to weekday(Mon-Fri) from a table using epoch time column

如何在Talend中查询tdbrow后捕获输出并显示？

在用户的时区中显示 timestamptz。

Send messages to users via FCM once per day based on their shifting time-of-day preferences without dupes, using Redis

What's the correct way to type hint an empty list as a literal in python?

如何在Highcharts Gantt中更改本地化的星期名称

如何在同一个流中使用多个过滤器和映射函数？

如何使用Map/Set来将代码优化到O(n)？

.NET MAUI Android在GitHub Actions上构建失败，错误代码为1。

如何在Playwright视觉比较中屏蔽多个定位器？

在C++中，可以使用可变模板参数来检索类型的内部类型。

selenium.common.exceptions.StaleElementReferenceException: Message: stale element reference: stale element not found

Creating and opening a URL to log in to Website via Basic Auth with Robot Framework/Selenium (Python)

AG Grid 在上下文菜单中以大文本形式打开

发表评论