2023年5月13日 10:14:22go评论101阅读模式

英文:

Postgres: "cumulative" view of table

问题

以下是您要翻译的内容：

I have a table of changes like this

CREATE TABLE IF NOT EXISTS changes (
    entity_id TEXT NOT NULL,
    column_id TEXT NOT NULL,
    value JSONB NOT NULL,
    updated_at TIMESTAMP NOT NULL
);

INSERT INTO changes VALUES ('1', 'height', to_jsonb(140), '01-01-2021 00:00:00'::TIMESTAMP);
INSERT INTO changes VALUES ('1', 'weight', to_jsonb(30), '01-01-2021 00:00:00'::TIMESTAMP);
INSERT INTO changes VALUES ('1', 'height', to_jsonb(145), '01-02-2021 00:00:00'::TIMESTAMP);
INSERT INTO changes VALUES ('1', 'weight', to_jsonb(34),'01-03-2021 00:00:00'::TIMESTAMP);
 entity_id | column_id | value |     updated_at      
-----------+-----------+-------+---------------------
 1         | height    | 140   | 2021-01-01 00:00:00
 1         | weight    | 30    | 2021-01-01 00:00:00
 1         | height    | 145   | 2021-01-02 00:00:00
 1         | weight    | 34    | 2021-01-03 00:00:00

And I want to get kinda cumulative view of this table

entity_id  | height | weight |     updated_at      
-----------+--------+--------+---------------------
 1         | 140    | 30     | 2021-01-01 00:00:00
 1         | 145    | 30     | 2021-01-02 00:00:00
 1         | 145    | 34     | 2021-01-03 00:00:00

My current query looks working

SELECT
  entity_id,
  coalesce(change->'height', lag(change->'height', 1, null) over (partition by entity_id order by updated_at)) as height,
  coalesce(change->'weight', lag(change->'weight', 1, null) over (partition by entity_id order by updated_at)) as weight,
  updated_at
FROM (
    SELECT entity_id, json_object_agg(column_id, value) as change, updated_at FROM changes
    GROUP BY entity_id, updated_at
) as changes;

But I don't like json_object_agg here and I'm sure there is a way do it without redundant aggregations? some kind of using window aggregate function that I've missed.

UPD. @SelVazi helped to make query better, but I feel it's not the final solution.

with cte as (
  SELECT
    entity_id,
    max(case when column_id = 'height' then value::int end) as height,
    max(case when column_id = 'weight' then value::int end) as weight,
    updated_at
  from changes
  GROUP by entity_id, updated_at
)
select
  entity_id,
  coalesce(height, lag(height) over (partition by entity_id order by updated_at)) as height,
  coalesce(weight, lag(weight) over (partition by entity_id order by updated_at)) as weight,
  updated_at
from cte;

英文:

I have a table of changes like this

CREATE TABLE IF NOT EXISTS changes (
    entity_id TEXT NOT NULL,
    column_id TEXT NOT NULL,
    value JSONB NOT NULL,
    updated_at TIMESTAMP NOT NULL
);

INSERT INTO changes VALUES (&#39;1&#39;, &#39;height&#39;, to_jsonb(140), &#39;01-01-2021 00:00:00&#39;::TIMESTAMP);
INSERT INTO changes VALUES (&#39;1&#39;, &#39;weight&#39;, to_jsonb(30), &#39;01-01-2021 00:00:00&#39;::TIMESTAMP);
INSERT INTO changes VALUES (&#39;1&#39;, &#39;height&#39;, to_jsonb(145), &#39;01-02-2021 00:00:00&#39;::TIMESTAMP);
INSERT INTO changes VALUES (&#39;1&#39;, &#39;weight&#39;, to_jsonb(34),&#39;01-03-2021 00:00:00&#39;::TIMESTAMP);
 entity_id | column_id | value |     updated_at      
-----------+-----------+-------+---------------------
 1         | height    | 140   | 2021-01-01 00:00:00
 1         | weight    | 30    | 2021-01-01 00:00:00
 1         | height    | 145   | 2021-01-02 00:00:00
 1         | weight    | 34    | 2021-01-03 00:00:00

And I want to get kinda cumulative view of this table

entity_id  | height | weight |     updated_at      
-----------+--------+--------+---------------------
 1         | 140    | 30     | 2021-01-01 00:00:00
 1         | 145    | 30     | 2021-01-02 00:00:00
 1         | 145    | 34     | 2021-01-03 00:00:00

My current query looks working

SELECT
  entity_id,
  coalesce(change-&gt;&#39;height&#39;, lag(change-&gt;&#39;height&#39;, 1, null) over (partition by entity_id order by updated_at)) as height,
  coalesce(change-&gt;&#39;weight&#39;, lag(change-&gt;&#39;weight&#39;, 1, null) over (partition by entity_id order by updated_at)) as weight,
  updated_at
FROM (
    SELECT entity_id, json_object_agg(column_id, value) as change, updated_at FROM changes
    GROUP BY entity_id, updated_at
) as changes;

But I don't like json_object_agg here and I'm sure there is a way do it without redundant aggregations? some kind of using window aggregate function that I've missed.

UPD. @SelVazi helped to make query better, but I feel it's not the final solution.

with cte as (
  SELECT
    entity_id,
    max(case when column_id = &#39;height&#39; then value::int end) as height,
    max(case when column_id = &#39;weight&#39; then value::int end) as weight,
    updated_at
  from changes
  GROUP by entity_id, updated_at
)
select
  entity_id,
  coalesce(height, lag(height) over (partition by entity_id order by updated_at)) as height,
  coalesce(weight, lag(weight) over (partition by entity_id order by updated_at)) as weight,
  updated_at
from cte;

答案1

得分: 2

这是更加复杂的问题，似乎不太容易。将身高和体重转换为列可以通过条件聚合来实现，但然后我们必须填充“缺失”的值。

我假设任何两种度量中都可能存在超过一个日期的间隙，这使得在Postgres中使用lag()不合适，因为它只能在预定义数量的行上向后查看（并且无法忽略null值）。

我们可以通过在示例数据的末尾添加一行来演示使用lag()的问题：

entity_id	column_id	value	updated_at
1	height	140	2021-01-01 00:00:00
1	weight	30	2021-01-01 00:00:00
1	height	145	2021-02-01 00:00:00
1	weight	34	2021-03-01 00:00:00
1	weight	140	2021-04-01 00:00:00

一个解决方法是使用间隙和岛屿技术将“缺失”的值放在以非null值开头的组中，然后将其视为新值。

select entity_id, updated_at,
    max(height) over(partition by entity_id, grp_height) height,
    max(weight) over(partition by entity_id, grp_weight) weight
from (
    select c.*,
        count(height) over (partition by entity_id order by updated_at) grp_height,
        count(weight) over (partition by entity_id order by updated_at) grp_weight
    from (
        select entity_id, updated_at,
            max(value::text) filter(where column_id = 'height') height,
            max(value::text) filter(where column_id = 'weight') weight
        from changes
        group by entity_id, updated_at
    ) c
) c
order by entity_id, updated_at

fiddle

entity_id	updated_at	height	weight
1	2021-01-01 00:00:00	140	30
1	2021-02-01 00:00:00	145	30
1	2021-03-01 00:00:00	145	34
1	2021-04-01 00:00:00	145	140

英文:

This is more complicated that it may seem. Pivoting the height and weight to columns can be done with conditional aggregation, but then we have to fill the "missing" values.

I would assume that there could be gaps of more than one date for any of the two measures, which makes lag() the wrong fit in Postgres, since it can only look back over a predefined number of rows (and cannot ignore null values).

We can demonstrate the problem with lag() by adding just one row at the end of your sample data:

entity_id	column_id	value	updated_at
1	height	140	2021-01-01 00:00:00
1	weight	30	2021-01-01 00:00:00
1	height	145	2021-02-01 00:00:00
1	weight	34	2021-03-01 00:00:00
1	weight	140	2021-04-01 00:00:00

One workaround uses a gaps-and-islands technique to put "missing" values in groups that start with a non-null value, which then becomes the new value.

select entity_id, updated_at,
    max(height) over(partition by entity_id, grp_height) height,
    max(weight) over(partition by entity_id, grp_weight) weight
from (
    select c.*,
        count(height) over (partition by entity_id order by updated_at) grp_height,
        count(weight) over (partition by entity_id order by updated_at) grp_weight
    from (
        select entity_id, updated_at,
            max(value::text) filter(where column_id = &#39;height&#39;) height,
            max(value::text) filter(where column_id = &#39;weight&#39;) weight
        from changes
        group by entity_id, updated_at
    ) c
) c
order by entity_id, updated_at

fiddle

entity_id	updated_at	height	weight
1	2021-01-01 00:00:00	140	30
1	2021-02-01 00:00:00	145	30
1	2021-03-01 00:00:00	145	34
1	2021-04-01 00:00:00	145	140

答案2

得分: 1

以下是您要翻译的内容：

"Instead of using json_object_agg you can use the conditional aggregation to get hight and width as columns :

SELECT entity_id, MAX(CASE WHEN column_id = 'height' THEN value::text END) AS height
, MAX(CASE WHEN column_id = 'weight' THEN value::text END) AS weight
, updated_at
FROM changes
GROUP BY entity_id, updated_at

The final query :

SELECT
entity_id,
coalesce(height, lag(height, 1, null) over (partition by entity_id order by updated_at)) as height,
coalesce(weight, lag(weight, 1, null) over (partition by entity_id order by updated_at)) as weight,
updated_at
FROM (
SELECT entity_id, MAX(CASE WHEN column_id = 'height' THEN value::text END) AS height
, MAX(CASE WHEN column_id = 'weight' THEN value::text END) AS weight
, updated_at
FROM changes
GROUP BY entity_id, updated_at
) as changes;

Result :

entity_id height weight updated_at
1 140 30 2021-01-01 00:00:00
1 145 30 2021-02-01 00:00:00
1 145 34 2021-03-01 00:00:00"

英文:

Instead of using json_object_agg you can use the conditional aggregation to get hight and width as columns :

SELECT entity_id, MAX(CASE WHEN column_id = &#39;height&#39; THEN value::text END) AS height
                , MAX(CASE WHEN column_id = &#39;weight&#39; THEN value::text END) AS weight
                , updated_at
FROM changes
GROUP BY entity_id, updated_at

The final query :

SELECT
  entity_id,
  coalesce(height, lag(height, 1, null) over (partition by entity_id order by updated_at)) as height,
  coalesce(weight, lag(weight, 1, null) over (partition by entity_id order by updated_at)) as weight,
  updated_at
FROM (
    SELECT entity_id, MAX(CASE WHEN column_id = &#39;height&#39; THEN value::text END) AS height
                , MAX(CASE WHEN column_id = &#39;weight&#39; THEN value::text END) AS weight
                , updated_at
    FROM changes
    GROUP BY entity_id, updated_at
) as changes;

Result :

entity_id	height	weight	updated_at
1	        140	    30	    2021-01-01 00:00:00
1	        145	    30	    2021-02-01 00:00:00
1	        145	    34	    2021-03-01 00:00:00

通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库，让每个人都能够通过互相帮助和分享经验来进步。

Postgres: “累积” 表的视图

问题

答案1

答案2

Golang服务器：发送具有可变列数的SQL查询结果的JSON

在PostgreSQL上，我如何修改一个表，使其从多个父表继承？

尝试强制 Liquibase 在 H2 内存数据库中使用 Postgres 模式

GORM不会创建多对多关联。

如何在Playwright视觉比较中屏蔽多个定位器？

在C++中，可以使用可变模板参数来检索类型的内部类型。

selenium.common.exceptions.StaleElementReferenceException: Message: stale element reference: stale element not found

Creating and opening a URL to log in to Website via Basic Auth with Robot Framework/Selenium (Python)

AG Grid 在上下文菜单中以大文本形式打开

What's the correct way to type hint an empty list as a literal in python?

如何在Highcharts Gantt中更改本地化的星期名称

如何在同一个流中使用多个过滤器和映射函数？

如何使用Map/Set来将代码优化到O(n)？

.NET MAUI Android在GitHub Actions上构建失败，错误代码为1。