How to fill NULL values in a PostgreSQL column using the last non-NULL value
Question
I have a PostgreSQL table with the following structure:
CREATE TABLE cte1 (
    entity_id INT,
    assignedtogroup INT,
    time BIGINT
);
INSERT INTO cte1 (entity_id, assignedtogroup, time)
VALUES
    (1, 435198, 1687863949740),
    (1, 435198, 1687863949741),
    (1, NULL, 1687863949742),
    (1, NULL, 1687863949743),
    (1, 435224, 1687863949744),
    (1, 435224, 1687863949745),
    (1, 435143, 1687863949746),
    (1, 435143, 1687863949747),
    (1, 435191, 1687863949748),
    (1, NULL, 1687863949749),
    (2, 435143, 1690452125291),
    (2, 435143, 1690452125292),
    (2, 435191, 1690452125293),
    (2, NULL, 1690452125294);
I would like to fill the NULL values in the assignedtogroup column with the most recent non-NULL value from an earlier row (earlier time, same entity_id). The expected result is:
entity_id | assignedtogroup | time |
---|---|---|
1 | 435198 | 1687863949740 |
1 | 435198 | 1687863949741 |
1 | 435198 | 1687863949742 |
1 | 435198 | 1687863949743 |
1 | 435224 | 1687863949744 |
1 | 435224 | 1687863949745 |
1 | 435143 | 1687863949746 |
1 | 435143 | 1687863949747 |
1 | 435191 | 1687863949748 |
1 | 435191 | 1687863949749 |
2 | 435143 | 1690452125291 |
2 | 435143 | 1690452125292 |
2 | 435191 | 1690452125293 |
2 | 435191 | 1690452125294 |
Is there a way to achieve this using only a SELECT statement?
I tried using the LAG function:
SELECT
    entity_id,
    COALESCE(
        assignedtogroup,
        LAG(assignedtogroup) OVER (PARTITION BY entity_id ORDER BY time)
    ) AS filled_assignedtogroup
FROM cte1;
However, I still get a NULL value, and for entity_id 2 the values are completely mixed up.
DB Fiddle: https://www.db-fiddle.com/f/m52Rgq8jtK85g9yvaDMJqz/3
Answer 1
Score: 1
In my opinion, you would be better off using a simple correlated subquery here:
SELECT entity_id,
       COALESCE(assignedtogroup, (
           -- most recent earlier non-NULL value for the same entity
           SELECT cte2.assignedtogroup
           FROM cte1 cte2
           WHERE cte2.entity_id = cte1.entity_id
             AND cte2.time < cte1.time
             AND cte2.assignedtogroup IS NOT NULL
           ORDER BY cte2.time DESC
           LIMIT 1
       )) AS assignedtogroup,
       time
FROM cte1;
Updated DB Fiddle: https://www.db-fiddle.com/f/m52Rgq8jtK85g9yvaDMJqz/3
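To make explicit what the subquery looks up for each row, here is a minimal Python sketch of the same "carry the last non-NULL value forward" rule. It is not PostgreSQL, and the names `fill_forward`, `Row`, and `rows` are made up for illustration. It assumes that time is unique within each entity_id, as in the sample data.

```python
from typing import Dict, List, Optional, Tuple

# (entity_id, assignedtogroup, time); None stands in for SQL NULL.
Row = Tuple[int, Optional[int], int]

def fill_forward(rows: List[Row]) -> List[Row]:
    """Carry the last non-None group forward within each entity_id, in time order."""
    last_seen: Dict[int, int] = {}  # entity_id -> most recent non-None group
    filled: List[Row] = []
    for entity_id, group, time in sorted(rows, key=lambda r: (r[0], r[2])):
        if group is None:
            # Stays None when the entity has no earlier non-None value.
            group = last_seen.get(entity_id)
        else:
            last_seen[entity_id] = group
        filled.append((entity_id, group, time))
    return filled

# The entity_id 2 rows from the question.
rows = [
    (2, 435143, 1690452125291),
    (2, 435143, 1690452125292),
    (2, 435191, 1690452125293),
    (2, None, 1690452125294),
]
print(fill_forward(rows)[-1])  # (2, 435191, 1690452125294)
```

Like the subquery, the sketch leaves a NULL in place when no earlier non-NULL row exists for that entity.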
Answer 2
Score: 1
You can use the MAX and COUNT window functions.
Query #1
WITH CTE AS (
    SELECT *,
           COUNT(CASE WHEN assignedtogroup IS NOT NULL THEN 1 END) OVER (
               PARTITION BY entity_id
               ORDER BY time
           ) AS rn
    FROM cte1
)
SELECT entity_id,
       MAX(assignedtogroup) OVER (PARTITION BY entity_id, rn) AS value_,
       time
FROM CTE
ORDER BY 1, 3;
entity_id | value_ | time |
---|---|---|
1 | 435198 | 1687863949740 |
1 | 435198 | 1687863949741 |
1 | 435198 | 1687863949742 |
1 | 435198 | 1687863949743 |
1 | 435224 | 1687863949744 |
1 | 435224 | 1687863949745 |
1 | 435143 | 1687863949746 |
1 | 435143 | 1687863949747 |
1 | 435191 | 1687863949748 |
1 | 435191 | 1687863949749 |
2 | 435143 | 1690452125291 |
2 | 435143 | 1690452125292 |
2 | 435191 | 1690452125293 |
2 | 435191 | 1690452125294 |
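To see why the COUNT/MAX combination works, it can help to look at the intermediate rn column. The sketch below runs both steps on the first five rows from the question. As an assumption for the sake of a self-contained example, it uses Python's built-in sqlite3 module as a stand-in for PostgreSQL (window functions need SQLite 3.25 or newer); the variable names are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stand-in, not PostgreSQL
conn.execute("CREATE TABLE cte1 (entity_id INT, assignedtogroup INT, time BIGINT)")
conn.executemany(
    "INSERT INTO cte1 VALUES (?, ?, ?)",
    [
        (1, 435198, 1687863949740),
        (1, 435198, 1687863949741),
        (1, None, 1687863949742),
        (1, None, 1687863949743),
        (1, 435224, 1687863949744),
    ],
)

# Step 1: running count of non-NULL values. The CASE yields NULL for NULL rows,
# and COUNT skips NULLs, so rn only advances on rows that carry a value.
groups = conn.execute(
    """
    SELECT assignedtogroup,
           COUNT(CASE WHEN assignedtogroup IS NOT NULL THEN 1 END)
               OVER (PARTITION BY entity_id ORDER BY time) AS rn
    FROM cte1
    ORDER BY entity_id, time
    """
).fetchall()
print(groups)
# [(435198, 1), (435198, 2), (None, 2), (None, 2), (435224, 3)]

# Step 2: each (entity_id, rn) group holds exactly one non-NULL value,
# so MAX over the group returns it for every row in that group.
filled = [
    value
    for (value,) in conn.execute(
        """
        WITH cte AS (
            SELECT *,
                   COUNT(CASE WHEN assignedtogroup IS NOT NULL THEN 1 END)
                       OVER (PARTITION BY entity_id ORDER BY time) AS rn
            FROM cte1
        )
        SELECT MAX(assignedtogroup) OVER (PARTITION BY entity_id, rn)
        FROM cte
        ORDER BY entity_id, time
        """
    )
]
print(filled)
# [435198, 435198, 435198, 435198, 435224]
```

Each NULL row shares its rn with the last non-NULL row before it, which is why the fill reaches back over any number of consecutive NULLs. Rows that precede the first non-NULL value of an entity get rn = 0 and stay NULL.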