英文:
How do I update null cells based on the non-null cells from the same column using case statements?
问题
我是SQL的新手,我尝试在线搜索类似的帖子来解决我的问题,但没有找到。所以我试试看。
我有一个表格t1
,它汇总了调查响应,包括以下列:date
、id
、group
和level
。还有更多的列,但基本上我关心的是这些。前3列告诉我谁何时回答了调查(id
和group
列值之间是多对多的关系)。每个月最多有一个关于每个id
-group
组合的条目(日期值固定在每月的第一天)。level
列是他们对调查的响应。
对于每对现有的id
和group
,我想从2022年和2023年获取每个月的level
。问题是,该列中有很多空白。看看这个小例子:
日期 | id | group | level
2022-01-01 | 1 | A | null
2022-02-01 | 1 | A | 3
2022-03-01 | 1 | A | null
2022-04-01 | 1 | A | null
2022-05-01 | 1 | A | 4
2022-06-01 | 1 | A | 5
2022-07-01 | 1 | A | null
所以,我想推断我已经有的数据。对于每个具有空level
的行,我想分配前一个“缺失”月份中最近的非空月份的level
。如果具有空值的行没有前一个月具有非空值(即它是2022年1月),则获取具有非空level
的最早月份的值,该月份在空行之后发生。所以期望的输出将是:
日期 | id | group | level
2022-01-01 | 1 | A | 3
2022-02-01 | 1 | A | 3
2022-03-01 | 1 | A | 3
2022-04-01 | 1 | A | 3
2022-05-01 | 1 | A | 4
2022-06-01 | 1 | A | 5
2022-07-01 | 1 | A | 5
我的方法是这样的:
我使用一个表t2
来获取所有具有非空level
值的t1
行。然后我创建了CTE t3
,基本上是 t1
的副本,多了一个名为insert
的额外列。这个新列将是我放置外推的t1.level
值的地方,我将使用一个简单的更新函数将t3.insert
传递给相应的空t1.level
行。
with t3 as
(Select distinct t1.date, t1.id, t1.group, t1.level,
Case
-- 当空的t1记录早于最早的t2记录时
When (t2.date = (select min(t2.date) where (t1.id, t1.group) = (t2.id, t2.group)))
Then t2.level
-- 当空的t1记录晚于某些现有的t2记录时
When (t2.date = (select max(t2.date) where (t1.id, t1.group) = (t2.id, t2.group) AND t2.date <= t1.date))
Then t2.level
Else null
End as "insert"
from t1
Join t2 on (t1.id, t1.group) = (t2.id, t2.group)
GROUP BY t1.date, t1.id, t1.group, t1.level, t2.id, t2.group, t2.date, t2.level
ORDER BY t1.id, t1.group, t1.date
)
我甚至无法进行更新t1.level
的部分,因为我遇到了问题。我发现对于每一对id
和group
,t3
都将每个t2.level
值分配给每个t3.insert
。对于id
= 1和group
= A的2022年1月记录:
日期 | id | group | level | insert
2022-01-01 | 1 | A | null | 3
2022-01-01 | 1 | A | null | 4
2022-01-01 | 1 | A | null | 5
看起来像是PostgreSQL忽略了我的CASE语句中的min()函数。我可能犯了一个非常初级的错误。
英文:
I'm new to SQL, and I tried searching online for a similar post to my problem but came up empty. So here goes.
I have a table t1
that collates survey responses with the following columns: date
, id
, group
, and level
. There are more columns but these are basically the ones I'm concerned with. The first 3 columns tell me who answered the survey and when (there's a many to many relationship between the id
and group
column values). There is at most 1 entry for each id
-group
combination in each month (with the date value pegged on the first day of the month). The level
column is their response to the survey.
For each existing pair of id
and group
, I want to get the level
for each month from 2022 and 2023. The problem is, I have lots of gaps in that column. Take this small example:
date | id | group | level
2022-01-01 | 1 | A | null
2022-02-01 | 1 | A | 3
2022-03-01 | 1 | A | null
2022-04-01 | 1 | A | null
2022-05-01 | 1 | A | 4
2022-06-01 | 1 | A | 5
2022-07-01 | 1 | A | null
So, I want to extrapolate the data that I do have. For each row with a null level
, I want to assign the level
of the most recent non-null month prior to that "missing" month. If the row with the null value has no prior month with a non-null value (i.e. it's January 2022), get the value of the earliest month with a non-null level
that occurs after the null row. So the desired output would be:
date | id | group | level
2022-01-01 | 1 | A | 3
2022-02-01 | 1 | A | 3
2022-03-01 | 1 | A | 3
2022-04-01 | 1 | A | 3
2022-05-01 | 1 | A | 4
2022-06-01 | 1 | A | 5
2022-07-01 | 1 | A | 5
My approach was like this:
I used a table t2
to grab all the rows from t1
with non-null level
values. I then created CTE t3
to basically be a duplicate of t1
, with one extra column called insert
. This new column will be where I put the extrapolated t1.level
values and I'll use a simple update function to transfer t3.insert
to the corresponding null t1.level
rows.
with t3 as
Select distinct t1.date, t1.id, t1.group, t1.level,
Case
--for when the null t1 record is earlier than the earliest t2 record
When (t2.date = (select min(t2.date) where (t1.id, t1.group) = (t2.id, t2.group)))
Then t2.level
--when the null t1 record is later than some existing t2 record
When (t2.date = (select max(t2.date) where (t1.id, t1.group) = (t2.id, t2.group) AND t2.date <= t1.date))
Then t2.level
Else null
End as "insert"
from t1
Join t2 on (t1.id, t1.group) = (t2.id, t2.group)
GROUP BY t1.date, t1.id, t1.group, t1.level, t2.id, t2.group, t2.date, t2.level
ORDER BY t1.id, t1.group, t1.date
)
I couldn't even get to the part where I update t1.level
because I ran into a problem. I found that for every pair of id
and group
, t3
was assigning every t2.level
value to each t3.insert
. For the January 2022 record for id
= 1 and group
= A:
date | id | group | level | insert
2022-01-01 | 1 | A | null | 3
2022-01-01 | 1 | A | null | 4
2022-01-01 | 1 | A | null | 5
It seems like postgresql is ignoring the min() function in my case statement. I probably made a really rookie mistake.
答案1
得分: 1
以下查询演示了一种确定替代空调查级别的方法,该方法通过查找之前的非空级别或如果不存在,则查找下一个非空级别来确定空值调查级别的替代值。
WITH t1(survey_date, id, survey_group, survey_level) AS (
VALUES ('2022-01-01'::date, 1, 'A', null),
('2022-02-01'::date, 1, 'A', 3),
('2022-03-01'::date, 1, 'A', null),
('2022-04-01'::date, 1, 'A', null),
('2022-05-01'::date, 1, 'A', 4),
('2022-06-01'::date, 1, 'A', 5),
('2022-07-01'::date, 1, 'A', null)
)
SELECT t1.survey_date, t1.id, t1.survey_group, t1.survey_level,
COALESCE(t1.survey_level,
-- prior non-NULL level
(array_agg(t1.survey_level)
FILTER (WHERE t1.survey_level IS NOT NULL)
OVER (PARTITION BY t1.id, t1.survey_group
ORDER BY t1.survey_date DESC
ROWS BETWEEN 1 FOLLOWING AND UNBOUNDED FOLLOWING))[1],
-- next non-NULL level
(array_agg(t1.survey_level)
FILTER (WHERE t1.survey_level IS NOT NULL)
OVER (PARTITION BY t1.id, t1.survey_group
ORDER BY t1.survey_date
ROWS BETWEEN 1 FOLLOWING AND UNBOUNDED FOLLOWING))[1]) AS substitute_level
FROM t1
ORDER BY t1.survey_date, t1.id, t1.survey_group;
前一个和后一个值被收集到数组中,以便数组的第一个元素是给定方向中最近的非空值。由于前一个值按降序排序,窗口帧是使用 FOLLOWING
而不是 PRECEDING
来定义的。
英文:
The following query demonstrates an approach for determining the substitution values for NULL survey levels by finding the prior non-NULL level, or if none exists, then the next non-NULL level.
WITH t1(survey_date, id, survey_group, survey_level) AS (
VALUES ('2022-01-01'::date, 1, 'A', null),
('2022-02-01'::date, 1, 'A', 3),
('2022-03-01'::date, 1, 'A', null),
('2022-04-01'::date, 1, 'A', null),
('2022-05-01'::date, 1, 'A', 4),
('2022-06-01'::date, 1, 'A', 5),
('2022-07-01'::date, 1, 'A', null)
)
SELECT t1.survey_date, t1.id, t1.survey_group, t1.survey_level,
COALESCE(t1.survey_level,
-- prior non-NULL level
(array_agg(t1.survey_level)
FILTER (WHERE t1.survey_level IS NOT NULL)
OVER (PARTITION BY t1.id, t1.survey_group
ORDER BY t1.survey_date DESC
ROWS BETWEEN 1 FOLLOWING AND UNBOUNDED FOLLOWING))[1],
-- next non-NULL level
(array_agg(t1.survey_level)
FILTER (WHERE t1.survey_level IS NOT NULL)
OVER (PARTITION BY t1.id, t1.survey_group
ORDER BY t1.survey_date
ROWS BETWEEN 1 FOLLOWING AND UNBOUNDED FOLLOWING))[1]) AS substitute_level
FROM t1
ORDER BY t1.survey_date, t1.id, t1.survey_group;
The prior and next values are collected into arrays ordered such that the first element of the array is the nearest non-NULL value in the given direction. Because the prior values are sorted in descending order, the window frame is defined using FOLLOWING
instead of PRECEDING
.
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论