使用CASE语句根据同一列中的非空单元格来更新空单元格如何操作?

huangapple go评论63阅读模式
英文:

How do I update null cells based on the non-null cells from the same column using case statements?

问题

我是SQL的新手,我尝试在线搜索类似的帖子来解决我的问题,但没有找到。所以我试试看。

我有一个表格t1,它汇总了调查响应,包括以下列:dateidgrouplevel。还有更多的列,但基本上我关心的是这些。前3列告诉我谁何时回答了调查(idgroup列值之间是多对多的关系)。每个月最多有一个关于每个id-group组合的条目(日期值固定在每月的第一天)。level列是他们对调查的响应。

对于每对现有的idgroup,我想从2022年和2023年获取每个月的level。问题是,该列中有很多空白。看看这个小例子:

日期 | id | group | level
2022-01-01 | 1 | A | null
2022-02-01 | 1 | A | 3
2022-03-01 | 1 | A | null
2022-04-01 | 1 | A | null
2022-05-01 | 1 | A | 4
2022-06-01 | 1 | A | 5
2022-07-01 | 1 | A | null

所以,我想推断我已经有的数据。对于每个具有空level的行,我想分配前一个“缺失”月份中最近的非空月份的level。如果具有空值的行没有前一个月具有非空值(即它是2022年1月),则获取具有非空level的最早月份的值,该月份在空行之后发生。所以期望的输出将是:

日期 | id | group | level
2022-01-01 | 1 | A | 3
2022-02-01 | 1 | A | 3
2022-03-01 | 1 | A | 3
2022-04-01 | 1 | A | 3
2022-05-01 | 1 | A | 4
2022-06-01 | 1 | A | 5
2022-07-01 | 1 | A | 5

我的方法是这样的:
我使用一个表t2来获取所有具有非空level值的t1行。然后我创建了CTE t3,基本上是 t1的副本,多了一个名为insert的额外列。这个新列将是我放置外推的t1.level值的地方,我将使用一个简单的更新函数将t3.insert传递给相应的空t1.level行。

with t3 as 
(Select distinct t1.date, t1.id, t1.group, t1.level,
    Case
    -- 当空的t1记录早于最早的t2记录时
        When (t2.date = (select min(t2.date) where (t1.id, t1.group) = (t2.id, t2.group)))
        Then t2.level
    -- 当空的t1记录晚于某些现有的t2记录时
        When (t2.date = (select max(t2.date) where (t1.id, t1.group) = (t2.id, t2.group) AND t2.date <= t1.date))
        Then t2.level
        Else null
    End as "insert"
    from t1
    Join t2 on (t1.id, t1.group) = (t2.id, t2.group)
    GROUP BY t1.date, t1.id, t1.group, t1.level, t2.id, t2.group, t2.date, t2.level
    ORDER BY t1.id, t1.group, t1.date
)

我甚至无法进行更新t1.level的部分,因为我遇到了问题。我发现对于每一对idgroupt3都将每个t2.level值分配给每个t3.insert。对于id = 1和group = A的2022年1月记录:

日期 | id | group | level | insert
2022-01-01 | 1 | A | null | 3
2022-01-01 | 1 | A | null | 4
2022-01-01 | 1 | A | null | 5

看起来像是PostgreSQL忽略了我的CASE语句中的min()函数。我可能犯了一个非常初级的错误。

英文:

I'm new to SQL, and I tried searching online for a similar post to my problem but came up empty. So here goes.

I have a table t1 that collates survey responses with the following columns: date, id, group, and level. There are more columns but these are basically the ones I'm concerned with. The first 3 columns tell me who answered the survey and when (there's a many to many relationship between the id and group column values). There is at most 1 entry for each id-group combination in each month (with the date value pegged on the first day of the month). The level column is their response to the survey.

For each existing pair of id and group, I want to get the level for each month from 2022 and 2023. The problem is, I have lots of gaps in that column. Take this small example:

date | id | group | level
2022-01-01 | 1 | A | null
2022-02-01 | 1 | A | 3
2022-03-01 | 1 | A | null
2022-04-01 | 1 | A | null
2022-05-01 | 1 | A | 4
2022-06-01 | 1 | A | 5
2022-07-01 | 1 | A | null

So, I want to extrapolate the data that I do have. For each row with a null level, I want to assign the level of the most recent non-null month prior to that "missing" month. If the row with the null value has no prior month with a non-null value (i.e. it's January 2022), get the value of the earliest month with a non-null level that occurs after the null row. So the desired output would be:

date | id | group | level
2022-01-01 | 1 | A | 3
2022-02-01 | 1 | A | 3
2022-03-01 | 1 | A | 3
2022-04-01 | 1 | A | 3
2022-05-01 | 1 | A | 4
2022-06-01 | 1 | A | 5
2022-07-01 | 1 | A | 5

My approach was like this:
I used a table t2 to grab all the rows from t1 with non-null level values. I then created CTE t3 to basically be a duplicate of t1, with one extra column called insert. This new column will be where I put the extrapolated t1.level values and I'll use a simple update function to transfer t3.insert to the corresponding null t1.level rows.

with t3 as 
	Select distinct t1.date, t1.id, t1.group, t1.level,
	Case
--for when the null t1 record is earlier than the earliest t2 record
		When (t2.date = (select min(t2.date) where (t1.id, t1.group) = (t2.id, t2.group)))
		Then t2.level
--when the null t1 record is later than some existing t2 record
		When (t2.date = (select max(t2.date) where (t1.id, t1.group) = (t2.id, t2.group) AND t2.date &lt;= t1.date))
		Then t2.level
		Else null
	End as &quot;insert&quot;
	from t1
	Join t2 on (t1.id, t1.group) = (t2.id, t2.group)
	GROUP BY t1.date, t1.id, t1.group, t1.level, t2.id, t2.group, t2.date, t2.level
	ORDER BY t1.id, t1.group, t1.date
)

I couldn't even get to the part where I update t1.level because I ran into a problem. I found that for every pair of id and group, t3 was assigning every t2.level value to each t3.insert. For the January 2022 record for id = 1 and group = A:

date | id | group | level | insert
2022-01-01 | 1 | A | null | 3
2022-01-01 | 1 | A | null | 4
2022-01-01 | 1 | A | null | 5

It seems like postgresql is ignoring the min() function in my case statement. I probably made a really rookie mistake.

答案1

得分: 1

以下查询演示了一种确定替代空调查级别的方法,该方法通过查找之前的非空级别或如果不存在,则查找下一个非空级别来确定空值调查级别的替代值。

WITH t1(survey_date, id, survey_group, survey_level) AS (
  VALUES ('2022-01-01'::date, 1, 'A', null),
         ('2022-02-01'::date, 1, 'A', 3),
         ('2022-03-01'::date, 1, 'A', null),
         ('2022-04-01'::date, 1, 'A', null),
         ('2022-05-01'::date, 1, 'A', 4),
         ('2022-06-01'::date, 1, 'A', 5),
         ('2022-07-01'::date, 1, 'A', null)
)
SELECT t1.survey_date, t1.id, t1.survey_group, t1.survey_level,
       COALESCE(t1.survey_level,
                -- prior non-NULL level
                (array_agg(t1.survey_level)
                   FILTER (WHERE t1.survey_level IS NOT NULL)
                   OVER (PARTITION BY t1.id, t1.survey_group
                         ORDER BY t1.survey_date DESC
                         ROWS BETWEEN 1 FOLLOWING AND UNBOUNDED FOLLOWING))[1],
                -- next non-NULL level
                (array_agg(t1.survey_level)
                   FILTER (WHERE t1.survey_level IS NOT NULL)
                   OVER (PARTITION BY t1.id, t1.survey_group
                         ORDER BY t1.survey_date
                         ROWS BETWEEN 1 FOLLOWING AND UNBOUNDED FOLLOWING))[1]) AS substitute_level
  FROM t1
  ORDER BY t1.survey_date, t1.id, t1.survey_group;

前一个和后一个值被收集到数组中,以便数组的第一个元素是给定方向中最近的非空值。由于前一个值按降序排序,窗口帧是使用 FOLLOWING 而不是 PRECEDING 来定义的。

英文:

The following query demonstrates an approach for determining the substitution values for NULL survey levels by finding the prior non-NULL level, or if none exists, then the next non-NULL level.

WITH t1(survey_date, id, survey_group, survey_level) AS (
  VALUES (&#39;2022-01-01&#39;::date, 1, &#39;A&#39;, null),
         (&#39;2022-02-01&#39;::date, 1, &#39;A&#39;, 3),
         (&#39;2022-03-01&#39;::date, 1, &#39;A&#39;, null),
         (&#39;2022-04-01&#39;::date, 1, &#39;A&#39;, null),
         (&#39;2022-05-01&#39;::date, 1, &#39;A&#39;, 4),
         (&#39;2022-06-01&#39;::date, 1, &#39;A&#39;, 5),
         (&#39;2022-07-01&#39;::date, 1, &#39;A&#39;, null)
)
SELECT t1.survey_date, t1.id, t1.survey_group, t1.survey_level,
       COALESCE(t1.survey_level,
                -- prior non-NULL level
                (array_agg(t1.survey_level)
                   FILTER (WHERE t1.survey_level IS NOT NULL)
                   OVER (PARTITION BY t1.id, t1.survey_group
                         ORDER BY t1.survey_date DESC
                         ROWS BETWEEN 1 FOLLOWING AND UNBOUNDED FOLLOWING))[1],
                -- next non-NULL level
                (array_agg(t1.survey_level)
                   FILTER (WHERE t1.survey_level IS NOT NULL)
                   OVER (PARTITION BY t1.id, t1.survey_group
                         ORDER BY t1.survey_date
                         ROWS BETWEEN 1 FOLLOWING AND UNBOUNDED FOLLOWING))[1]) AS substitute_level
  FROM t1
  ORDER BY t1.survey_date, t1.id, t1.survey_group;

The prior and next values are collected into arrays ordered such that the first element of the array is the nearest non-NULL value in the given direction. Because the prior values are sorted in descending order, the window frame is defined using FOLLOWING instead of PRECEDING.

huangapple
  • 本文由 发表于 2023年6月19日 16:17:48
  • 转载请务必保留本文链接:https://go.coder-hub.com/76504821.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定