用上一行的值填充空白值

huangapple go评论97阅读模式
英文:

Fill in Nulls with Row above

问题

  1. 我真的很困扰,要创建'level1 question''level2 question'列的一些代码。
  2. 填充这些列的逻辑是:
  3. - 当级别=1且类型=组时,'L1'列为question
  4. - 当级别=2且类型=组时,'L2'列为question
  5. 但我找不到创建'level1 question''level2 question'的有效方法,就像示例中那样。
  6. 我尝试使用以下代码,但速度非常慢。在500万行上花了大约2个小时。
  7. 'level1 question' =
  8. (select top 1 L1
  9. from table2 t2
  10. where L1 is not null
  11. and t1.Sort >= t2.Sort and t1.ID = t2.ID
  12. order by id , sort DESC)
  13. , 'level2 question' =
  14. (select top 1 L2
  15. from table2 t2
  16. where L2 is not null AND (Level >2 OR Type = 'Group')
  17. and t1.Sort >= t2.Sort and t1.ID = t2.ID
  18. order by id , sort DESC)
英文:

I am really struggling with producing some code create 'level1 question' & 'level2 question' columns.

The logic to populate the columns is:

  • 'L1' when level = 1 & type = group then question
  • 'L2' when level = 2 & type = group then question

But I can not find an efficient way of creating 'level1 question' & 'level2 question' like in the example.

id sort type question level answer L1 L2 level1 question level2 question
1 1 Group Location 1 NULL Location NULL Location NULL
1 2 Question UK 2 Yes NULL NULL Location NULL
1 3 Question EU 2 No NULL NULL Location NULL
1 4 Group Asia 2 Yes NULL ASIA Location ASIA
1 5 Question AUS 3 Yes NULL NULL Location ASIA
1 6 Question NZ 3 No NULL NULL Location ASIA
1 7 Group Season 1 NULL Season NULL Season NULL
1 8 Question Summer 2 Yes NULL NULL Season NULL
2 1 Group City 1 NULL City NULL City NULL
2 2 Question London 2 Yes NULL NULL City NULL
2 3 Group Street 2 NULL NULL Street City Street
2 4 Question King ST 3 No NULL NULL City Street

I have tried to use the following code but it is so slow. Just taken ~2 hours over 5 million rows.

  1. 'level1 question' =
  2. (select top 1 L1
  3. from table2 t2
  4. where L1 is not null
  5. and t1.Sort >= t2.Sort and t1.ID = t2.ID
  6. order by id , sort DESC)
  7. , 'level2 question' =
  8. (select top 1 L2
  9. from table2 t2
  10. where L2 is not null AND (Level >2 OR Type = 'Group')
  11. and t1.Sort >= t2.Sort and t1.ID = t2.ID
  12. order by id , sort DESC)

答案1

得分: 2

你的代码语法表明你正在运行SQL Server。

在2022版本中,我们可以使用last_value()ignore nulls来高效地检索条件计算的最新非空值:

  1. select t.*,
  2. last_value(case when level = 1 and type = 'Group' then question end)
  3. ignore nulls
  4. over(partition by id order by sort) as level1_question,
  5. last_value(case when level = 2 and type = 'Group' then question end)
  6. ignore nulls
  7. over(partition by id order by sort) as level2_question
  8. from mytable t

在早期版本中(或在不支持ignore nulls的其他数据库中),我们仍然可以使用窗口函数(应该比在大型数据集上使用多个子查询更有效),但这需要更多工作。思路是定义包含一个非空值后面跟随0到N个空值的行组:

  1. select t.*,
  2. max(question1) over(partition by id, grp1 order by sort) as level1_question,
  3. max(question2) over(partition by id, grp2 order by sort) as level2_question
  4. from (
  5. select t.*,
  6. count(question1) over(partition by id order by sort) grp1,
  7. count(question2) over(partition by id order by sort) grp2
  8. from mytable t
  9. cross apply (
  10. values (
  11. case when level = 1 and type = 'Group' then question end,
  12. case when level = 2 and type = 'Group' then question end
  13. )
  14. ) as v(question1, question2)
  15. ) t
英文:

The syntax of your code suggests that you are running SQL Server.

In the 2022 version, we can use last_value() with ignore nulls to efficiently retrieve the latest non-null of the conditional computation:

  1. select t.*,
  2. last_value(case when level = 1 and type = 'Group' then question end)
  3. ignore nulls
  4. over(partition by id order by sort) as level1_question,
  5. last_value(case when level = 2 and type = 'Group' then question end)
  6. ignore nulls
  7. over(partition by id order by sort) as level2_question
  8. from mytable t

In earlier versions (or in other databases that do not support ignore nulls), we can still use window functions (which should be more efficient that multiple subqueries on a large dataset), but that requires more work. The idea is to define groups of rows that contain one non-null value followed by 0 to N null values:

  1. select t.*,
  2. max(question1) over(partition by id, grp1 order by sort) as level1_question,
  3. max(question2) over(partition by id, grp2 order by sort) as level2_question
  4. from (
  5. select t.*,
  6. count(question1) over(partition by id order by sort) grp1,
  7. count(question2) over(partition by id order by sort) grp2
  8. from mytable t
  9. cross apply (
  10. values (
  11. case when level = 1 and type = 'Group' then question end,
  12. case when level = 2 and type = 'Group' then question end
  13. )
  14. ) as v(question1, question2)
  15. ) t

答案2

得分: 0

我认为你经历了内存溢出,这是由于自连接引起的。创建一个CTE或两个,去掉带有null的行,然后将这些CTE连接到表中。我打赌,如果你查看执行计划,你会发现嵌套循环和排序正在影响你的查询。我的解决方案应该能减小这些影响。

英文:

I think you're experiencing an overflow in memory due to the self join. Create a CTE or 2 that removes the rows with null and then join the CTE(s) it to the table instead. I bet if you looked up your execution plan, you have nested loops and sorts that are killing hour query. My solution should reduce the impact of those.

huangapple
  • 本文由 发表于 2023年6月22日 04:40:49
  • 转载请务必保留本文链接:https://go.coder-hub.com/76526978.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定