英文:
Fill in Nulls with Row above
问题
我真的很困扰,要创建'level1 question'和'level2 question'列的一些代码。
填充这些列的逻辑是:
- 当级别=1且类型=组时,'L1'列为question
- 当级别=2且类型=组时,'L2'列为question
但我找不到创建'level1 question'和'level2 question'的有效方法,就像示例中那样。
我尝试使用以下代码,但速度非常慢。在500万行上花了大约2个小时。
'level1 question' =
(select top 1 L1
from table2 t2
where L1 is not null
and t1.Sort >= t2.Sort and t1.ID = t2.ID
order by id , sort DESC)
, 'level2 question' =
(select top 1 L2
from table2 t2
where L2 is not null AND (Level >2 OR Type = 'Group')
and t1.Sort >= t2.Sort and t1.ID = t2.ID
order by id , sort DESC)
英文:
I am really struggling with producing some code create 'level1 question' & 'level2 question' columns.
The logic to populate the columns is:
- 'L1' when level = 1 & type = group then question
- 'L2' when level = 2 & type = group then question
But I can not find an efficient way of creating 'level1 question' & 'level2 question' like in the example.
id | sort | type | question | level | answer | L1 | L2 | level1 question | level2 question |
---|---|---|---|---|---|---|---|---|---|
1 | 1 | Group | Location | 1 | NULL | Location | NULL | Location | NULL |
1 | 2 | Question | UK | 2 | Yes | NULL | NULL | Location | NULL |
1 | 3 | Question | EU | 2 | No | NULL | NULL | Location | NULL |
1 | 4 | Group | Asia | 2 | Yes | NULL | ASIA | Location | ASIA |
1 | 5 | Question | AUS | 3 | Yes | NULL | NULL | Location | ASIA |
1 | 6 | Question | NZ | 3 | No | NULL | NULL | Location | ASIA |
1 | 7 | Group | Season | 1 | NULL | Season | NULL | Season | NULL |
1 | 8 | Question | Summer | 2 | Yes | NULL | NULL | Season | NULL |
2 | 1 | Group | City | 1 | NULL | City | NULL | City | NULL |
2 | 2 | Question | London | 2 | Yes | NULL | NULL | City | NULL |
2 | 3 | Group | Street | 2 | NULL | NULL | Street | City | Street |
2 | 4 | Question | King ST | 3 | No | NULL | NULL | City | Street |
I have tried to use the following code but it is so slow. Just taken ~2 hours over 5 million rows.
'level1 question' =
(select top 1 L1
from table2 t2
where L1 is not null
and t1.Sort >= t2.Sort and t1.ID = t2.ID
order by id , sort DESC)
, 'level2 question' =
(select top 1 L2
from table2 t2
where L2 is not null AND (Level >2 OR Type = 'Group')
and t1.Sort >= t2.Sort and t1.ID = t2.ID
order by id , sort DESC)
答案1
得分: 2
你的代码语法表明你正在运行SQL Server。
在2022版本中,我们可以使用last_value()
和ignore nulls
来高效地检索条件计算的最新非空值:
select t.*,
last_value(case when level = 1 and type = 'Group' then question end)
ignore nulls
over(partition by id order by sort) as level1_question,
last_value(case when level = 2 and type = 'Group' then question end)
ignore nulls
over(partition by id order by sort) as level2_question
from mytable t
在早期版本中(或在不支持ignore nulls
的其他数据库中),我们仍然可以使用窗口函数(应该比在大型数据集上使用多个子查询更有效),但这需要更多工作。思路是定义包含一个非空值后面跟随0到N个空值的行组:
select t.*,
max(question1) over(partition by id, grp1 order by sort) as level1_question,
max(question2) over(partition by id, grp2 order by sort) as level2_question
from (
select t.*,
count(question1) over(partition by id order by sort) grp1,
count(question2) over(partition by id order by sort) grp2
from mytable t
cross apply (
values (
case when level = 1 and type = 'Group' then question end,
case when level = 2 and type = 'Group' then question end
)
) as v(question1, question2)
) t
英文:
The syntax of your code suggests that you are running SQL Server.
In the 2022 version, we can use last_value()
with ignore nulls
to efficiently retrieve the latest non-null of the conditional computation:
select t.*,
last_value(case when level = 1 and type = 'Group' then question end)
ignore nulls
over(partition by id order by sort) as level1_question,
last_value(case when level = 2 and type = 'Group' then question end)
ignore nulls
over(partition by id order by sort) as level2_question
from mytable t
In earlier versions (or in other databases that do not support ignore nulls
), we can still use window functions (which should be more efficient that multiple subqueries on a large dataset), but that requires more work. The idea is to define groups of rows that contain one non-null value followed by 0 to N null values:
select t.*,
max(question1) over(partition by id, grp1 order by sort) as level1_question,
max(question2) over(partition by id, grp2 order by sort) as level2_question
from (
select t.*,
count(question1) over(partition by id order by sort) grp1,
count(question2) over(partition by id order by sort) grp2
from mytable t
cross apply (
values (
case when level = 1 and type = 'Group' then question end,
case when level = 2 and type = 'Group' then question end
)
) as v(question1, question2)
) t
答案2
得分: 0
我认为你经历了内存溢出,这是由于自连接引起的。创建一个CTE或两个,去掉带有null的行,然后将这些CTE连接到表中。我打赌,如果你查看执行计划,你会发现嵌套循环和排序正在影响你的查询。我的解决方案应该能减小这些影响。
英文:
I think you're experiencing an overflow in memory due to the self join. Create a CTE or 2 that removes the rows with null and then join the CTE(s) it to the table instead. I bet if you looked up your execution plan, you have nested loops and sorts that are killing hour query. My solution should reduce the impact of those.
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论