用上一行的值填充空白值

huangapple go评论71阅读模式
英文:

Fill in Nulls with Row above

问题

我真的很困扰,要创建'level1 question'和'level2 question'列的一些代码。

填充这些列的逻辑是:

- 当级别=1且类型=组时,'L1'列为question
- 当级别=2且类型=组时,'L2'列为question

但我找不到创建'level1 question'和'level2 question'的有效方法,就像示例中那样。

我尝试使用以下代码,但速度非常慢。在500万行上花了大约2个小时。

'level1 question' = 
                (select top 1 L1 
                from table2 t2
                where L1 is not null
                and t1.Sort >= t2.Sort and t1.ID = t2.ID
                order by id , sort DESC)
, 'level2 question' = 
            (select top 1 L2 
            from table2 t2
            where L2 is not null AND (Level >2 OR Type = 'Group')
            and t1.Sort >= t2.Sort and t1.ID = t2.ID
            order by id , sort DESC)
英文:

I am really struggling with producing some code create 'level1 question' & 'level2 question' columns.

The logic to populate the columns is:

  • 'L1' when level = 1 & type = group then question
  • 'L2' when level = 2 & type = group then question

But I can not find an efficient way of creating 'level1 question' & 'level2 question' like in the example.

id sort type question level answer L1 L2 level1 question level2 question
1 1 Group Location 1 NULL Location NULL Location NULL
1 2 Question UK 2 Yes NULL NULL Location NULL
1 3 Question EU 2 No NULL NULL Location NULL
1 4 Group Asia 2 Yes NULL ASIA Location ASIA
1 5 Question AUS 3 Yes NULL NULL Location ASIA
1 6 Question NZ 3 No NULL NULL Location ASIA
1 7 Group Season 1 NULL Season NULL Season NULL
1 8 Question Summer 2 Yes NULL NULL Season NULL
2 1 Group City 1 NULL City NULL City NULL
2 2 Question London 2 Yes NULL NULL City NULL
2 3 Group Street 2 NULL NULL Street City Street
2 4 Question King ST 3 No NULL NULL City Street

I have tried to use the following code but it is so slow. Just taken ~2 hours over 5 million rows.

'level1 question' = 
                (select top 1 L1 
    			from table2 t2
    			where L1 is not null
    			and t1.Sort >= t2.Sort and t1.ID = t2.ID
    			order by id , sort DESC)
, 'level2 question' = 
            (select top 1 L2 
			from table2 t2
			where L2 is not null AND (Level >2 OR Type = 'Group')
			and t1.Sort >= t2.Sort and t1.ID = t2.ID
			order by id , sort DESC)

答案1

得分: 2

你的代码语法表明你正在运行SQL Server。

在2022版本中,我们可以使用last_value()ignore nulls来高效地检索条件计算的最新非空值:

select t.*,
    last_value(case when level = 1 and type = 'Group' then question end) 
        ignore nulls
        over(partition by id order by sort) as level1_question,
    last_value(case when level = 2 and type = 'Group' then question end) 
        ignore nulls
        over(partition by id order by sort) as level2_question
from mytable t

在早期版本中(或在不支持ignore nulls的其他数据库中),我们仍然可以使用窗口函数(应该比在大型数据集上使用多个子查询更有效),但这需要更多工作。思路是定义包含一个非空值后面跟随0到N个空值的行组:

select t.*,
    max(question1) over(partition by id, grp1 order by sort) as level1_question,
    max(question2) over(partition by id, grp2 order by sort) as level2_question
from (
    select t.*,
        count(question1) over(partition by id order by sort) grp1,
        count(question2) over(partition by id order by sort) grp2
    from mytable t
    cross apply (
        values (
            case when level = 1 and type = 'Group' then question end,
            case when level = 2 and type = 'Group' then question end
        ) 
    ) as v(question1, question2)
) t
英文:

The syntax of your code suggests that you are running SQL Server.

In the 2022 version, we can use last_value() with ignore nulls to efficiently retrieve the latest non-null of the conditional computation:

select t.*,
    last_value(case when level = 1 and type = 'Group' then question end) 
        ignore nulls
        over(partition by id order by sort) as level1_question,
    last_value(case when level = 2 and type = 'Group' then question end) 
        ignore nulls
        over(partition by id order by sort) as level2_question
from mytable t

In earlier versions (or in other databases that do not support ignore nulls), we can still use window functions (which should be more efficient that multiple subqueries on a large dataset), but that requires more work. The idea is to define groups of rows that contain one non-null value followed by 0 to N null values:

select t.*,
    max(question1) over(partition by id, grp1 order by sort) as level1_question,
    max(question2) over(partition by id, grp2 order by sort) as level2_question
from (
    select t.*,
        count(question1) over(partition by id order by sort) grp1,
        count(question2) over(partition by id order by sort) grp2
    from mytable t
    cross apply (
        values (
            case when level = 1 and type = 'Group' then question end,
            case when level = 2 and type = 'Group' then question end
        ) 
    ) as v(question1, question2)
) t

答案2

得分: 0

我认为你经历了内存溢出,这是由于自连接引起的。创建一个CTE或两个,去掉带有null的行,然后将这些CTE连接到表中。我打赌,如果你查看执行计划,你会发现嵌套循环和排序正在影响你的查询。我的解决方案应该能减小这些影响。

英文:

I think you're experiencing an overflow in memory due to the self join. Create a CTE or 2 that removes the rows with null and then join the CTE(s) it to the table instead. I bet if you looked up your execution plan, you have nested loops and sorts that are killing hour query. My solution should reduce the impact of those.

huangapple
  • 本文由 发表于 2023年6月22日 04:40:49
  • 转载请务必保留本文链接:https://go.coder-hub.com/76526978.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定