2023年6月22日 04:40:49go评论97阅读模式

英文:

Fill in Nulls with Row above

问题

我真的很困扰，要创建'level1 question'和'level2 question'列的一些代码。
填充这些列的逻辑是：
- 当级别=1且类型=组时，'L1'列为question
- 当级别=2且类型=组时，'L2'列为question
但我找不到创建'level1 question'和'level2 question'的有效方法，就像示例中那样。
我尝试使用以下代码，但速度非常慢。在500万行上花了大约2个小时。
'level1 question' = 
                (select top 1 L1 
                from table2 t2
                where L1 is not null
                and t1.Sort >= t2.Sort and t1.ID = t2.ID
                order by id , sort DESC)
, 'level2 question' = 
            (select top 1 L2 
            from table2 t2
            where L2 is not null AND (Level >2 OR Type = 'Group')
            and t1.Sort >= t2.Sort and t1.ID = t2.ID
            order by id , sort DESC)

英文:

I am really struggling with producing some code create 'level1 question' & 'level2 question' columns.

The logic to populate the columns is:

'L1' when level = 1 & type = group then question
'L2' when level = 2 & type = group then question

But I can not find an efficient way of creating 'level1 question' & 'level2 question' like in the example.

id	sort	type	question	level	answer	L1	L2	level1 question	level2 question
1	1	Group	Location	1	NULL	Location	NULL	Location	NULL
1	2	Question	UK	2	Yes	NULL	NULL	Location	NULL
1	3	Question	EU	2	No	NULL	NULL	Location	NULL
1	4	Group	Asia	2	Yes	NULL	ASIA	Location	ASIA
1	5	Question	AUS	3	Yes	NULL	NULL	Location	ASIA
1	6	Question	NZ	3	No	NULL	NULL	Location	ASIA
1	7	Group	Season	1	NULL	Season	NULL	Season	NULL
1	8	Question	Summer	2	Yes	NULL	NULL	Season	NULL
2	1	Group	City	1	NULL	City	NULL	City	NULL
2	2	Question	London	2	Yes	NULL	NULL	City	NULL
2	3	Group	Street	2	NULL	NULL	Street	City	Street
2	4	Question	King ST	3	No	NULL	NULL	City	Street

I have tried to use the following code but it is so slow. Just taken ~2 hours over 5 million rows.

&#39;level1 question&#39; = 
                (select top 1 L1 
    			from table2 t2
    			where L1 is not null
    			and t1.Sort &gt;= t2.Sort and t1.ID = t2.ID
    			order by id , sort DESC)
, &#39;level2 question&#39; = 
            (select top 1 L2 
			from table2 t2
			where L2 is not null AND (Level &gt;2 OR Type = &#39;Group&#39;)
			and t1.Sort &gt;= t2.Sort and t1.ID = t2.ID
			order by id , sort DESC)

答案1

得分: 2

你的代码语法表明你正在运行SQL Server。

在2022版本中，我们可以使用last_value()和ignore nulls来高效地检索条件计算的最新非空值：

select t.*,
    last_value(case when level = 1 and type = 'Group' then question end) 
        ignore nulls
        over(partition by id order by sort) as level1_question,
    last_value(case when level = 2 and type = 'Group' then question end) 
        ignore nulls
        over(partition by id order by sort) as level2_question
from mytable t

在早期版本中（或在不支持ignore nulls的其他数据库中），我们仍然可以使用窗口函数（应该比在大型数据集上使用多个子查询更有效），但这需要更多工作。思路是定义包含一个非空值后面跟随0到N个空值的行组：

select t.*,
    max(question1) over(partition by id, grp1 order by sort) as level1_question,
    max(question2) over(partition by id, grp2 order by sort) as level2_question
from (
    select t.*,
        count(question1) over(partition by id order by sort) grp1,
        count(question2) over(partition by id order by sort) grp2
    from mytable t
    cross apply (
        values (
            case when level = 1 and type = 'Group' then question end,
            case when level = 2 and type = 'Group' then question end
        ) 
    ) as v(question1, question2)
) t

英文:

The syntax of your code suggests that you are running SQL Server.

In the 2022 version, we can use last_value() with ignore nulls to efficiently retrieve the latest non-null of the conditional computation:

select t.*,
    last_value(case when level = 1 and type = &#39;Group&#39; then question end) 
        ignore nulls
        over(partition by id order by sort) as level1_question,
    last_value(case when level = 2 and type = &#39;Group&#39; then question end) 
        ignore nulls
        over(partition by id order by sort) as level2_question
from mytable t

In earlier versions (or in other databases that do not support ignore nulls), we can still use window functions (which should be more efficient that multiple subqueries on a large dataset), but that requires more work. The idea is to define groups of rows that contain one non-null value followed by 0 to N null values:

select t.*,
    max(question1) over(partition by id, grp1 order by sort) as level1_question,
    max(question2) over(partition by id, grp2 order by sort) as level2_question
from (
    select t.*,
        count(question1) over(partition by id order by sort) grp1,
        count(question2) over(partition by id order by sort) grp2
    from mytable t
    cross apply (
        values (
            case when level = 1 and type = &#39;Group&#39; then question end,
            case when level = 2 and type = &#39;Group&#39; then question end
        ) 
    ) as v(question1, question2)
) t

答案2

得分: 0

我认为你经历了内存溢出，这是由于自连接引起的。创建一个CTE或两个，去掉带有null的行，然后将这些CTE连接到表中。我打赌，如果你查看执行计划，你会发现嵌套循环和排序正在影响你的查询。我的解决方案应该能减小这些影响。

英文:

I think you're experiencing an overflow in memory due to the self join. Create a CTE or 2 that removes the rows with null and then join the CTE(s) it to the table instead. I bet if you looked up your execution plan, you have nested loops and sorts that are killing hour query. My solution should reduce the impact of those.

通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库，让每个人都能够通过互相帮助和分享经验来进步。

用上一行的值填充空白值

问题

答案1

答案2

如何在Select Distinct中返回列 – 但在Distinct语句中排除它。

如何配置Spring Boot连接一个Docker化的MySQL数据库？

需要将SQL查询转换为Gorm查询。

如何在Gorm中删除关联关系的相关模型？

如何在Playwright视觉比较中屏蔽多个定位器？

在C++中，可以使用可变模板参数来检索类型的内部类型。

selenium.common.exceptions.StaleElementReferenceException: Message: stale element reference: stale element not found

Creating and opening a URL to log in to Website via Basic Auth with Robot Framework/Selenium (Python)

AG Grid 在上下文菜单中以大文本形式打开

What's the correct way to type hint an empty list as a literal in python?

如何在Highcharts Gantt中更改本地化的星期名称

如何在同一个流中使用多个过滤器和映射函数？

如何使用Map/Set来将代码优化到O(n)？

.NET MAUI Android在GitHub Actions上构建失败，错误代码为1。