计算SQL中的总体成功率:递归CTE还是替代方法?

huangapple go评论83阅读模式
英文:

Calculating Overall Success Rate in SQL: Recursive CTE or Alternative Approach?

问题

以下是翻译的部分:

作为输入,我有3个字段:

  • Column_A:表示调查中的步骤按照时间顺序排列
  • Column_B:表示参与调查步骤的人数
  • Column_C:表示在特定调查步骤中成功的人数(B ≥ C)

现在我需要计算一个新的Column_D:这一列需要表示在所有先前步骤之后保持成功的参与者的总人数。因此,在每一步骤中,这个数字最好只能保持不变或者在不成功的参与者的情况下减少。

在Excel中,这相当容易,因为我会计算(除了第1行以外的行):
前一行的Value_D -(Value_B - Value_C)。

我知道SQL中有lag和lead函数,但如何考虑同一列的前一行的计算呢?它是否能够工作?我需要编写递归的CTE之类的东西吗?

我认为一个大问题是:行数(Column_A)不相同,所以我不能简单地将多个CTE连接在一起。

Column_A Column_B Column_C Column_D
0 35 35 35
1 35 35 35
2 35 34 34
3 34 33 33
4 33 30 30
5 33 31 28
6 33 33 28

我尝试了以下方法,但显然它返回了错误的结果,因为它没有考虑以前的计算。

SELECT [Column_A]
      ,[Column_B]
      ,[Column_C]
      , CASE WHEN [Column_A] = 0 
	         THEN [Column_B]
	         ELSE LAG([Column_C], 1, 0) OVER (PARTITION BY 1 
                                              ORDER     BY [Column_A] ASC) - ([Column_B] - [Column_C])
        END AS [Column_D]
FROM dataset
英文:

As input I have 3 fields:

  • Column_A: represents the step of the survey in a chronological order
  • Column_B: represents the number of persons, which participated in they survey step
  • Column_C: represents the number of persons, which were sucessfull in this specific survey step (B >= C)

Now I need to calculate a new Column_D: This column needs to represent the overall number of participants, which were remain successfull after all previous steps. So at best this number can only remain the same or being reduced after each step in case of an unsuccessfull participant.

In Excel this is kinda easy, because there i would calculate (except for row 1):
Value_D of former Row - (Value_B - Value_C).

I know of lag and lead in SQL, but how can I take into account the calculation of a former row of the same column? Does it work somehow? Do I need to code a recursive CTE or something?

I think big problem is this: The number of rows (Column_A) isnt the same, so i cant just glue several CTEs together.

Column_A Column_B Column_C Column_D
0 35 35 35
1 35 35 35
2 35 34 34
3 34 33 33
4 33 30 30
5 33 31 28
6 33 33 28

I tried the following, but obviously it gives back the wrong results, because it doesnt take the former calculations in consideration.

SELECT [Column_A]
      ,[Column_B]
      ,[Column_C]
      , CASE WHEN [Column_A] = 0 
	         THEN [Column_B]
	         ELSE LAG([Column_C], 1, 0) OVER (PARTITION BY 1 
                                              ORDER     BY [Column_A] ASC) - ([Column_B] - [Column_C])
        END AS [Column_D]
FROM dataset

答案1

得分: 2

以下是翻译好的内容:

你可以结合两个窗口函数:

  • FIRST_VALUE,收集第一个 "Column_D" 的值
  • SUM,收集 "Column_B" 和 "Column_C" 之间的运行差值

然后从第一个窗口函数中减去第二个窗口函数。

SELECT *, FIRST_VALUE(Column_D) OVER(ORDER BY Column_A ROWS UNBOUNDED PRECEDING) - 
          SUM(Column_B - Column_C) OVER(ORDER BY Column_A ROWS UNBOUNDED PRECEDING) 
FROM tab

输出

Column_A Column_B Column_C Column_D (无列名)
0 35 35 35 35
1 35 35 35 35
2 35 34 34 34
3 34 33 33 33
4 33 30 30 30
5 33 31 28 28
6 33 33 28 28

在此查看演示链接

英文:

You can combine two window functions:

  • FIRST_VALUE, gathering the first "Column_D" value
  • SUM, gathering the running difference between "Column_B" and "Column_C"

then you subtract the second window function from the first.

SELECT *, FIRST_VALUE(Column_D) OVER(ORDER BY Column_A ROWS UNBOUNDED PRECEDING) - 
          SUM(Column_B - Column_C) OVER(ORDER BY Column_A ROWS UNBOUNDED PRECEDING) 
FROM tab

Output:

Column_A Column_B Column_C Column_D (No column name)
0 35 35 35 35
1 35 35 35 35
2 35 34 34 34
3 34 33 33 33
4 33 30 30 30
5 33 31 28 28
6 33 33 28 28

Check the demo here.

huangapple
  • 本文由 发表于 2023年5月23日 00:26:47
  • 转载请务必保留本文链接:https://go.coder-hub.com/76308208.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定