Cumulative sum in BigQuery that resets based on its own value
Question
I need to calculate the cumulative sum of column A and reset it once it reaches a certain threshold. In the example below, the cumulative sum resets once it reaches 10 or when the label changes.
Label | Value | cumulative_sum |
---|---|---|
One | 1 | 1 |
One | 2 | 3 |
One | 4 | 7 |
One | 6 | 6 |
One | 3 | 9 |
Two | 1 | 1 |
Two | 2 | 3 |
Two | 1 | 4 |
I have tried the following code in BigQuery:
SUM(value) OVER (PARTITION BY label ORDER BY dummy_sequence) as cumulative_sum,
But it's not giving the intended result.
Any help is much appreciated.
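For reference, a plain per-label running total never resets, which is why the window function above diverges from the expected column starting at the fourth row of label One. A minimal Python sketch of that behaviour (not BigQuery code; `values_one` is a hypothetical name holding the Value column for label One, in order):

```python
from itertools import accumulate

# Value column for label "One", in the order shown in the table.
values_one = [1, 2, 4, 6, 3]

# A plain running total, which is what SUM(...) OVER (PARTITION BY ... ORDER BY ...) computes.
running_total = list(accumulate(values_one))
print(running_total)  # [1, 3, 7, 13, 16], while the expected column is 1, 3, 7, 6, 9
```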
Answer 1
Score: 2
You may want to use a recursive CTE (WITH RECURSIVE) to accumulate the values conditionally.
Query
CREATE TEMP TABLE sample_data AS (
  WITH
    _sample_data AS (
      SELECT 'One' AS Label, 1 AS Value, 1 AS expected_cumulative_sum,
      UNION ALL SELECT 'One', 2, 3,
      UNION ALL SELECT 'One', 4, 7,
      UNION ALL SELECT 'One', 6, 6,
      UNION ALL SELECT 'One', 3, 9,
      UNION ALL SELECT 'Two', 1, 1,
      UNION ALL SELECT 'Two', 2, 3,
      UNION ALL SELECT 'Two', 1, 4,
      UNION ALL SELECT 'Three', 5, 5,
      UNION ALL SELECT 'Three', 4, 9,
      UNION ALL SELECT 'Three', 8, 8,
      UNION ALL SELECT 'Three', 7, 7,
      UNION ALL SELECT 'Three', 5, 5,
      UNION ALL SELECT 'Three', 4, 9,
    )
  -- NOTE: with no ORDER BY in the window, the row order within a label is not
  -- guaranteed; on real data, order by a sequence column (e.g. dummy_sequence).
  SELECT *, ROW_NUMBER() OVER (PARTITION BY Label) AS row_num,
  FROM _sample_data
);

WITH RECURSIVE calculate_cumulative_sum AS (
  -- Anchor: the first row of each label starts the running sum.
  SELECT label, value, row_num, value AS cumulative_sum
  FROM sample_data
  WHERE row_num = 1
  UNION ALL
  -- Recursive step: join each row to the previous row's result for the same label.
  SELECT
    s.label, s.value, s.row_num,
    IF(
      -- may want to decide between '>' and '>='
      s.value + c.cumulative_sum >= 10,
      s.value,                     -- threshold reached: restart from this row's value
      s.value + c.cumulative_sum   -- otherwise keep accumulating
    ) AS cumulative_sum,
  FROM sample_data AS s
  INNER JOIN calculate_cumulative_sum AS c
    ON s.label = c.label AND s.row_num = c.row_num + 1
)
SELECT label, row_num, value, cumulative_sum
FROM calculate_cumulative_sum
ORDER BY label, row_num
;
Results
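One way to sanity-check what the query should return is to replay the same reset rule outside BigQuery. The Python sketch below is only an illustration of the rule, not part of the query: the function name `reset_cumsum` and its `threshold` and `inclusive` parameters are hypothetical, and it assumes the rows arrive already ordered the way `row_num` orders them, with each label's rows contiguous.

```python
from itertools import groupby

def reset_cumsum(rows, threshold=10, inclusive=True):
    """Replay the reset rule over (label, value) pairs that are already in order.

    inclusive=True mirrors the '>=' comparison in the query; False mirrors '>'.
    """
    out = []
    # groupby yields one group per run of consecutive rows with the same label.
    for label, group in groupby(rows, key=lambda r: r[0]):
        running = None  # no previous row yet for this label
        for _, value in group:
            if running is None:
                running = value  # first row of the label (row_num = 1)
            else:
                candidate = running + value
                hit = candidate >= threshold if inclusive else candidate > threshold
                running = value if hit else candidate  # restart or keep accumulating
            out.append((label, value, running))
    return out

# Same (Label, Value) pairs as sample_data in the query.
sample = [
    ("One", 1), ("One", 2), ("One", 4), ("One", 6), ("One", 3),
    ("Two", 1), ("Two", 2), ("Two", 1),
    ("Three", 5), ("Three", 4), ("Three", 8), ("Three", 7), ("Three", 5), ("Three", 4),
]

for row in reset_cumsum(sample):
    print(row)
```

With the default `>=` comparison, the third element of each printed tuple matches the `expected_cumulative_sum` column of the sample data. Passing `inclusive=False` shows the effect of the `>` variant mentioned in the query comment: a running sum that lands exactly on 10 is kept instead of being reset.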