英文:
SQL - Trickle down an overpayment to the following months
问题
在Amazon Redshift(使用DataGrip)中,我有一个表格,如下所示:
合同编号 | 开始月份 | 合同期限(月) | 收款截止日期 | 目标 | 已收金额 |
---|---|---|---|---|---|
10001 | 2022年01月01日 | 12 | 2022年01月01日 | 10000 | 40000 |
10001 | 2022年01月01日 | 12 | 2022年01月02日 | 10000 | 0 |
10001 | 2022年01月01日 | 12 | 2022年01月03日 | 10000 | 0 |
10001 | 2022年01月01日 | 12 | 2022年01月04日 | 10000 | 0 |
10001 | 2022年01月01日 | 12 | 2022年01月05日 | 10000 | 0 |
10001 | 2022年01月01日 | 12 | 2022年01月06日 | 10000 | 0 |
10001 | 2022年01月01日 | 12 | 2022年01月07日 | 10000 | 0 |
10001 | 2022年01月01日 | 12 | 2022年01月08日 | 10000 | 30000 |
10001 | 2022年01月01日 | 12 | 2022年01月09日 | 10000 | 2500 |
10001 | 2022年01月01日 | 12 | 2022年01月10日 | 10000 | 0 |
10001 | 2022年01月01日 | 12 | 2022年01月11日 | 10000 | 0 |
10001 | 2022年01月01日 | 12 | 2022年01月12日 | 10000 | 0 |
10002 | 2022年01月01日 | 8 | 2022年01月03日 | 5000 | 12000 |
10002 | 2022年01月01日 | 8 | 2022年01月04日 | 5000 | 1000 |
10002 | 2022年01月01日 | 8 | 2022年01月05日 | 5000 | 0 |
10002 | 2022年01月01日 | 8 | 2022年01月06日 | 5000 | 0 |
10002 | 2022年01月01日 | 8 | 2022年01月07日 | 5000 | 10000 |
10002 | 2022年01月01日 | 8 | 2022年01月08日 | 5000 | 0 |
10002 | 2022年01月01日 | 8 | 2022年01月09日 | 5000 | 0 |
10002 | 2022年01月01日 | 8 | 2022年01月10日 | 5000 | 0 |
每个月我都有一个目标金额和一个支付金额(已收金额),每个月我想要查看我已经实现了多少目标(实现的金额永远不能高于目标),如果支付金额超过目标,那么超出的金额将流向下个月。如下所示:
合同编号 | 开始月份 | 合同期限(月) | 收款截止日期 | 目标 | 已收金额 | 实现的金额 | 超额支付 |
---|---|---|---|---|---|---|---|
10001 | 2022年01月01日 | 12 | 2022年01月01日 | 10000 | 40000 | 10000 | 30000 |
10001 | 2022年01月01日 | 12 | 2022年01月02日 | 10000 | 0 | 10000 | 20000 |
10001 | 2022年01月01日 | 12 | 2022年01月03日 | 10000 | 0 | 10000 | 10000 |
10001 | 2022年01月01日 | 12 | 2022年01月04日 | 10000 | 0 | 10000 | 0 |
10001 | 2022年01月01日 | 12 | 2022年01月05日 | 10000 | 0 | 0 | 0 |
10001 | 2022年01月01日 | 12 | 2022年01月06日 | 10000 | 0 | 0 | 0 |
10001 | 2022年01月01日 | 12 | 2022年01月07日 | 10000 | 0 | 0 | 0 |
10001 | 2022年01月01日 | 12 | 2022年01月08日 | 10000 | 30000 | 10000 | 20000 |
10001 | 2022年01月01日 | 12 | 2022年01月09日 | 10000 | 2500 | 10000 | 12500 |
10001 | 2022年01月01日 | 12 | 2022年01月10日 | 10000 | 0 | 10000 | 2500 |
10001 | 2022年01月01日 | 12 | 2022年01月11日 | 10000 | 0 | 2500 | 0 |
10001 | 2022年01月01日 | 12 | 2022年01月12日 | 10000 | 0 | 0 | 0 |
10002 | 2022年01月01日 | 8 | 2022年01月03日 | 5000 | 12000 | 5000 | 7000 |
10002 | 2022年01月01日 | 8 | 2022年01月04日 | 5000 | 1000 | 5000 | 3000 |
10002 | 2022年01月01日 | 8 | 2022年01月05日 | 5000 | 0 | 3000 | 0 |
10002 | 2022年01月01日 | 8 | 2022年01月06日 | 5000 | 0 | 0 | 0 |
10002 | 2022年01月01日 | 8 |
英文:
I have a table in amazon redshift (datagrip) like so:
Contract_ID | Starting_Month | Contract_Duration_In_Months | Collection_Due_Date | Target | Amount_Collected |
---|---|---|---|---|---|
10001 | 01/01/2022 | 12 | 01/01/2022 | 10000 | 40000 |
10001 | 01/01/2022 | 12 | 01/02/2022 | 10000 | 0 |
10001 | 01/01/2022 | 12 | 01/03/2022 | 10000 | 0 |
10001 | 01/01/2022 | 12 | 01/04/2022 | 10000 | 0 |
10001 | 01/01/2022 | 12 | 01/05/2022 | 10000 | 0 |
10001 | 01/01/2022 | 12 | 01/06/2022 | 10000 | 0 |
10001 | 01/01/2022 | 12 | 01/07/2022 | 10000 | 0 |
10001 | 01/01/2022 | 12 | 01/08/2022 | 10000 | 30000 |
10001 | 01/01/2022 | 12 | 01/09/2022 | 10000 | 2500 |
10001 | 01/01/2022 | 12 | 01/10/2022 | 10000 | 0 |
10001 | 01/01/2022 | 12 | 01/11/2022 | 10000 | 0 |
10001 | 01/01/2022 | 12 | 01/12/2022 | 10000 | 0 |
10002 | 01/01/2022 | 8 | 01/03/2022 | 5000 | 12000 |
10002 | 01/01/2022 | 8 | 01/04/2022 | 5000 | 1000 |
10002 | 01/01/2022 | 8 | 01/05/2022 | 5000 | 0 |
10002 | 01/01/2022 | 8 | 01/06/2022 | 5000 | 0 |
10002 | 01/01/2022 | 8 | 01/07/2022 | 5000 | 10000 |
10002 | 01/01/2022 | 8 | 01/08/2022 | 5000 | 0 |
10002 | 01/01/2022 | 8 | 01/09/2022 | 5000 | 0 |
10002 | 01/01/2022 | 8 | 01/10/2022 | 5000 | 0 |
Whereby each month I have a target amount and a payment (amount_collected), and each month I want to see how much of my target I've achieved (the achieved amount can never be higher than the target), if the payment exceeds the target then the amount achieved will trickle down to the next month. Like so:
Contract_ID | Starting_Month | Contract_Duration_In_Months | Collection_Due_Date | Target | Amount_Collected | Achieved | Overpayment |
---|---|---|---|---|---|---|---|
10001 | 01/01/2022 | 12 | 01/01/2022 | 10000 | 40000 | 10000 | 30000 |
10001 | 01/01/2022 | 12 | 01/02/2022 | 10000 | 0 | 10000 | 20000 |
10001 | 01/01/2022 | 12 | 01/03/2022 | 10000 | 0 | 10000 | 10000 |
10001 | 01/01/2022 | 12 | 01/04/2022 | 10000 | 0 | 10000 | 0 |
10001 | 01/01/2022 | 12 | 01/05/2022 | 10000 | 0 | 0 | 0 |
10001 | 01/01/2022 | 12 | 01/06/2022 | 10000 | 0 | 0 | 0 |
10001 | 01/01/2022 | 12 | 01/07/2022 | 10000 | 0 | 0 | 0 |
10001 | 01/01/2022 | 12 | 01/08/2022 | 10000 | 30000 | 10000 | 20000 |
10001 | 01/01/2022 | 12 | 01/09/2022 | 10000 | 2500 | 10000 | 12500 |
10001 | 01/01/2022 | 12 | 01/10/2022 | 10000 | 0 | 10000 | 2500 |
10001 | 01/01/2022 | 12 | 01/11/2022 | 10000 | 0 | 2500 | 0 |
10001 | 01/01/2022 | 12 | 01/12/2022 | 10000 | 0 | 0 | 0 |
10002 | 01/01/2022 | 8 | 01/03/2022 | 5000 | 12000 | 5000 | 7000 |
10002 | 01/01/2022 | 8 | 01/04/2022 | 5000 | 1000 | 5000 | 3000 |
10002 | 01/01/2022 | 8 | 01/05/2022 | 5000 | 0 | 3000 | 0 |
10002 | 01/01/2022 | 8 | 01/06/2022 | 5000 | 0 | 0 | 0 |
10002 | 01/01/2022 | 8 | 01/07/2022 | 5000 | 10000 | 5000 | 5000 |
10002 | 01/01/2022 | 8 | 01/08/2022 | 5000 | 0 | 5000 | 0 |
10002 | 01/01/2022 | 8 | 01/09/2022 | 5000 | 0 | 0 | 0 |
10002 | 01/01/2022 | 8 | 01/10/2022 | 5000 | 0 | 0 | 0 |
I can't use SUM() OVER(ORDER BY MONTH ROWS UNBOUNDED PRECEDING) because even though the achievements in May, June and July were missed, the achieved column only looks forward, not backwards. I think the best way to achieve this would be to have an Overpayment column which is like a rolling sum of the previous overpayment values - the previously achieved value BUT if the overpayment was previously 0 then we have a month where there is an overpayment the previously rolling sum needs to be discarded and the overpayment value reset to the payment-target.
Simply put, I need overpayments to trickle to down the achieved columns over the months taking into consideration the payments made until overpayments are finished, ignoring any months where the target wasn't achieved.
答案1
得分: 1
以下是您提供的代码部分的翻译:
-- Creates an identifier each time an amount is collected
-- and groups all subsequent records until the next amount is collected
与数据1
AS(选择*,
SUM(CASE
WHEN amount_collected > 0 THEN 1
ELSE 0
END)
over(
PARTITION BY contract_id
ORDER BY collection_due_date ASC ROWS BETWEEN unbounded
preceding
AND
CURRENT ROW)AS TRACKER
FROM trickle)
-- Provides a cumulative sum of the target/collected/balance amounts
-- within each 'tracker' group. LAG(TRACKER) is needed for a subsequent step
,
数据2
AS(选择*,
SUM(target)
over(
PARTITION BY contract_id, tracker
ORDER BY collection_due_date ASC ROWS BETWEEN unbounded
preceding
AND
CURRENT ROW) TARGET_RUNNING,
SUM(amount_collected)
over(
PARTITION BY contract_id, tracker
ORDER BY collection_due_date ASC ROWS BETWEEN unbounded
preceding
AND
CURRENT ROW) AMT_COLL_RUNNING,
SUM(amount_collected - target)
over(
PARTITION BY contract_id, tracker
ORDER BY collection_due_date ASC ROWS BETWEEN unbounded
preceding
AND
CURRENT ROW) BAL_RUNNING,
Lag(tracker)
over(
PARTITION BY contract_id
ORDER BY collection_due_date ASC) LAG_TRACKER
FROM 数据1)
-- BAL_RUNNING is calculated using a windows function so lag_run_bal needs to be
-- in another CTE as nested windows functions aren't allowed
,
数据3
AS(选择*,
Lag(bal_running)
over(
PARTITION BY contract_id
ORDER BY collection_due_date ASC) LAG_RUN_BAL
FROM 数据2)
-- Calculates the available balance
,
数据4
AS(选择*,
CASE
WHEN tracker <> lag_tracker
AND lag_run_bal > 0 THEN amt_coll_running + lag_run_bal
ELSE amt_coll_running
END AMT_COLL_RUNNING1
FROM 数据3)
选择contract_id,
collection_due_date,
target,
amount_collected,
CASE
WHEN target_running <= amt_coll_running1 THEN target
ELSE 0
END ACHIEVED,
Greatest(amt_coll_running1 - target_running, 0) OVERPAYMENT
FROM 数据4
ORDER BY contract_id,
collection_due_date ASC;
更新的解决方案
我认为您的输出表中可能有一个拼写错误:
Contract_ID | Starting_Month | Contract_Duration_In_Months | Collection_Due_Date | Target | Amount_Collected | Achieved | Overpayment |
---|---|---|---|---|---|---|---|
10001 | 01/01/2022 | 12 | 01/09/2022 | 10000 | 2500 | 10000 | 12500 |
10001 | 01/01/2022 | 12 | 01/10/2022 | 10000 | 0 | 2500 | 0 |
10001 | 01/01/2022 | 12 | 01/11/2022 | 10000 | 0 | 0 | 0 |
应该是:
Contract_ID | Starting_Month | Contract_Duration_In_Months | Collection_Due_Date | Target | Amount_Collected | Achieved | Overpayment |
---|---|---|---|---|---|---|---|
10001 | 01/01/2022 | 12 | 01/09/2022 | 10000 | 2500 | 10000 | 12500 |
10001 | 01/01/2022 | 12 | 01/10/2022 | 10000 | 0 | 10000 | 2500 |
10001 | 01/01/2022 | 12 | 01/11/2022 | 10000 | 0 | 2500 | 0 |
如果这是一个拼写错误,我认为这个SQL应该可以给您想要的结果:
WITH DATA0 AS (
SELECT
*,
NVL(
LAG(AMOUNT_COLLECTED) OVER (
PARTITION BY CONTRACT_ID
ORDER BY
COLLECTION_DUE_DATE ASC
),
0
) LAG_AMT_COLL
FROM
TRICKLE
),
DATA1 AS (
SELECT
*,
SUM(
CASE WHEN amount_collected > 0
AND LAG_AMT_COLL = 0 THEN 1 ELSE 0 END
) over (
PARTITION BY contract_id
ORDER BY
collection_due_date ASC ROWS BETWEEN unbounded preceding
AND CURRENT ROW
) AS TRACKER
FROM
DATA0
),
DATA2 AS (
SELECT
*,
SUM(TARGET) OVER (
PARTITION BY CONTRACT_ID,
TRACKER
ORDER BY
COLLECTION_DUE_DATE ASC ROWS BETWEEN UNBOUNDED PRECEDING
AND CURRENT ROW
) TARGET_RUNNING,
SUM(AMOUNT_COLLECTED) OVER (
PARTITION BY CONTRACT_ID,
TRACKER
ORDER BY
COLLECTION_DUE_DATE ASC ROWS BETWEEN UNBOUNDED PRECEDING
AND CURRENT ROW
) AMT_COLL_RUNNING,
SUM(AMOUNT_COLLECTED - TARGET) OVER (
PARTITION BY CONTRACT_ID,
TRACKER
ORDER BY
COLLECTION_DUE_DATE ASC ROWS BETWEEN UNBOUNDED PRECEDING
AND CURRENT ROW
) BAL_RUNNING,
Lag(tracker) over (
PARTITION BY contract_id
ORDER BY
collection_due_date ASC
) LAG_TRACKER
FROM
DATA1
),
DATA3 AS (
SELECT
*,
GREATEST(BAL_RUNNING, 0) BAL_RUNNING1
FROM
DATA2
)
SELECT
CONTRACT_ID,
STARTING_MONTH,
CONTRACT_DURATION_IN_MONTHS,
COLLECTION_DUE_DATE,
TARGET,
AMOUNT_COLLECTED,
LEAST(
TARGET,
NVL(
LAG(BAL_RUNNING1) over (
PARTITION BY contract_id
ORDER BY
collection_due_date ASC
),
0
) + AMOUNT_COLLECTED
) ACHIEVED,
Greatest(
amt_coll_running - target_running,
<details>
<summary>英文:</summary>
Interesting challenge! Probably one of the hardest SQL questions I've faced recently - so hopefully someone doesn't come along with a simpler/obvious solution :)
This appears to give the correct result based on the sample data provided. I've split it up into multiple CTEs to hopefully make the logic clearer; some of the CTEs could probably be combined:
-- Creates an identifier each time an amount is collected
-- and groups all subsequent records until the next amount is collected
WITH data1
AS (SELECT *,
SUM(CASE
WHEN amount_collected > 0 THEN 1
ELSE 0
END)
over (
PARTITION BY contract_id
ORDER BY collection_due_date ASC ROWS BETWEEN unbounded
preceding
AND
CURRENT ROW) AS TRACKER
FROM trickle)
-- Provides a cumulative sum of the target/collected/balance amounts
-- within each 'tracker' group. LAG(TRACKER) is needed for a subsequent step
,
data2
AS (SELECT *,
SUM(target)
over (
PARTITION BY contract_id, tracker
ORDER BY collection_due_date ASC ROWS BETWEEN unbounded
preceding
AND
CURRENT ROW) TARGET_RUNNING,
SUM(amount_collected)
over (
PARTITION BY contract_id, tracker
ORDER BY collection_due_date ASC ROWS BETWEEN unbounded
preceding
AND
CURRENT ROW) AMT_COLL_RUNNING,
SUM(amount_collected - target)
over (
PARTITION BY contract_id, tracker
ORDER BY collection_due_date ASC ROWS BETWEEN unbounded
preceding
AND
CURRENT ROW) BAL_RUNNING,
Lag(tracker)
over (
PARTITION BY contract_id
ORDER BY collection_due_date ASC) LAG_TRACKER
FROM data1)
-- BAL_RUNNING is calculated using a windows function so lag_run_bal needs to be
-- in another CTE as nested windows functions aren't allowed
,
data3
AS (SELECT *,
Lag(bal_running)
over (
PARTITION BY contract_id
ORDER BY collection_due_date ASC) LAG_RUN_BAL
FROM data2)
-- Calculates the available balance
,
data4
AS (SELECT *,
CASE
WHEN tracker <> lag_tracker
AND lag_run_bal > 0 THEN amt_coll_running + lag_run_bal
ELSE amt_coll_running
END AMT_COLL_RUNNING1
FROM data3)
SELECT contract_id,
collection_due_date,
target,
amount_collected,
CASE
WHEN target_running <= amt_coll_running1 THEN target
ELSE 0
END ACHIEVED,
Greatest(amt_coll_running1 - target_running, 0) OVERPAYMENT
FROM data4
ORDER BY contract_id,
collection_due_date ASC;
**Updated Solution**
I think there may be a typo in your output table:
|Contract_ID|Starting_Month|Contract_Duration_In_Months|Collection_Due_Date|Target|Amount_Collected|Achieved|Overpayment|
|--|--|--|--|--|--|--|--|
|10001|01/01/2022|12|01/09/2022|10000|2500|10000|12500|
|10001|01/01/2022|12|01/10/2022|10000|0|2500|0|
|10001|01/01/2022|12|01/11/2022|10000|0|0|0|
should be:
|Contract_ID|Starting_Month|Contract_Duration_In_Months|Collection_Due_Date|Target|Amount_Collected|Achieved|Overpayment|
|--|--|--|--|--|--|--|--|
|10001|01/01/2022|12|01/09/2022|10000|2500|10000|12500|
|10001|01/01/2022|12|01/10/2022|10000|0|10000|2500|
|10001|01/01/2022|12|01/11/2022|10000|0|2500|0|
Assuming that is a typo and my correction is correct then this SQL should give you what you want:
WITH DATA0 AS (
SELECT
*,
NVL(
LAG(AMOUNT_COLLECTED) OVER (
PARTITION BY CONTRACT_ID
ORDER BY
COLLECTION_DUE_DATE ASC
),
0
) LAG_AMT_COLL
FROM
TRICKLE
),
DATA1 AS (
SELECT
*,
SUM(
CASE WHEN amount_collected > 0
AND LAG_AMT_COLL = 0 THEN 1 ELSE 0 END
) over (
PARTITION BY contract_id
ORDER BY
collection_due_date ASC ROWS BETWEEN unbounded preceding
AND CURRENT ROW
) AS TRACKER
FROM
DATA0
),
DATA2 AS (
SELECT
*,
SUM(TARGET) OVER (
PARTITION BY CONTRACT_ID,
TRACKER
ORDER BY
COLLECTION_DUE_DATE ASC ROWS BETWEEN UNBOUNDED PRECEDING
AND CURRENT ROW
) TARGET_RUNNING,
SUM(AMOUNT_COLLECTED) OVER (
PARTITION BY CONTRACT_ID,
TRACKER
ORDER BY
COLLECTION_DUE_DATE ASC ROWS BETWEEN UNBOUNDED PRECEDING
AND CURRENT ROW
) AMT_COLL_RUNNING,
SUM(AMOUNT_COLLECTED - TARGET) OVER (
PARTITION BY CONTRACT_ID,
TRACKER
ORDER BY
COLLECTION_DUE_DATE ASC ROWS BETWEEN UNBOUNDED PRECEDING
AND CURRENT ROW
) BAL_RUNNING,
Lag(tracker) over (
PARTITION BY contract_id
ORDER BY
collection_due_date ASC
) LAG_TRACKER
FROM
DATA1
),
DATA3 AS (
SELECT
*,
GREATEST(BAL_RUNNING, 0) BAL_RUNNING1
FROM
DATA2
)
SELECT
CONTRACT_ID,
STARTING_MONTH,
CONTRACT_DURATION_IN_MONTHS,
COLLECTION_DUE_DATE,
TARGET,
AMOUNT_COLLECTED,
LEAST(
TARGET,
NVL(
LAG(BAL_RUNNING1) over (
PARTITION BY contract_id
ORDER BY
collection_due_date ASC
),
0
) + AMOUNT_COLLECTED
) ACHIEVED,
Greatest(
amt_coll_running - target_running,
0
) OVERPAYMENT
FROM
DATA3
ORDER BY
CONTRACT_ID,
COLLECTION_DUE_DATE
</details>
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论