How do I calculate day difference using more than one date?
Question
I have the following table:
Table A:
ID | Transaction_Date | Cancel_Flag |
---|---|---|
1 | 2014-02-18 00:00:00.000 | No |
1 | 2014-02-18 00:00:00.000 | No |
1 | 2014-02-19 00:00:00.000 | Yes |
1 | 2014-05-20 00:00:00.000 | No |
1 | 2014-05-21 00:00:00.000 | No |
1 | 2014-05-22 00:00:00.000 | Yes |
1 | 2014-05-23 00:00:00.000 | No |
I want output that follows these rules:
- Calculate the day difference between the Transaction_Date of each row where Cancel_Flag = No and the Transaction_Date of a row where Cancel_Flag = Yes.
- If there is more than one row with Cancel_Flag = Yes, use the smallest day difference.

The output should look like this:
ID | Transaction_Date | Cancel_Flag | Days_Since_Cancel |
---|---|---|---|
1 | 2014-02-18 00:00:00.000 | No | -1 |
1 | 2014-02-18 00:00:00.000 | No | -1 |
1 | 2014-02-19 00:00:00.000 | Yes | 0 |
1 | 2014-05-20 00:00:00.000 | No | 1 |
1 | 2014-05-21 00:00:00.000 | No | -1 |
1 | 2014-05-22 00:00:00.000 | Yes | 0 |
1 | 2014-05-22 00:00:00.000 | No | +1 |
1 | 2014-05-23 00:00:00.000 | No | +2 |
Thanks in advance,
Answer 1
Score: 2
For each record, the only 'cancel' rows you are interested in are the one just before or the one just after the current row when the data set is sorted by transaction_date. Because of this, solutions involving window functions seem quite appropriate here.
For any given row, you can get the date of the prior cancel transaction with
max(Case When Cancel_Flag='Yes' Then Transaction_Date End)
    Over (Partition By ID Order By Transaction_Date Rows Between Unbounded Preceding And Current Row)
and the date of the following cancel transaction with
min(Case When Cancel_Flag='Yes' Then Transaction_Date End)
    Over (Partition By ID Order By Transaction_Date Rows Between Current Row And Unbounded Following)
Use each in a datediff() with the current row's transaction date, and you have two candidate results to choose between for the final result.
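Select ID, Transaction_Date, Cancel_Flag,
       -- datediff(day, start, end) returns end minus start, so negate the chosen value
       -- to get days since the cancel: negative before it, positive after it.
       Case When prior_cancel is null or next_cancel < abs(prior_cancel)
            Then -next_cancel Else -prior_cancel End as Days_Since_Cancel
From (
    Select A.*,
           -- Days from this row to the latest cancel at or before it (zero or negative).
           datediff(day, Transaction_Date,
               max(Case When Cancel_Flag='Yes' Then Transaction_Date End)
                   Over (Partition By ID Order By Transaction_Date Rows Between Unbounded Preceding And Current Row)
           ) as prior_cancel,
           -- Days from this row to the earliest cancel at or after it (zero or positive).
           datediff(day, Transaction_Date,
               min(Case When Cancel_Flag='Yes' Then Transaction_Date End)
                   Over (Partition By ID Order By Transaction_Date Rows Between Current Row And Unbounded Following)
           ) as next_cancel
    From Table_A A
) t  -- some engines, such as SQL Server, require an alias on a derived table
Order By ID, Transaction_Date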
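To try the idea without a database server, here is a minimal sketch that runs the same kind of query through Python's built-in sqlite3 module on the rows of Table A from the question. Several things here are assumptions rather than part of the answer: SQLite has no datediff(), so a julianday() subtraction stands in for it; the dates are stored as plain ISO strings; the function name days_since_cancel and the column names since_prior and since_next are made up for illustration; and the sign is chosen so that rows before a cancel come out negative. It needs an SQLite build with window-function support (3.25 or later).

```python
import sqlite3

# Rows copied from Table A in the question (time part dropped for brevity).
TABLE_A = [
    (1, "2014-02-18", "No"),
    (1, "2014-02-18", "No"),
    (1, "2014-02-19", "Yes"),
    (1, "2014-05-20", "No"),
    (1, "2014-05-21", "No"),
    (1, "2014-05-22", "Yes"),
    (1, "2014-05-23", "No"),
]

# since_prior: days elapsed since the latest cancel at or before the row (>= 0).
# since_next:  days "since" the earliest cancel at or after the row (<= 0).
QUERY = """
SELECT ID, Transaction_Date, Cancel_Flag,
       CASE WHEN since_prior IS NULL OR abs(since_next) < since_prior
            THEN since_next ELSE since_prior END AS Days_Since_Cancel
FROM (
    SELECT A.*,
           -- julianday() subtraction is a stand-in for datediff(day, ...)
           CAST(julianday(Transaction_Date) - julianday(
               max(CASE WHEN Cancel_Flag = 'Yes' THEN Transaction_Date END)
               OVER (PARTITION BY ID ORDER BY Transaction_Date
                     ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
           ) AS INTEGER) AS since_prior,
           CAST(julianday(Transaction_Date) - julianday(
               min(CASE WHEN Cancel_Flag = 'Yes' THEN Transaction_Date END)
               OVER (PARTITION BY ID ORDER BY Transaction_Date
                     ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING)
           ) AS INTEGER) AS since_next
    FROM Table_A A
) AS t
ORDER BY ID, Transaction_Date
"""


def days_since_cancel(rows):
    """Load rows into an in-memory table and return the query result as tuples."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE Table_A (ID INTEGER, Transaction_Date TEXT, Cancel_Flag TEXT)"
        )
        conn.executemany("INSERT INTO Table_A VALUES (?, ?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in days_since_cancel(TABLE_A):
        print(row)
```

With these seven input rows the last column comes out as -1, -1, 0, -2, -1, 0, 1. The 2014-05-20 row is two days before the nearest cancel, so the nearest-cancel rule gives it -2.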
EDIT ADDITION
In place of min(...) you can use first_value(... Ignore Nulls), and in place of max(...) you can use last_value(... Ignore Nulls), provided your database supports Ignore Nulls. These might be slightly more efficient: min and max cannot be determined without examining the entire window frame, whereas in theory the first and last values can be found without examining every element. The two forms are functionally equivalent whenever the Order By column and the column inside min/max are the same, which is the case here with Transaction_Date.
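The equivalence claimed above can be checked with a small plain-Python sketch. It models each window frame as a list slice over per-row cancel dates that are already sorted by Transaction_Date, with None for non-cancel rows. The helper names (first_non_null, last_non_null, null_safe, frames_agree) are hypothetical and only mimic what first_value/last_value with Ignore Nulls and min/max would return; this is not how a database engine implements them.

```python
def first_non_null(frame):
    """Mimics first_value(... Ignore Nulls) over a frame."""
    return next((v for v in frame if v is not None), None)


def last_non_null(frame):
    """Mimics last_value(... Ignore Nulls) over a frame."""
    return next((v for v in reversed(frame) if v is not None), None)


def null_safe(agg, frame):
    """Mimics SQL min/max: nulls are skipped, an all-null frame gives null."""
    present = [v for v in frame if v is not None]
    return agg(present) if present else None


def frames_agree(values):
    """True if first/last and min/max agree for every row's two frames."""
    for i in range(len(values)):
        following = values[i:]       # Current Row .. Unbounded Following
        preceding = values[: i + 1]  # Unbounded Preceding .. Current Row
        if first_non_null(following) != null_safe(min, following):
            return False
        if last_non_null(preceding) != null_safe(max, preceding):
            return False
    return True


# Cancel date per row of Table A, in Transaction_Date order (None = not a cancel).
cancel_dates = [None, None, "2014-02-19", None, None, "2014-05-22", None]

if __name__ == "__main__":
    print(frames_agree(cancel_dates))                        # sorted by the same column
    print(frames_agree(["2014-05-22", None, "2014-02-19"]))  # not sorted by it
```

The second call uses values that are not in ascending order, which is the situation where the Order By column and the aggregated column differ and the two forms can disagree.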