在过去一年中,对于分组内的多个日期间隔进行累积计算天数。

huangapple go评论72阅读模式
英文:

Counting days cummulatively over the last year over multiple date intervals within groups

问题

这是上一个问题的延续。在解决了我的初始问题之后(上一个问题),我意识到我需要在问题中增加一些复杂性,以正确解决我的用例。

我需要计算一年中的前100天,其中包含多个日期间隔的组。每个组的年初是根据第一个间隔的第一个日期任意设置的。我还需要能够输出与每个间隔关联的两个变量[Location]和[Amount]。我的问题是,其中一些间隔将与单个日期重叠(相同的EndDate和接下来的StartDate),这会破坏我的计数。

我有以下示例数据Fiddle for sample data and code attempt

Group Location Amount StartDate EndDate
1 Location1 100 2020-01-01 2021-02-01
1 Location2 150 2021-02-01 2022-07-01
2 Location3 200 2020-02-01 2020-03-01
2 Location1 100 2020-09-01 2020-12-01

我希望得到以下输出:

Group Location Amount YearMonth CountDays
1 Location1 100.00 202001 31
1 Location1 100.00 202002 29
1 Location1 100.00 202003 31
1 Location1 100.00 202004 9
1 Location1 100.00 202005 0
1 Location1 100.00 202006 0
1 Location1 100.00 202007 0
1 Location1 100.00 202008 0
1 Location1 100.00 202009 0
1 Location1 100.00 202010 0
1 Location1 100.00 202011 0
1 Location1 100.00 202012 0
1 Location1 100.00 202101 31
1 Location1 100.00 202102 1
1 Location2 150.00 202102 28
1 Location2 150.00 202103 31
1 Location2 150.00 202104 9
1 Location2 150.00 202105 0
1 Location2 150.00 202106 0
1 Location2 150.00 202107 0
1 Location2 150.00 202108 0
1 Location2 150.00 202109 0
1 Location2 150.00 202110 0
1 Location2 150.00 202111 0
1 Location2 150.00 202112 0
1 Location2 150.00 202201 31
1 Location2 150.00 202202 28
1 Location2 150.00 202203 31
1 Location2 150.00 202204 10
1 Location2 150.00 202205 0
1 Location2 150.00 202206 0
1 Location2 150.00 202207 0
2 Location3 200.00 202002 29
2 Location3 200.00 202003 1
2 Location1 100.00 202009 30
2 Location1 100.00 202010 31
2 Location1 100.00 202011 9
2 Location1 100.00 202012 0

我尝试使用以下代码(从上一个问题改写而来):

DECLARE @MinDate DATE = (SELECT MIN(StartDate) FROM SampleData)
DECLARE @MaxDate DATE = (SELECT MAX(EndDate) FROM SampleData)

;WITH n AS 
(
    SELECT 0 AS n
    UNION ALL
    SELECT n + 1 
    FROM n 
    WHERE n + 1 <= DATEDIFF(DAY, @MinDate, @MaxDate)
)
SELECT DATEADD(DAY, n, @MinDate) AS Date
INTO #Calendar
FROM n 
OPTION (MAXRECURSION 0)

  
CREATE TABLE #Result
(
    Date DATE,
    [Group] VARCHAR(20),
	[Location] VARCHAR(100),
	[Amount] DECIMAL(10,2),
    Counted INT,
    CumulativeCount INT,
    ResetDate DATE
);

DECLARE @Group VARCHAR(20), @Date DATE, @Location VARCHAR(100), @Amount DECIMAL(10,2);

DECLARE GroupCursor CURSOR FOR 
SELECT DISTINCT [Group], [Location], [Amount] FROM SampleData
OPEN GroupCursor
FETCH NEXT FROM GroupCursor INTO @Group, @Location, @Amount

WHILE @@FETCH_STATUS = 0
BEGIN
	DECLARE DateCursor CURSOR FOR 
	SELECT Date FROM #Calendar ORDER BY Date
	OPEN DateCursor
	FETCH NEXT FROM DateCursor INTO @Date

	DECLARE @CumulativeCount INT = 0, @ResetDate DATE = NULL
	WHILE @@FETCH_STATUS = 0
	BEGIN

		DECLARE @Counted INT = 
		(
			SELECT CASE WHEN COUNT(*) > 0 THEN 1 ELSE 0 END
			FROM SampleData
			WHERE [Group] = @Group AND @Date BETWEEN StartDate AND EndDate
		)

			IF @Counted = 1 AND (@CumulativeCount = 0 OR DATEDIFF(DAY, @ResetDate, @Date) >= DATEDIFF(DAY, @ResetDate, DATEADD(YEAR, 1, @ResetDate))) --- Altered check to account for leap years. 
				BEGIN
					SET @CumulativeCount = 1
					SET @ResetDate = @Date  
				END
			ELSE IF @Counted = 1 AND @CumulativeCount < 100 
				SET @CumulativeCount = @CumulativeCount + 1
			ELSE IF @Counted = 1 AND @CumulativeCount >= 100
				SET @Counted = 0
		

		IF  
		( 
			SELECT CASE WHEN COUNT(*) > 0 THEN 1 ELSE 0 END
	          FROM SampleData
	          WHERE [Group] = @Group AND [Location] = @Location AND [Amount] = @Amount AND @Date BETWEEN StartDate AND EndDate
		) = 1
			BEGIN
				INSERT INTO #Result (Date, [Group], [Location], [Amount], Counted, CumulativeCount, ResetDate)
				VALUES (@Date, @Group, @Location, @Amount, @Counted, @CumulativeCount, @ResetDate)
			END

		FETCH NEXT FROM DateCursor INTO @Date
	END
	CLOSE DateCursor
	DEALLOCATE DateCursor

	FETCH NEXT FROM GroupCursor INTO @Group, @Location, @Amount

END
CLOSE GroupCursor
DEALLOCATE GroupCursor

SELECT * FROM #Result
ORDER BY [Group], Date

SELECT 
    [Group],
	[Location],
	[Amount],
    FORMAT(Date, 'yyyyMM') AS YearMonth,
    SUM(Counted) AS CountDays
FROM #Result
GROUP BY [Group], [Location], [Amount], FORMAT(Date, 'yyyyMM')
ORDER BY [Group], YearMonth, [Location]

这几乎完全符合我的需求,但是我在重复的日期上有一个问题(例如2021-02-01)。在这里,变量@CumulativeCount在重复的日期上没有正确更新,这导致最终计数超过100天。我尝试了多种解决方法和修复方法,但迄今为止没有找到解决方案(并且没有破坏任何内容)。

这是#Results的一部分,以显示问题:

Date Group Location Amount Counted CumulativeCount ResetDate
2021-01-31 1 Location1 100.00 1 31 2021-01-01
2021-02-01 1 Location1 100.00 1 32 2021-01-01
2021-02-01 1 Location2 150.00 1 32 2021-01-01
2021-02-02 1 Location2 150.00 1 33 2021-01-01

非常感谢您的帮助。

英文:

This is a continuation of a previous question. After having resolved my initial question (Previous question), I realized that I needed to add some complexity to my question to properly solve my usecase.

I need to calculate the first 100 days of a year within groups which contains multiple date intervals. The start of the year is set arbitrarily from the first date of the first interval within each group. I also need to be able to output two variables [Location] and [Amount] associated with each interval. My problem is then that some of these intervals will overlap with a single day (identical EndDate and following StartDate), which messes up my count.

I have the following sample data Fiddle for sample data and code attempt

Group Location Amount StartDate EndDate
1 Location1 100 2020-01-01 2021-02-01
1 Location2 150 2021-02-01 2022-07-01
2 Location3 200 2020-02-01 2020-03-01
2 Location1 100 2020-09-01 2020-12-01

It would like the following output:

Group Location Amount YearMonth CountDays
1 Location1 100.00 202001 31
1 Location1 100.00 202002 29
1 Location1 100.00 202003 31
1 Location1 100.00 202004 9
1 Location1 100.00 202005 0
1 Location1 100.00 202006 0
1 Location1 100.00 202007 0
1 Location1 100.00 202008 0
1 Location1 100.00 202009 0
1 Location1 100.00 202010 0
1 Location1 100.00 202011 0
1 Location1 100.00 202012 0
1 Location1 100.00 202101 31
1 Location1 100.00 202102 1
1 Location2 150.00 202102 28
1 Location2 150.00 202103 31
1 Location2 150.00 202104 9
1 Location2 150.00 202105 0
1 Location2 150.00 202106 0
1 Location2 150.00 202107 0
1 Location2 150.00 202108 0
1 Location2 150.00 202109 0
1 Location2 150.00 202110 0
1 Location2 150.00 202111 0
1 Location2 150.00 202112 0
1 Location2 150.00 202201 31
1 Location2 150.00 202202 28
1 Location2 150.00 202203 31
1 Location2 150.00 202204 10
1 Location2 150.00 202205 0
1 Location2 150.00 202206 0
1 Location2 150.00 202207 0
2 Location3 200.00 202002 29
2 Location3 200.00 202003 1
2 Location1 100.00 202009 30
2 Location1 100.00 202010 31
2 Location1 100.00 202011 9
2 Location1 100.00 202012 0

I tried using the following (rewritten from previous question):

DECLARE @MinDate DATE = (SELECT MIN(StartDate) FROM SampleData)
DECLARE @MaxDate DATE = (SELECT MAX(EndDate) FROM SampleData)
;WITH n AS 
(
SELECT 0 AS n
UNION ALL
SELECT n + 1 
FROM n 
WHERE n + 1 <= DATEDIFF(DAY, @MinDate, @MaxDate)
)
SELECT DATEADD(DAY, n, @MinDate) AS Date
INTO #Calendar
FROM n 
OPTION (MAXRECURSION 0)
CREATE TABLE #Result
(
Date DATE,
[Group] VARCHAR(20),
[Location] VARCHAR(100),
[Amount] DECIMAL(10,2),
Counted INT,
CumulativeCount INT,
ResetDate DATE
);
DECLARE @Group VARCHAR(20), @Date DATE, @Location VARCHAR(100), @Amount DECIMAL(10,2);
DECLARE GroupCursor CURSOR FOR 
SELECT DISTINCT [Group], [Location], [Amount] FROM SampleData
OPEN GroupCursor
FETCH NEXT FROM GroupCursor INTO @Group, @Location, @Amount
WHILE @@FETCH_STATUS = 0
BEGIN
DECLARE DateCursor CURSOR FOR 
SELECT Date FROM #Calendar ORDER BY Date
OPEN DateCursor
FETCH NEXT FROM DateCursor INTO @Date
DECLARE @CumulativeCount INT = 0, @ResetDate DATE = NULL
WHILE @@FETCH_STATUS = 0
BEGIN
DECLARE @Counted INT = 
(
SELECT CASE WHEN COUNT(*) > 0 THEN 1 ELSE 0 END
FROM SampleData
WHERE [Group] = @Group AND @Date BETWEEN StartDate AND EndDate
)
IF @Counted = 1 AND (@CumulativeCount = 0 OR DATEDIFF(DAY, @ResetDate, @Date) >= DATEDIFF(DAY, @ResetDate, DATEADD(YEAR, 1, @ResetDate))) --- Altered check to account for leap years. 
BEGIN
SET @CumulativeCount = 1
SET @ResetDate = @Date  
END
ELSE IF @Counted = 1 AND @CumulativeCount < 100 
SET @CumulativeCount = @CumulativeCount + 1
ELSE IF @Counted = 1 AND @CumulativeCount >= 100
SET @Counted = 0
IF  
( 
SELECT CASE WHEN COUNT(*) > 0 THEN 1 ELSE 0 END
FROM SampleData
WHERE [Group] = @Group AND [Location] = @Location AND [Amount] = @Amount AND @Date BETWEEN StartDate AND EndDate
) = 1
BEGIN
INSERT INTO #Result (Date, [Group], [Location], [Amount], Counted, CumulativeCount, ResetDate)
VALUES (@Date, @Group, @Location, @Amount, @Counted, @CumulativeCount, @ResetDate)
END
FETCH NEXT FROM DateCursor INTO @Date
END
CLOSE DateCursor
DEALLOCATE DateCursor
FETCH NEXT FROM GroupCursor INTO @Group, @Location, @Amount
END
CLOSE GroupCursor
DEALLOCATE GroupCursor
SELECT * FROM #Result
ORDER BY [Group], Date
SELECT 
[Group],
[Location],
[Amount],
FORMAT(Date, 'yyyyMM') AS YearMonth,
SUM(Counted) AS CountDays
FROM #Result
GROUP BY [Group], [Location], [Amount], FORMAT(Date, 'yyyyMM')
ORDER BY [Group], YearMonth, [Location]

This does nearly exactly what I need, but I have a problem with the duplicate dates (example of 2021-02-01). Here, the variable @CumulativeCount does not update properly across the duplicate dates - which leads to counting more than 100 days in the end. I have tried multiple workarounds and fixes but have so far come up with no solution that fixes it (and doesnt break anything).

Here's a snippet of the #Results to show the problem:

Date Group Location Amount Counted CumulativeCount ResetDate
2021-01-31 1 Location1 100.00 1 31 2021-01-01
2021-02-01 1 Location1 100.00 1 32 2021-01-01
2021-02-01 1 Location2 150.00 1 32 2021-01-01
2021-02-02 1 Location2 150.00 1 33 2021-01-01

Any help would be much appreciated.

答案1

得分: 1

看起来你想要使用一个日历表,并在查询中使用它。如果你不想实现一个日历表或函数,你可以使用一个伪版本,使用CTE(公共表表达式)来构建:

;WITH CalendarCTE AS (
SELECT MIN(StartDate) AS Date, MAX(EndDate) AS mxDate, DATEPART(MONTH,MIN(StartDate)) AS Month, DATEPART(YEAR,MIN(StartDate)) AS Year
FROM @Table
UNION ALL
SELECT DATEADD(DAY,1,Date), mxDate, DATEPART(MONTH,DATEADD(DAY,1,Date)) AS Month, DATEPART(YEAR,DATEADD(DAY,1,Date)) AS Year
  FROM CalendarCTE
 WHERE Date < mxDate
)

SELECT t.GroupID, t.Location, t.StartDate, t.EndDate, c.Year, c.Month, COUNT(CASE WHEN DATEDIFF(DAY,(DATEADD(YEAR,DATEDIFF(YEAR,t.StartDate,c.Date),t.StartDate)),c.Date) < 100 THEN 1 END) AS CountDays
  FROM @Table t
    INNER JOIN CalendarCTE c
	  ON t.StartDate <= c.Date
	  AND t.EndDate >= c.Date
 GROUP BY t.GroupID, t.Location, t.StartDate, t.EndDate, c.Year, c.Month
 ORDER BY GroupID, Location, c.Year, c.Month
   OPTION (MAXRECURSION 0)

这个查询将返回一个结果集,其中包含GroupID、Location、StartDate、EndDate、Year、Month和CountDays等列。

英文:

Sounds like you're going to want to employ a calendar table, and then use it something like this:

SELECT t.GroupID, t.Location, t.StartDate, t.EndDate, c.Year, c.Month, COUNT(CASE WHEN DATEDIFF(DAY,(DATEADD(YEAR,DATEDIFF(YEAR,t.StartDate,c.Date),t.StartDate)),c.Date) &lt; 100 THEN 1 END) AS CountDays
  FROM @Table t
    INNER JOIN dbo.Calendar(3,1) c
	  ON t.StartDate &lt;= c.Date
	  AND t.EndDate &gt;= c.Date
 GROUP BY t.GroupID, t.Location, t.StartDate, t.EndDate, c.Year, c.Month
 ORDER BY GroupID, Location, c.Year, c.Month
GroupID Location StartDate EndDate Year Month CountDays
1 Location1 2020-01-01 00:00:00.000 2021-02-01 00:00:00.000 2020 1 31
1 Location1 2020-01-01 00:00:00.000 2021-02-01 00:00:00.000 2020 2 29
1 Location1 2020-01-01 00:00:00.000 2021-02-01 00:00:00.000 2020 3 31
1 Location1 2020-01-01 00:00:00.000 2021-02-01 00:00:00.000 2020 4 9
1 Location1 2020-01-01 00:00:00.000 2021-02-01 00:00:00.000 2020 5 0
1 Location1 2020-01-01 00:00:00.000 2021-02-01 00:00:00.000 2020 6 0
1 Location1 2020-01-01 00:00:00.000 2021-02-01 00:00:00.000 2020 7 0
1 Location1 2020-01-01 00:00:00.000 2021-02-01 00:00:00.000 2020 8 0
1 Location1 2020-01-01 00:00:00.000 2021-02-01 00:00:00.000 2020 9 0
1 Location1 2020-01-01 00:00:00.000 2021-02-01 00:00:00.000 2020 10 0
1 Location1 2020-01-01 00:00:00.000 2021-02-01 00:00:00.000 2020 11 0
1 Location1 2020-01-01 00:00:00.000 2021-02-01 00:00:00.000 2020 12 0
1 Location1 2020-01-01 00:00:00.000 2021-02-01 00:00:00.000 2021 1 31
1 Location1 2020-01-01 00:00:00.000 2021-02-01 00:00:00.000 2021 2 1
1 Location2 2021-02-01 00:00:00.000 2022-07-01 00:00:00.000 2021 2 28
1 Location2 2021-02-01 00:00:00.000 2022-07-01 00:00:00.000 2021 3 31
1 Location2 2021-02-01 00:00:00.000 2022-07-01 00:00:00.000 2021 4 30
1 Location2 2021-02-01 00:00:00.000 2022-07-01 00:00:00.000 2021 5 11
1 Location2 2021-02-01 00:00:00.000 2022-07-01 00:00:00.000 2021 6 0
1 Location2 2021-02-01 00:00:00.000 2022-07-01 00:00:00.000 2021 7 0
1 Location2 2021-02-01 00:00:00.000 2022-07-01 00:00:00.000 2021 8 0
1 Location2 2021-02-01 00:00:00.000 2022-07-01 00:00:00.000 2021 9 0
1 Location2 2021-02-01 00:00:00.000 2022-07-01 00:00:00.000 2021 10 0
1 Location2 2021-02-01 00:00:00.000 2022-07-01 00:00:00.000 2021 11 0
1 Location2 2021-02-01 00:00:00.000 2022-07-01 00:00:00.000 2021 12 0
1 Location2 2021-02-01 00:00:00.000 2022-07-01 00:00:00.000 2022 1 31
1 Location2 2021-02-01 00:00:00.000 2022-07-01 00:00:00.000 2022 2 28
1 Location2 2021-02-01 00:00:00.000 2022-07-01 00:00:00.000 2022 3 31
1 Location2 2021-02-01 00:00:00.000 2022-07-01 00:00:00.000 2022 4 30
1 Location2 2021-02-01 00:00:00.000 2022-07-01 00:00:00.000 2022 5 11
1 Location2 2021-02-01 00:00:00.000 2022-07-01 00:00:00.000 2022 6 0
1 Location2 2021-02-01 00:00:00.000 2022-07-01 00:00:00.000 2022 7 0
2 Location1 2020-09-01 00:00:00.000 2020-12-01 00:00:00.000 2020 9 30
2 Location1 2020-09-01 00:00:00.000 2020-12-01 00:00:00.000 2020 10 31
2 Location1 2020-09-01 00:00:00.000 2020-12-01 00:00:00.000 2020 11 30
2 Location1 2020-09-01 00:00:00.000 2020-12-01 00:00:00.000 2020 12 1
2 Location3 2020-02-01 00:00:00.000 2020-03-01 00:00:00.000 2020 2 29
2 Location3 2020-02-01 00:00:00.000 2020-03-01 00:00:00.000 2020 3 1

If you can't/don't want to implement a calendar table or function, you could build a pseudo version using a CTE:

;WITH CalendarCTE AS (
SELECT MIN(StartDate) AS Date, MAX(EndDate) AS mxDate, DATEPART(MONTH,MIN(StartDate)) AS Month, DATEPART(YEAR,MIN(StartDate)) AS Year
FROM @Table
UNION ALL
SELECT DATEADD(DAY,1,Date), mxDate, DATEPART(MONTH,DATEADD(DAY,1,Date)) AS Month, DATEPART(YEAR,DATEADD(DAY,1,Date)) AS Year
  FROM CalendarCTE
 WHERE Date &lt; mxDate
)

SELECT t.GroupID, t.Location, t.StartDate, t.EndDate, c.Year, c.Month, COUNT(CASE WHEN DATEDIFF(DAY,(DATEADD(YEAR,DATEDIFF(YEAR,t.StartDate,c.Date),t.StartDate)),c.Date) &lt; 100 THEN 1 END) AS CountDays
  FROM @Table t
    INNER JOIN CalendarCTE c
	  ON t.StartDate &lt;= c.Date
	  AND t.EndDate &gt;= c.Date
 GROUP BY t.GroupID, t.Location, t.StartDate, t.EndDate, c.Year, c.Month
 ORDER BY GroupID, Location, c.Year, c.Month
   OPTION (MAXRECURSION 0)

答案2

得分: 1

如果您利用分析函数来计算累积天数,就不需要使用游标。

我的查询使用了 SQL Server 2022 的函数,通过 cross applygenerate_series() 将日期拆分为一系列的月份。您可以根据自己的喜好替换这部分内容。它还使用了 date_bucket() 函数来计算每个逻辑年的相对参考日期。如果需要的话,可以替换为等效的计算方法。其余部分只是简单的求和,然后使用一些 case 逻辑将天数限制在100天以内。

with explode as (
    select *,
        date_bucket(year, 1, Mnth,
            min(StartDate) over (partition by [Group])) as ReferenceDate,
        case when Mnth <> eomonth(EndDate) then day(Mnth) else 1 end as Counter
    from SampleData data cross apply (
        select eomonth(dateadd(month, value, StartDate)) as Mnth
        from generate_series(0, 99) -- 如果100个月不够,请调整...
        where dateadd(month, value, StartDate) <= EndDate
    ) months
), accumulate as (
    select *,
        sum(Counter) over (partition by [Group], ReferenceDate order by Mnth) as CumDays
    from explode
)
select [Group], Location, Amount,
    convert(char(6), Mnth, 112) as YearMonth,
    case when CumDays <= 100 then Counter
         when CumDays - Counter < 100 then 100 - (CumDays - Counter)
         else 0 end as CountDays
from accumulate
order by [Group], YearMonth, Location;

您可以在以下链接中查看示例和运行结果:

https://dbfiddle.uk/ZSYJpC5A

英文:

There's no need for cursors if you take advantage of analytic functions to compute the cumulative days numbers.

My query uses a SQL Server 2022 function to explode the dates into a series of months via cross apply and generate_series(). You can replace that with whatever method you prefer. It also uses date_bucket() to compute a relative reference date for each logical year. Substitute an equivalent calculation if necessary. The rest is just a straightforward summation followed by some case logic to cap at 100 days.

with explode as (
select *,
date_bucket(year, 1, Mnth,
min(StartDate) over (partition by [Group])) as ReferenceDate,
case when Mnth &lt;&gt; eomonth(EndDate) then day(Mnth) else 1 end as Counter
from SampleData data cross apply (
select eomonth(dateadd(month, value, StartDate)) as Mnth
from generate_series(0, 99) -- adjust if 100 months isn&#39;t enough...
where dateadd(month, value, StartDate) &lt;= EndDate
) months
), accumulate as (
select *,
sum(Counter) over (partition by [Group], ReferenceDate order by Mnth) as CumDays
from explode
)
select [Group], Location, Amount,
convert(char(6), Mnth, 112) as YearMonth,
case when CumDays &lt;= 100 then Counter
when CumDays - Counter &lt; 100 then 100 - (CumDays - Counter)
else 0 end as CountDays
from accumulate
order by [Group], YearMonth, Location;

https://dbfiddle.uk/ZSYJpC5A

huangapple
  • 本文由 发表于 2023年7月27日 18:11:46
  • 转载请务必保留本文链接:https://go.coder-hub.com/76778698.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定