How to combine intervals across multiple rows?

Question

I have a table that has items, stock status (out of stock or in stock), and the starting and ending date range for the stocking status of the item.

SELECT
	[Item Number],
	[Start Date],
	[End Date],
	[Stock Status]
FROM HistoricalStockStatus
ORDER BY [Item Number], [Start Date]

The table below shows the data returned from the above query. You'll notice that a single continuous period with the same stock status can be split across multiple rows. For example, look at the first two rows of the output. The item was out of stock from 2009-09-25 to 2009-12-04 and listed as out of stock again from 2009-12-04 to 2009-12-07.

I would like to combine out-of-stock intervals that have consecutive dates into a single row. Using the above example, the first two rows would be combined into a single row with a start date of 2009-09-25 and an end date of 2009-12-07.

| Item Number | Start Date | End Date | Stock Status |
| --------- | -------- | ------- | ------------ |
| Item A | 2009-09-25 | 2009-12-04 | Out of Stock |
| Item A | 2009-12-04 | 2009-12-07 | Out of Stock |
| Item A | 2009-12-07 | 2010-07-27 | In Stock |
| Item A | 2010-07-27 | 2010-07-27 | Out of Stock |
| Item A | 2010-07-27 | 2010-08-05 | Out of Stock |
| Item A | 2010-08-05 | 2010-10-07 | Out of Stock |
| Item A | 2010-10-07 | 2010-11-16 | Out of Stock |
| Item A | 2010-11-16 | 2011-07-01 | In Stock |
| Item A | 2011-07-01 | 2011-07-13 | Out of Stock |
| Item A | 2011-07-13 | 2011-08-03 | Out of Stock |
| Item A | 2011-08-03 | 2011-08-26 | In Stock |
| Item A | 2011-08-26 | 2011-08-29 | Out of Stock |
| Item A | 2011-08-29 | 2011-10-10 | Out of Stock |
| Item A | 2011-10-10 | 2011-11-23 | In Stock |
| Item A | 2011-11-23 | 2011-11-29 | Out of Stock |
| Item A | 2011-11-29 | 2011-11-29 | Out of Stock |
| Item A | 2011-11-29 | 2011-12-21 | Out of Stock |
| Item A | 2011-12-21 | 2017-11-06 | In Stock |
| Item A | 2017-11-06 | 2017-12-28 | Out of Stock |
| Item A | 2017-12-28 | 2018-01-15 | In Stock |
| Item A | 2018-01-15 | 2018-02-01 | Out of Stock |
| Item A | 2018-02-01 | 2019-03-19 | In Stock |
| Item A | 2019-03-19 | 2019-03-28 | Out of Stock |
| Item A | 2019-03-28 | 2021-03-30 | In Stock |
| Item A | 2021-03-30 | 2021-07-16 | Out of Stock |
| Item A | 2021-07-16 | NULL | In Stock |

The desired output (the table below) is to have a single row for each interval of time that the item was out of stock. I don't need to have the intervals that an item was in stock.

| Item Number | Start Date | End Date | Stock Status |
| --------- | -------- | ------- | ------------ |
| Item A | 2009-09-25 | 2009-12-07 | Out of Stock |
| Item A | 2010-07-27 | 2010-11-16 | Out of Stock |
| Item A | 2011-07-01 | 2011-08-03 | Out of Stock |
| Item A | 2011-08-26 | 2011-10-10 | Out of Stock |
| Item A | 2011-11-23 | 2011-12-21 | Out of Stock |
| Item A | 2017-11-06 | 2017-12-28 | Out of Stock |
| Item A | 2018-01-15 | 2018-02-01 | Out of Stock |
| Item A | 2019-03-19 | 2019-03-28 | Out of Stock |
| Item A | 2021-03-30 | 2021-07-16 | Out of Stock |

I believe this problem is a gaps and islands problem but I'm having trouble finding resources that I can apply to this problem. I believe it can be solved with window functions but I'm also having trouble figuring out how to apply them to this query.

My thought process is to use a ranking function to add a column that can be used to partition the rows into blocks and then use MIN([Start Date]) OVER(PARTITION BY [Partition Block]) to find the start date and use MAX([End Date]) OVER(PARTITION BY [Partition Block]) to find the end date.

Something like this table below is what I envision as the intermediate result set but I need help generating the Partition Block column.

| Item Number | Out Date | In Date | Stock Status | Partition Block |
| --------- | -------- | ------- | ------------ | ---------- |
| Item A | 2009-09-25 | 2009-12-04 | Out of Stock | 1 |
| Item A | 2009-12-04 | 2009-12-07 | Out of Stock | 1 |
| Item A | 2009-12-07 | 2010-07-27 | In Stock | 0 |
| Item A | 2010-07-27 | 2010-07-27 | Out of Stock | 2 |
| Item A | 2010-07-27 | 2010-08-05 | Out of Stock | 2 |
| Item A | 2010-08-05 | 2010-10-07 | Out of Stock | 2 |
| Item A | 2010-10-07 | 2010-11-16 | Out of Stock | 2 |
| Item A | 2010-11-16 | 2011-07-01 | In Stock | 0 |
| Item A | 2011-07-01 | 2011-07-13 | Out of Stock | 3 |
| Item A | 2011-07-13 | 2011-08-03 | Out of Stock | 3 |
| Item A | 2011-08-03 | 2011-08-26 | In Stock | 0 |
| Item A | 2011-08-26 | 2011-08-29 | Out of Stock | 4 |
| Item A | 2011-08-29 | 2011-10-10 | Out of Stock | 4 |
| Item A | 2011-10-10 | 2011-11-23 | In Stock | 0 |
| Item A | 2011-11-23 | 2011-11-29 | Out of Stock | 5 |
| Item A | 2011-11-29 | 2011-11-29 | Out of Stock | 5 |
| Item A | 2011-11-29 | 2011-12-21 | Out of Stock | 5 |
| Item A | 2011-12-21 | 2017-11-06 | In Stock | 0 |
| Item A | 2017-11-06 | 2017-12-28 | Out of Stock | 6 |
| Item A | 2017-12-28 | 2018-01-15 | In Stock | 0 |
| Item A | 2018-01-15 | 2018-02-01 | Out of Stock | 7 |
| Item A | 2018-02-01 | 2019-03-19 | In Stock | 0 |
| Item A | 2019-03-19 | 2019-03-28 | Out of Stock | 8 |
| Item A | 2019-03-28 | 2021-03-30 | In Stock | 0 |
| Item A | 2021-03-30 | 2021-07-16 | Out of Stock | 9 |
| Item A | 2021-07-16 | NULL | In Stock | 0 |

I'm open to any help or suggestions, thanks!

Answer 1

Score: 1

One way is to build virtual groups based on a change in status: flag each row where the status changes, then use a running SUM with ROWS UNBOUNDED PRECEDING over those flags to number the consecutive groups.

SELECT
	--Aggregate each island into one row
	ItemName AS [Item Name],
	MIN(StartDate) AS [Out Date],
	MAX(EndDate) AS [In Date],
	MAX(StockStatus) AS [Stock Status],
	VirtualGroupID AS [Partition Block]
FROM
(
	SELECT
		*,
		--Running SUM of the IsNewGroup markers, ordered by date, numbers the islands.
		--Every 0 after a 1 inherits the running total of the preceding 1's within its partition.
		--EndDate is added as a tie-breaker because some rows share a StartDate.
		SUM(IsNewGroup) OVER (PARTITION BY ItemName ORDER BY StartDate, EndDate ROWS UNBOUNDED PRECEDING) AS VirtualGroupID
	FROM
	(
		SELECT
			ItemName, StartDate, EndDate, StockStatus,
			--1 if this row's StockStatus differs from the previous row's, otherwise 0 (the first row of each item gets 0).
			--The 1's are summed above into island group numbers so MIN and MAX can be applied to each group.
			CASE WHEN ISNULL(LAG(StockStatus) OVER (PARTITION BY ItemName ORDER BY StartDate, EndDate), StockStatus) <> StockStatus THEN 1 ELSE 0 END AS IsNewGroup
		FROM
			HistoricalStockStatus
	) AS X
	WHERE
		StockStatus = 'Out of Stock'
) AS Y
GROUP BY
	ItemName, VirtualGroupID
ORDER BY
	ItemName, MIN(StartDate)
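
To see what the marker and running-sum columns look like, here is a minimal sketch that runs the two inner levels of the query against the first ten sample rows. It is an illustration under assumptions, not the answer's tested code: it uses Python's standard-library sqlite3 module rather than SQL Server, so ISNULL becomes IFNULL and the frame is spelled out as ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW. The names ROWS, MARKER_SQL and group_markers are hypothetical, and EndDate is used as a tie-breaker in ORDER BY because two rows share a StartDate.

```python
import sqlite3

# First ten sample rows from the question (ItemName, StartDate, EndDate, StockStatus).
ROWS = [
    ("Item A", "2009-09-25", "2009-12-04", "Out of Stock"),
    ("Item A", "2009-12-04", "2009-12-07", "Out of Stock"),
    ("Item A", "2009-12-07", "2010-07-27", "In Stock"),
    ("Item A", "2010-07-27", "2010-07-27", "Out of Stock"),
    ("Item A", "2010-07-27", "2010-08-05", "Out of Stock"),
    ("Item A", "2010-08-05", "2010-10-07", "Out of Stock"),
    ("Item A", "2010-10-07", "2010-11-16", "Out of Stock"),
    ("Item A", "2010-11-16", "2011-07-01", "In Stock"),
    ("Item A", "2011-07-01", "2011-07-13", "Out of Stock"),
    ("Item A", "2011-07-13", "2011-08-03", "Out of Stock"),
]

# SQLite rendering (an assumption) of the two inner query levels.
MARKER_SQL = """
SELECT StartDate, EndDate, IsNewGroup,
       SUM(IsNewGroup) OVER (PARTITION BY ItemName
                             ORDER BY StartDate, EndDate
                             ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
           AS VirtualGroupID
FROM (
    SELECT ItemName, StartDate, EndDate, StockStatus,
           -- 1 when the status differs from the previous row, else 0
           CASE WHEN IFNULL(LAG(StockStatus) OVER (PARTITION BY ItemName
                                                   ORDER BY StartDate, EndDate),
                            StockStatus) <> StockStatus
                THEN 1 ELSE 0 END AS IsNewGroup
    FROM HistoricalStockStatus
) AS X
WHERE StockStatus = 'Out of Stock'
ORDER BY StartDate, EndDate
"""


def group_markers(rows):
    """Load rows into an in-memory table and return the marker columns."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE HistoricalStockStatus "
        "(ItemName TEXT, StartDate TEXT, EndDate TEXT, StockStatus TEXT)"
    )
    conn.executemany("INSERT INTO HistoricalStockStatus VALUES (?, ?, ?, ?)", rows)
    result = conn.execute(MARKER_SQL).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in group_markers(ROWS):
        print(row)
```

Each out-of-stock row that directly follows an in-stock row carries a 1, and the running sum turns those markers into a group number shared by every row of the same island. Note that the numbering starts at 0 here because the first row of an item has no predecessor.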

Edit: The query below requires one less pass over the data. Untested, of course.

SELECT
	--Aggregate each island into one row
	ItemName AS [Item Name],
	MIN(StartDate) AS [Out Date],
	MAX(EndDate) AS [In Date],
	MAX(StockStatus) AS [Stock Status],
	VirtualGroupID AS [Partition Block]
FROM
(
	SELECT
		ItemName, StartDate, EndDate, StockStatus,
		--Running count of the 'In Stock' rows seen so far; the 'Out of Stock' rows between two 'In Stock' rows share the same count.
		SUM(CASE WHEN StockStatus = 'Out of Stock' THEN 0 ELSE 1 END) OVER (PARTITION BY ItemName ORDER BY StartDate ROWS UNBOUNDED PRECEDING) AS VirtualGroupID
	FROM
		HistoricalStockStatus
) AS X
WHERE
	StockStatus = 'Out of Stock'
GROUP BY
	ItemName, VirtualGroupID
ORDER BY
	ItemName, MIN(StartDate)
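
Since this second query is described as untested, a quick way to check it against the question's sample data is sketched below. This is a rough check under assumptions rather than a SQL Server test: it runs an SQLite rendering of the query through Python's standard-library sqlite3 module, with the window frame written as ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW and plain column aliases. The names SAMPLE, ROWS, ISLANDS_SQL and out_of_stock_islands are hypothetical.

```python
import sqlite3

# (StartDate, EndDate, StockStatus) for Item A, copied from the question.
# None stands in for the NULL end date of the final, still-open interval.
SAMPLE = [
    ("2009-09-25", "2009-12-04", "Out of Stock"),
    ("2009-12-04", "2009-12-07", "Out of Stock"),
    ("2009-12-07", "2010-07-27", "In Stock"),
    ("2010-07-27", "2010-07-27", "Out of Stock"),
    ("2010-07-27", "2010-08-05", "Out of Stock"),
    ("2010-08-05", "2010-10-07", "Out of Stock"),
    ("2010-10-07", "2010-11-16", "Out of Stock"),
    ("2010-11-16", "2011-07-01", "In Stock"),
    ("2011-07-01", "2011-07-13", "Out of Stock"),
    ("2011-07-13", "2011-08-03", "Out of Stock"),
    ("2011-08-03", "2011-08-26", "In Stock"),
    ("2011-08-26", "2011-08-29", "Out of Stock"),
    ("2011-08-29", "2011-10-10", "Out of Stock"),
    ("2011-10-10", "2011-11-23", "In Stock"),
    ("2011-11-23", "2011-11-29", "Out of Stock"),
    ("2011-11-29", "2011-11-29", "Out of Stock"),
    ("2011-11-29", "2011-12-21", "Out of Stock"),
    ("2011-12-21", "2017-11-06", "In Stock"),
    ("2017-11-06", "2017-12-28", "Out of Stock"),
    ("2017-12-28", "2018-01-15", "In Stock"),
    ("2018-01-15", "2018-02-01", "Out of Stock"),
    ("2018-02-01", "2019-03-19", "In Stock"),
    ("2019-03-19", "2019-03-28", "Out of Stock"),
    ("2019-03-28", "2021-03-30", "In Stock"),
    ("2021-03-30", "2021-07-16", "Out of Stock"),
    ("2021-07-16", None, "In Stock"),
]
ROWS = [("Item A", start, end, status) for start, end, status in SAMPLE]

# SQLite rendering (an assumption) of the single-pass query.
ISLANDS_SQL = """
SELECT ItemName, MIN(StartDate) AS StartDate, MAX(EndDate) AS EndDate
FROM (
    SELECT ItemName, StartDate, EndDate, StockStatus,
           -- running count of 'In Stock' rows; each island shares one count
           SUM(CASE WHEN StockStatus = 'Out of Stock' THEN 0 ELSE 1 END)
               OVER (PARTITION BY ItemName
                     ORDER BY StartDate
                     ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
               AS VirtualGroupID
    FROM HistoricalStockStatus
) AS X
WHERE StockStatus = 'Out of Stock'
GROUP BY ItemName, VirtualGroupID
ORDER BY ItemName, MIN(StartDate)
"""


def out_of_stock_islands(rows):
    """Return one (ItemName, StartDate, EndDate) tuple per out-of-stock island."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE HistoricalStockStatus "
        "(ItemName TEXT, StartDate TEXT, EndDate TEXT, StockStatus TEXT)"
    )
    conn.executemany("INSERT INTO HistoricalStockStatus VALUES (?, ?, ?, ?)", rows)
    result = conn.execute(ISLANDS_SQL).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for island in out_of_stock_islands(ROWS):
        print(island)
```

On this sample the sketch returns nine islands whose start and end dates match the desired output in the question. Dates are stored as ISO-formatted text here, so MIN and MAX compare them correctly as strings; with real date columns that caveat does not apply. The approach relies on an in-stock row separating every pair of out-of-stock periods, so it groups by status change and does not itself verify that the dates are contiguous.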

Posted by huangapple on 2023-03-07 22:54:29
Source: https://go.coder-hub.com/75663573.html