Query to find problems with missing row pairs, given a partition?
Question
I have a table where information is being stored like so:
Work Order | Employee | FunctionGroup | FunctionType | Timestamp |
---|---|---|---|---|
WO1 | Emp1 | Group1 | Start | 7/27/23 09:00 |
WO1 | Emp1 | Group1 | Stop | 7/27/23 10:00 |
WO1 | Emp1 | Group1 | Start | 7/27/23 11:00 |
WO1 | Emp1 | Group1 | Stop | 7/27/23 12:00 |
WO2 | Emp2 | Group2 | Start | 7/27/23 13:00 |
WO2 | Emp2 | Group2 | Stop | 7/27/23 14:00 |
WO2 | Emp2 | Group2 | Start | 7/27/23 15:00 |
WO2 | Emp2 | Group2 | Stop | 7/27/23 16:00 |
WO3 | Emp3 | Group3 | Start | 7/27/23 17:00 (problem here: since the next row is also a Start, then this row should be returned) |
WO3 | Emp3 | Group3 | Start | 7/27/23 18:00 |
WO3 | Emp3 | Group3 | Start | 7/27/23 19:00 |
WO3 | Emp3 | Group3 | Stop | 7/27/23 20:00 |
WO4 | Emp4 | Group4 | Stop | 7/27/23 17:00 (problem here: since the dataset for this partition starts with a Stop instead of a Start, then this row should be returned) |
WO4 | Emp4 | Group4 | Start | 7/27/23 18:00 |
WO4 | Emp4 | Group4 | Start | 7/27/23 19:00 |
WO4 | Emp4 | Group4 | Stop | 7/27/23 20:00 |
For each work order, employee, and function group (this combination is basically how the partition is defined), an employee can insert a Start row and a Stop row. The rows must always come in the order Start then Stop, never the reverse. Data collection will begin at a specific date/time, so there is a clean point after which every partition must begin with a Start. Employees can insert these pairs as many times as they need to. I need to write a query that checks whether there is a problem with the pairings and, if so, returns the first row where the problem appeared.
In the table above, the last 2 sections show a potential problem and the row that must be returned. The main challenge here is just figuring out how to determine which rows go together as a pair. One alternative way I am thinking of handling this is to simply insert a unique ID with every Start, then insert that same ID with every Stop. This might work, but for now I need a query that can show me problems for the test data I'm using.
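A rough sketch of that ID-based alternative, just to show what the check could look like. The `PairId` column and the `@Paired` table variable are hypothetical (they are not part of my current schema), and the rows are made-up test data:

```sql
-- Hypothetical variant of the table with an extra PairId column (an assumption).
declare @Paired table (PairId int, WorkOrder varchar(3), Employee varchar(4), FunctionGroup varchar(6), FunctionType varchar(5), [Timestamp] datetime);

insert into @Paired (PairId, WorkOrder, Employee, FunctionGroup, FunctionType, [Timestamp])
values
    (1, 'WO1', 'Emp1', 'Group1', 'Start', '2023-07-27T09:00:00'),
    (1, 'WO1', 'Emp1', 'Group1', 'Stop',  '2023-07-27T10:00:00'),
    (2, 'WO3', 'Emp3', 'Group3', 'Start', '2023-07-27T17:00:00'), -- never stopped
    (3, 'WO3', 'Emp3', 'Group3', 'Start', '2023-07-27T18:00:00'),
    (3, 'WO3', 'Emp3', 'Group3', 'Stop',  '2023-07-27T20:00:00');

-- A pair is treated as complete when it has exactly one Start and one Stop,
-- and the Start comes before the Stop. Anything else is reported.
select PairId
from @Paired
group by PairId
having sum(case when FunctionType = 'Start' then 1 else 0 end) <> 1
    or sum(case when FunctionType = 'Stop'  then 1 else 0 end) <> 1
    or min(case when FunctionType = 'Start' then [Timestamp] end)
       >= min(case when FunctionType = 'Stop' then [Timestamp] end);
```

With this sample data only `PairId` 2 should be reported, since it has a Start but no Stop.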
Answer 1
Score: 3
Based on the expectation that the data should arrive in Start/Stop pairs, you can assign a row number within each partition and compare the expected FunctionType with the actual FunctionType. Or rather, since you want the error to appear on the row before, compare the next row's FunctionType (using LEAD) with the expected one: every odd-numbered row should be followed by a Stop.
-- Test data (DDL + DML). The m/d/yy date literals depend on the session's language/date format settings.
declare @TestData table (WorkOrder varchar(3), Employee varchar(4), FunctionGroup varchar(6), FunctionType varchar(5), [Timestamp] datetime);

insert into @TestData (WorkOrder, Employee, FunctionGroup, FunctionType, [Timestamp])
values
    ('WO1','Emp1','Group1','Start','7/27/23 09:00'),
    ('WO1','Emp1','Group1','Stop','7/27/23 10:00'),
    ('WO1','Emp1','Group1','Start','7/27/23 11:00'),
    ('WO1','Emp1','Group1','Stop','7/27/23 12:00'),
    ('WO2','Emp2','Group2','Start','7/27/23 13:00'),
    ('WO2','Emp2','Group2','Stop','7/27/23 14:00'),
    ('WO2','Emp2','Group2','Start','7/27/23 15:00'),
    ('WO2','Emp2','Group2','Stop','7/27/23 16:00'),
    ('WO3','Emp3','Group3','Start','7/27/23 17:00'), -- (problem here: since the next row is also a Start, this row should be returned)
    ('WO3','Emp3','Group3','Start','7/27/23 18:00'),
    ('WO3','Emp3','Group3','Start','7/27/23 19:00'),
    ('WO3','Emp3','Group3','Stop','7/27/23 20:00'),
    ('WO4','Emp4','Group4','Stop','7/27/23 17:00'), -- (problem here: since the dataset for this partition starts with a Stop instead of a Start, this row should be returned)
    ('WO4','Emp4','Group4','Start','7/27/23 18:00'),
    ('WO4','Emp4','Group4','Start','7/27/23 19:00'),
    ('WO4','Emp4','Group4','Stop','7/27/23 20:00');

with cte as (
    select *
        -- position of the row within its partition, in time order
        , row_number() over (partition by WorkOrder, Employee, FunctionGroup order by [Timestamp]) rn
        -- FunctionType of the following row in the same partition (NULL on the last row)
        , lead(FunctionType) over (partition by WorkOrder, Employee, FunctionGroup order by [Timestamp]) FunctionTypeLead
    from @TestData
)
select WorkOrder, Employee, FunctionGroup, FunctionType, [Timestamp]
from cte
-- odd-numbered rows open a pair, so the row after them is expected to be a Stop
where rn % 2 = 1 and FunctionTypeLead != 'Stop'
order by WorkOrder, Employee, FunctionGroup, [Timestamp];
Returns:
WorkOrder | Employee | FunctionGroup | FunctionType | Timestamp |
---|---|---|---|---|
WO3 | Emp3 | Group3 | Start | 2023-07-27 17:00:00.000 |
WO4 | Emp4 | Group4 | Stop | 2023-07-27 17:00:00.000 |
Note: Providing the DDL+DML (as shown here) makes it much easier to answer.
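One thing to watch with the parity check: a partition that opens with two consecutive Stop rows would not be flagged, because row 1 is followed by a Stop. If that matters, a possible variant compares each row with both of its neighbours using LAG and LEAD, then keeps the first flagged row per partition. This is only a sketch under assumptions: the `@Sample` table and its rows are made up for illustration, and a trailing Start with no Stop is treated as still in progress rather than as an error.

```sql
-- Made-up sample rows (an assumption) covering the cases discussed above.
declare @Sample table (WorkOrder varchar(3), Employee varchar(4), FunctionGroup varchar(6), FunctionType varchar(5), [Timestamp] datetime);

insert into @Sample (WorkOrder, Employee, FunctionGroup, FunctionType, [Timestamp])
values
    ('WO3','Emp3','Group3','Start','2023-07-27T17:00:00'), -- Start followed by Start
    ('WO3','Emp3','Group3','Start','2023-07-27T18:00:00'),
    ('WO3','Emp3','Group3','Stop', '2023-07-27T20:00:00'),
    ('WO4','Emp4','Group4','Stop', '2023-07-27T17:00:00'), -- partition opens with a Stop
    ('WO4','Emp4','Group4','Stop', '2023-07-27T18:00:00'),
    ('WO4','Emp4','Group4','Start','2023-07-27T19:00:00'),
    ('WO4','Emp4','Group4','Stop', '2023-07-27T20:00:00'),
    ('WO5','Emp5','Group5','Start','2023-07-27T09:00:00'),
    ('WO5','Emp5','Group5','Stop', '2023-07-27T10:00:00'),
    ('WO5','Emp5','Group5','Start','2023-07-27T11:00:00'); -- open Start, not flagged

with neighbours as (
    select *
        , lag(FunctionType)  over (partition by WorkOrder, Employee, FunctionGroup order by [Timestamp]) as PrevType
        , lead(FunctionType) over (partition by WorkOrder, Employee, FunctionGroup order by [Timestamp]) as NextType
    from @Sample
),
problems as (
    -- Number only the flagged rows, so ProblemNo = 1 is the first problem per partition.
    select *
        , row_number() over (partition by WorkOrder, Employee, FunctionGroup order by [Timestamp]) as ProblemNo
    from neighbours
    where (FunctionType = 'Start' and NextType = 'Start')                          -- Start never closed
       or (FunctionType = 'Stop' and (PrevType is null or PrevType = 'Stop'))      -- Stop never opened
)
select WorkOrder, Employee, FunctionGroup, FunctionType, [Timestamp]
from problems
where ProblemNo = 1
order by WorkOrder, Employee, FunctionGroup, [Timestamp];
```

On this sample the query should return the 17:00 Start row for WO3 and the 17:00 Stop row for WO4, and nothing for WO5.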