Query to find problems with missing row pairs, given a partition?

Question

I have a table where information is being stored like so:

Work Order  Employee  FunctionGroup  FunctionType  Timestamp
WO1         Emp1      Group1         Start         7/27/23 09:00
WO1         Emp1      Group1         Stop          7/27/23 10:00
WO1         Emp1      Group1         Start         7/27/23 11:00
WO1         Emp1      Group1         Stop          7/27/23 12:00
WO2         Emp2      Group2         Start         7/27/23 13:00
WO2         Emp2      Group2         Stop          7/27/23 14:00
WO2         Emp2      Group2         Start         7/27/23 15:00
WO2         Emp2      Group2         Stop          7/27/23 16:00
WO3         Emp3      Group3         Start         7/27/23 17:00  (problem here: since the next row is also a Start, this row should be returned)
WO3         Emp3      Group3         Start         7/27/23 18:00
WO3         Emp3      Group3         Start         7/27/23 19:00
WO3         Emp3      Group3         Stop          7/27/23 20:00
WO4         Emp4      Group4         Stop          7/27/23 17:00  (problem here: since the dataset for this partition starts with a Stop instead of a Start, this row should be returned)
WO4         Emp4      Group4         Start         7/27/23 18:00
WO4         Emp4      Group4         Start         7/27/23 19:00
WO4         Emp4      Group4         Stop          7/27/23 20:00

For each work order, employee, and function group, an employee can insert a Start row and a Stop row (this combination is basically how the partition is defined). The rows must always come in the order Start then Stop, never the reverse. Data collection will begin at a specific date/time, so there is a clean point from which every partition must begin with a Start. Employees can insert these pairs as many times as they need to. I need to write a query that checks whether there is a problem with the pairings and, if so, returns the first row where the problem appeared.

In the table above, the last 2 sections show a potential problem and the row that must be returned. The main challenge here is just figuring out how to determine which rows go together as a pair. One alternative way I am thinking of handling this is to simply insert a unique ID with every Start, then insert that same ID with every Stop. This might work, but for now I need a query that can show me problems for the test data I'm using.
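
To make the pair-ID alternative concrete, here is a minimal sketch of what checking it could look like. The PairId column, the @Paired table, and its rows are hypothetical and are not part of my current schema; the idea is only that each ID should have exactly one Start and one Stop, with the Start first.

```sql
-- Hypothetical table: PairId is an assumed column that ties a Stop to its Start.
declare @Paired table (PairId int, WorkOrder varchar(3), Employee varchar(4), FunctionGroup varchar(6), FunctionType varchar(5), [Timestamp] datetime);

insert into @Paired (PairId, WorkOrder, Employee, FunctionGroup, FunctionType, [Timestamp])
values
(1,'WO1','Emp1','Group1','Start','2023-07-27T09:00:00'),
(1,'WO1','Emp1','Group1','Stop','2023-07-27T10:00:00'),
(2,'WO3','Emp3','Group3','Start','2023-07-27T17:00:00'), -- never stopped
(3,'WO3','Emp3','Group3','Start','2023-07-27T18:00:00'),
(3,'WO3','Emp3','Group3','Stop','2023-07-27T20:00:00');

-- A healthy pair has exactly one Start and one Stop, with the Start earlier.
select PairId
  , sum(case when FunctionType = 'Start' then 1 else 0 end) Starts
  , sum(case when FunctionType = 'Stop' then 1 else 0 end) Stops
from @Paired
group by PairId
having sum(case when FunctionType = 'Start' then 1 else 0 end) <> 1
    or sum(case when FunctionType = 'Stop' then 1 else 0 end) <> 1
    or max(case when FunctionType = 'Start' then [Timestamp] end)
       >= min(case when FunctionType = 'Stop' then [Timestamp] end);
```

With these made-up rows, only PairId 2 comes back (one Start, no Stop). This would not help with data that has already been captured without an ID, which is why I still need the query described above.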

Answer 1

Score: 3

Since the data is expected to arrive in Start/Stop pairs, you can assign a row number within each partition and compare the expected FunctionType with the actual FunctionType. More precisely, because you want the error reported on the row before, compare the next FunctionType (using LEAD) with the expected one: every odd-numbered row should be followed by a Stop.

declare @TestData table (WorkOrder varchar(3), Employee varchar(4), FunctionGroup varchar(6), FunctionType varchar(5), [Timestamp] datetime)

insert into @TestData (WorkOrder, Employee, FunctionGroup, FunctionType, [Timestamp])
values
('WO1','Emp1','Group1','Start','7/27/23 09:00'),
('WO1','Emp1','Group1','Stop','7/27/23 10:00'),
('WO1','Emp1','Group1','Start','7/27/23 11:00'),
('WO1','Emp1','Group1','Stop','7/27/23 12:00'),
('WO2','Emp2','Group2','Start','7/27/23 13:00'),
('WO2','Emp2','Group2','Stop','7/27/23 14:00'),
('WO2','Emp2','Group2','Start','7/27/23 15:00'),
('WO2','Emp2','Group2','Stop','7/27/23 16:00'),
('WO3','Emp3','Group3','Start','7/27/23 17:00'),-- (problem here: since the next row is also a Start, then this row should be returned)
('WO3','Emp3','Group3','Start','7/27/23 18:00'),
('WO3','Emp3','Group3','Start','7/27/23 19:00'),
('WO3','Emp3','Group3','Stop','7/27/23 20:00'),
('WO4','Emp4','Group4','Stop','7/27/23 17:00'),-- (problem here: since the dataset for this partition starts with a Stop instead of a Start, then this row should be returned)
('WO4','Emp4','Group4','Start','7/27/23 18:00'),
('WO4','Emp4','Group4','Start','7/27/23 19:00'),
('WO4','Emp4','Group4','Stop','7/27/23 20:00');

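with cte as (
  select *
    -- Position of each row within its WorkOrder/Employee/FunctionGroup partition
    , row_number() over (partition by WorkOrder, Employee, FunctionGroup order by [Timestamp]) rn
    -- FunctionType of the following row in the same partition (NULL on the last row)
    , lead(FunctionType) over (partition by WorkOrder, Employee, FunctionGroup order by [Timestamp]) FunctionTypeLead
  from @TestData
)
select WorkOrder, Employee, FunctionGroup, FunctionType, [Timestamp]
from cte
-- Odd rows are where a pair should begin, so the row after them must be a Stop
where rn % 2 = 1 and FunctionTypeLead != 'Stop'
order by WorkOrder, Employee, FunctionGroup, [Timestamp];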

Returns:

WorkOrder Employee FunctionGroup FunctionType Timestamp
WO3 Emp3 Group3 Start 2023-07-27 17:00:00.000
WO4 Emp4 Group4 Stop 2023-07-27 17:00:00.000
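
The query above fits the sample data, but it would miss a partition that opens with two Stop rows in a row (the first row's LEAD is 'Stop', so it passes the filter), and it can return more than one row per partition. A possible variant, offered as a sketch rather than a definitive fix, compares each row's actual FunctionType with the one its row number implies and keeps only the first mismatch per partition. The WO5 and WO6 rows below are made-up test data, and the sketch assumes FunctionType only ever holds 'Start' or 'Stop' and that timestamps are unique within a partition.

```sql
-- Hypothetical test rows covering cases the sample data does not.
declare @TestData table (WorkOrder varchar(3), Employee varchar(4), FunctionGroup varchar(6), FunctionType varchar(5), [Timestamp] datetime);

insert into @TestData (WorkOrder, Employee, FunctionGroup, FunctionType, [Timestamp])
values
('WO3','Emp3','Group3','Start','2023-07-27T17:00:00'), -- unpaired Start: should be returned
('WO3','Emp3','Group3','Start','2023-07-27T18:00:00'),
('WO3','Emp3','Group3','Stop','2023-07-27T20:00:00'),
('WO5','Emp5','Group5','Stop','2023-07-27T09:00:00'),  -- leading Stop: should be returned
('WO5','Emp5','Group5','Stop','2023-07-27T10:00:00'),
('WO6','Emp6','Group6','Start','2023-07-27T09:00:00'),
('WO6','Emp6','Group6','Stop','2023-07-27T10:00:00'),
('WO6','Emp6','Group6','Start','2023-07-27T11:00:00'); -- still open: not flagged

with numbered as (
  select *
    , row_number() over (partition by WorkOrder, Employee, FunctionGroup order by [Timestamp]) rn
  from @TestData
), firstMismatch as (
  -- Odd rows should be Start, even rows should be Stop; keep the earliest violation.
  select WorkOrder, Employee, FunctionGroup, min(rn) rn
  from numbered
  where FunctionType <> case when rn % 2 = 1 then 'Start' else 'Stop' end
  group by WorkOrder, Employee, FunctionGroup
)
select n.WorkOrder, n.Employee, n.FunctionGroup, n.FunctionType, n.[Timestamp]
from firstMismatch f
join numbered n
  on n.WorkOrder = f.WorkOrder
  and n.Employee = f.Employee
  and n.FunctionGroup = f.FunctionGroup
  -- A Start on an even row means the previous Start was never stopped, so report that one;
  -- a Stop on an odd row has no Start before it, so report the Stop itself.
  and n.rn = case when f.rn % 2 = 0 then f.rn - 1 else f.rn end
order by n.WorkOrder, n.Employee, n.FunctionGroup;
```

With these rows, the sketch returns the 17:00 Start for WO3 and the 09:00 Stop for WO5. A trailing Start with no Stop yet (WO6) is deliberately left alone, on the assumption that it is work still in progress; if that should count as a problem, it would need an extra check on the last row of each partition.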

Note: Providing the DDL+DML (as shown here) makes it much easier to answer.

huangapple
  • Posted on 2023-07-28 06:07:08
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76783696.html