Count distinct based on order of another column
Question
I have a table containing measurements. It looks like this:
SELECT [Plant] --plant name
,[Machine] --machine name
,[Material] --material name
,[Batch] --batch name
,[DMC] --data matrix code, each part has one
,[ProductionTime] --time stamp UTC
,[Result] --0/1 as NOK/OK
FROM [dbo].[Results]
Each row is one measurement, having a result. Each tested part has a DMC. Since parts can be tested several times, we have duplicated DMCs (with different results) in this table.
Example data:
Plant | Machine | Material | Batch | DMC | ProductionTime | Result |
---|---|---|---|---|---|---|
A | MachineA | MaterialA | X | ABC | 2023-02-16 16:21:52 | 1 |
A | MachineA | MaterialA | X | DEF | 2023-02-16 16:21:30 | 1 |
A | MachineA | MaterialA | X | DEF | 2023-02-16 16:21:09 | 0 |
A | MachineA | MaterialA | Y | GHI | 2023-02-16 16:20:47 | 1 |
A | MachineB | MaterialA | X | JKL | 2023-02-16 16:20:24 | 0 |
A | MachineB | MaterialB | Y | MNO | 2023-02-16 16:20:03 | 1 |
To calculate the scrap rate, I need to count the number of OK and NOK parts.
The conditions are:
- #OKParts: for each DMC, take the row with its **latest** production time and count the parts where that row has Result = 1
- #NOKParts: the same, but with Result = 0
Additionally, for calculating the FTT yield ("first time through"):
- #PartsFirstTestOK: for each DMC, take the row with its oldest production time and count the parts where that row has Result = 1
- #PartsFirstTestNOK: the same, but with Result = 0
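As a plain illustration of these four definitions, here is a minimal Python sketch that applies them to the six example rows above, ignoring the grouping by plant, machine, material and batch. The names `TESTS` and `count_parts` are made up for this sketch and do not come from the question.

```python
from collections import defaultdict

# (DMC, ProductionTime, Result) taken from the example data above.
TESTS = [
    ("ABC", "2023-02-16 16:21:52", 1),
    ("DEF", "2023-02-16 16:21:30", 1),
    ("DEF", "2023-02-16 16:21:09", 0),
    ("GHI", "2023-02-16 16:20:47", 1),
    ("JKL", "2023-02-16 16:20:24", 0),
    ("MNO", "2023-02-16 16:20:03", 1),
]


def count_parts(tests):
    """Count parts by the result of their first and of their latest test."""
    by_dmc = defaultdict(list)
    for dmc, production_time, result in tests:
        by_dmc[dmc].append((production_time, result))

    counts = {
        "#OKParts": 0,
        "#NOKParts": 0,
        "#PartsFirstTestOK": 0,
        "#PartsFirstTestNOK": 0,
    }
    for measurements in by_dmc.values():
        # ISO-formatted timestamps sort chronologically as strings.
        measurements.sort()
        first_result = measurements[0][1]
        last_result = measurements[-1][1]
        counts["#PartsFirstTestOK" if first_result == 1 else "#PartsFirstTestNOK"] += 1
        counts["#OKParts" if last_result == 1 else "#NOKParts"] += 1
    return counts


if __name__ == "__main__":
    print(count_parts(TESTS))
```

Part DEF is the interesting case: its first test failed and its latest test passed, so it counts towards #PartsFirstTestNOK and towards #OKParts.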
Since this table is very large, I need to reduce the amount of data by filtering it (see the WHERE clause below) and grouping the results.
Right now, my query looks like this:
SELECT [Plant]
      ,[Machine]
      ,[Material]
      ,[Batch]
      ,COUNT([Result]) AS '#Tests' --all rows
      ,COUNT(DISTINCT [DMC]) AS '#Parts' --all different DMCs
      ,COUNT(CASE WHEN [Result] = 1 THEN 1 END) AS '#OKTests'
      ,COUNT(CASE WHEN [Result] = 0 THEN 1 END) AS '#NOKTests'
FROM [dbo].[Results]
WHERE [Plant] = 'A' AND [ProductionTime] > DATEADD(DAY, -365, GETDATE())
GROUP BY [Plant], [Material], [Batch], [Machine]
(Using Sum(Cast([Result] as INT)) and Count([Result])-Sum(Cast([Result] as INT)) instead of the CASE expressions would also work here.)
This currently gives the following result:
Plant | Machine | Material | Batch | #Tests | #Parts | #OKTests | #NOKTests |
---|---|---|---|---|---|---|---|
A | MachineA | MaterialA | X | 3 | 3 | 3 | 0 |
A | MachineA | MaterialA | Y | 124 | 96 | 93 | 31 |
A | MachineB | MaterialA | X | 11 | 9 | 9 | 2 |
A | MachineB | MaterialB | Y | 21 | 13 | 11 | 10 |
I want to extend this result with the columns #OKParts, #NOKParts, #PartsFirstTestOK and #PartsFirstTestNOK.
I've tried a sub-query and the FIRST_VALUE and OVER functions, but none of these worked for me. I am a beginner in SQL.
Does anybody have an idea how to solve this?
Thanks a lot in advance!
Answer 1
Score: 1
The following query uses the windowed ROW_NUMBER() function to assign both an ascending and a descending sequence number to the series of tests for a given part. A value of 1 in one or the other sequence marks the first or the last test, so checking for it limits a count to just those rows.
SELECT Plant
      ,Machine
      ,Material
      ,Batch
      ,COUNT(Result) AS [#Tests] --all rows
      ,COUNT(DISTINCT DMC) AS [#Parts] --all different DMCs
      ,COUNT(CASE WHEN Result = 1 THEN 1 END) AS [#OKTests]
      ,COUNT(CASE WHEN Result = 0 THEN 1 END) AS [#NOKTests]
      ,COUNT(CASE WHEN RowNumAsc = 1 AND Result = 1 THEN 1 END) AS [#PartsFirstTestOK]
      ,COUNT(CASE WHEN RowNumAsc = 1 AND Result = 0 THEN 1 END) AS [#PartsFirstTestNOK]
      ,COUNT(CASE WHEN RowNumDesc = 1 AND Result = 1 THEN 1 END) AS [#OKParts]
      ,COUNT(CASE WHEN RowNumDesc = 1 AND Result = 0 THEN 1 END) AS [#NOKParts]
FROM (
    SELECT *,
        --RowNumAsc = 1 marks the first (oldest) test of a part within its group
        ROW_NUMBER() OVER(
            PARTITION BY Plant, Material, Batch, Machine, DMC
            ORDER BY ProductionTime
        ) AS RowNumAsc,
        --RowNumDesc = 1 marks the latest test of a part within its group
        ROW_NUMBER() OVER(
            PARTITION BY Plant, Material, Batch, Machine, DMC
            ORDER BY ProductionTime DESC
        ) AS RowNumDesc
    FROM table1 --table name used in the demo; substitute your own table, e.g. [dbo].[Results]
) T
WHERE Plant = 'A' AND ProductionTime > DATEADD(DAY, -365, GETDATE())
GROUP BY Plant, Material, Batch, Machine;
Sample results:
Plant | Machine | Material | Batch | #Tests | #Parts | #OKTests | #NOKTests | #PartsFirstTestOK | #PartsFirstTestNOK | #OKParts | #NOKParts |
---|---|---|---|---|---|---|---|---|---|---|---|
A | MachineA | MaterialA | X | 3 | 2 | 2 | 1 | 1 | 1 | 2 | 0 |
A | MachineB | MaterialA | X | 1 | 1 | 0 | 1 | 0 | 1 | 0 | 1 |
A | MachineA | MaterialA | Y | 1 | 1 | 1 | 0 | 1 | 0 | 1 | 0 |
A | MachineB | MaterialB | Y | 1 | 1 | 1 | 0 | 1 | 0 | 1 | 0 |
See this db<>fiddle for a working demo.
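For readers without a SQL Server instance at hand, the query can be checked locally with a small Python sketch that runs the same ROW_NUMBER() logic against an in-memory SQLite database. This is an illustration under several assumptions, not part of the answer: it needs an SQLite build with window-function support (3.25 or later), the table name `Results` and the helper `run_query` are made up for the sketch, the column aliases drop the `#` prefix, and the `DATEADD(DAY, -365, GETDATE())` expression is replaced by a fixed cutoff parameter so the 2023 sample rows are not filtered out.

```python
import sqlite3

# The six example rows from the question.
ROWS = [
    ("A", "MachineA", "MaterialA", "X", "ABC", "2023-02-16 16:21:52", 1),
    ("A", "MachineA", "MaterialA", "X", "DEF", "2023-02-16 16:21:30", 1),
    ("A", "MachineA", "MaterialA", "X", "DEF", "2023-02-16 16:21:09", 0),
    ("A", "MachineA", "MaterialA", "Y", "GHI", "2023-02-16 16:20:47", 1),
    ("A", "MachineB", "MaterialA", "X", "JKL", "2023-02-16 16:20:24", 0),
    ("A", "MachineB", "MaterialB", "Y", "MNO", "2023-02-16 16:20:03", 1),
]

# Same structure as the answer's query; the cutoff is bound as a parameter.
QUERY = """
SELECT Plant, Machine, Material, Batch,
       COUNT(Result) AS Tests,
       COUNT(DISTINCT DMC) AS Parts,
       COUNT(CASE WHEN Result = 1 THEN 1 END) AS OKTests,
       COUNT(CASE WHEN Result = 0 THEN 1 END) AS NOKTests,
       COUNT(CASE WHEN RowNumAsc = 1 AND Result = 1 THEN 1 END) AS PartsFirstTestOK,
       COUNT(CASE WHEN RowNumAsc = 1 AND Result = 0 THEN 1 END) AS PartsFirstTestNOK,
       COUNT(CASE WHEN RowNumDesc = 1 AND Result = 1 THEN 1 END) AS OKParts,
       COUNT(CASE WHEN RowNumDesc = 1 AND Result = 0 THEN 1 END) AS NOKParts
FROM (
    SELECT *,
           ROW_NUMBER() OVER (
               PARTITION BY Plant, Material, Batch, Machine, DMC
               ORDER BY ProductionTime
           ) AS RowNumAsc,
           ROW_NUMBER() OVER (
               PARTITION BY Plant, Material, Batch, Machine, DMC
               ORDER BY ProductionTime DESC
           ) AS RowNumDesc
    FROM Results
) AS T
WHERE Plant = 'A' AND ProductionTime > ?
GROUP BY Plant, Material, Batch, Machine
ORDER BY Batch, Machine
"""


def run_query(cutoff="2022-01-01 00:00:00"):
    """Load the sample rows into an in-memory database and run the query."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Results (Plant TEXT, Machine TEXT, Material TEXT, "
        "Batch TEXT, DMC TEXT, ProductionTime TEXT, Result INTEGER)"
    )
    conn.executemany("INSERT INTO Results VALUES (?, ?, ?, ?, ?, ?, ?)", ROWS)
    rows = conn.execute(QUERY, (cutoff,)).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for row in run_query():
        print(row)
```

With the default cutoff, the four rows returned match the sample results table above. One point worth knowing: the row numbers are assigned in the inner query over all rows, and the WHERE filter is applied afterwards. Calling `run_query("2023-02-16 16:21:15")` filters out DEF's first test, so the MachineA / batch X group still reports 2 parts but only 1 first-test result. If first and last tests should instead be determined within the filtered period, and to avoid numbering the whole table, the date condition can be moved into the inner query.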