返回相同表的多个连接中的1行 – SQL Server

huangapple go评论71阅读模式
英文:

Return 1 row from multiple joins to same table - SQL Server

问题

我有这些表格:

CREATE TABLE staff
(
    [case_id] int, 
    [staff] varchar(11), 
    [stafftype] varchar(10)
);

INSERT INTO staff
    ([case_id], [staff], [stafftype])
VALUES
    (1, 'Daffy, Duck', 'Primary'),
    (1, 'Bugs, Bunny', 'Additional'),
    (1, 'Elmer, Fudd', 'Additional'),
    (2, 'Daffy, Duck', 'Primary'),
    (2, 'Bugs, Bunny', 'Additional');

CREATE TABLE cases
(
    [case_id] int, 
    [casedate] datetime, 
    [caselocation] varchar(4)
);

INSERT INTO cases
    ([case_id], [casedate], [caselocation])
VALUES
    (1, '2023-01-01 00:00:00', 'Home'),
    (2, '2023-01-03 00:00:00', 'Away');

我想要为每个案例返回一行,每个案例只能有1个 Primary stafftype 和最多2个 Additional stafftypes。

例如,对于 case_id = 1 的结果集:

case_id caselocation PrimaryStaff AdditionalStaff1 AdditionalStaff2
1 Home Daffy, Duck Bugs, Bunny Elmer, Fudd

SQL Fiddle 演示: http://sqlfiddle.com/#!18/83bbc9/6

英文:

I have these tables:

CREATE TABLE staff
(
    [case_id] int, 
    [staff] varchar(11), 
    [stafftype] varchar(10)
);
    
INSERT INTO staff
    ([case_id], [staff], [stafftype])
VALUES
    (1, 'Daffy, Duck', 'Primary'),
    (1, 'Bugs, Bunny', 'Additional'),
    (1, 'Elmer, Fudd', 'Additional'),
    (2, 'Daffy, Duck', 'Primary'),
    (2, 'Bugs, Bunny', 'Additional');

CREATE TABLE cases
(
    [case_id] int, 
    [casedate] datetime, 
    [caselocation] varchar(4)
);
    
INSERT INTO cases
    ([case_id], [casedate], [caselocation])
VALUES
    (1, '2023-01-01 00:00:00', 'Home'),
    (2, '2023-01-03 00:00:00', 'Away');

And I want to return a single row per case, there can only be 1 Primary stafftype and a max of 2 Additional stafftypes.

e.g. Result set for case_id = 1

case_id caselocation PrimaryStaff AdditionalStaff1 AdditionalStaff2
1 Home Daffy, Duck Bugs, Bunny Elmer, Fudd

SQL Fiddle Demo: http://sqlfiddle.com/#!18/83bbc9/6

答案1

得分: 1

这种方法是使用条件聚合来完成的。

Fiddle链接: http://sqlfiddle.com/#!18/83bbc9/10/0

with cte as (
select a.case_id
       ,casedate
       ,caselocation
       ,b.staff
       ,ROW_NUMBER() OVER (PARTITION By a.case_id ORDER BY stafftype DESC) RN
from cases a
inner join staff b on a.case_id = b.case_id
  )
 SELECT case_id
        ,casedate
        ,caselocation
        ,MAX(CASE WHEN RN = 1 THEN staff END) PrimaryStaff
        ,MAX(CASE WHEN RN = 2 THEN staff END) AdditionalStaff1
        ,MAX(CASE WHEN RN = 3 THEN staff END) AdditionalStaff2
 FROM cte
 GROUP BY case_id
        ,casedate
        ,caselocation

这个CTE为“主要”stafftype分配了行号“RN”为1,然后将2和3分配给次要stafftypes。然后,最终的选择使用“RN”来组织数据,而“MAX()”和“GROUP BY”组合在一起将数据合并为单行。

您可以选择CTE的内部部分,然后在底部的查询中去掉“MAX()”和“GROUP BY”,然后运行它,以逐步查看发生的情况。

例如,首先运行这个:

select a.case_id
       ,casedate
       ,caselocation
       ,b.staff
       ,ROW_NUMBER() OVER (PARTITION By a.case_id ORDER BY stafftype DESC) RN
from cases a
inner join staff b on a.case_id = b.case_id

然后运行这个:

with cte as (
select a.case_id
       ,casedate
       ,caselocation
       ,b.staff
       ,ROW_NUMBER() OVER (PARTITION By a.case_id ORDER BY stafftype DESC) RN
from cases a
inner join staff b on a.case_id = b.case_id
  )
 SELECT case_id
        ,casedate
        ,caselocation
        ,CASE WHEN RN = 1 THEN staff END PrimaryStaff
        ,CASE WHEN RN = 2 THEN staff END AdditionalStaff1
        ,CASE WHEN RN = 3 THEN staff END AdditionalStaff2
 FROM cte
英文:

One way to do this is conditional aggregation.

Fiddle: http://sqlfiddle.com/#!18/83bbc9/10/0

with cte as (
select a.case_id
       ,casedate
       ,caselocation
       ,b.staff
       ,ROW_NUMBER() OVER (PARTITION By a.case_id ORDER BY stafftype DESC) RN
from cases a
inner join staff b on a.case_id = b.case_id
  )
 SELECT case_id
        ,casedate
        ,caselocation
        ,MAX(CASE WHEN RN = 1 THEN staff END) PrimaryStaff
        ,MAX(CASE WHEN RN = 2 THEN staff END) AdditionalStaff1
        ,MAX(CASE WHEN RN = 3 THEN staff END) AdditionalStaff2
 FROM cte
 GROUP BY case_id
        ,casedate
        ,caselocation

The cte assigns the 'Primary' stafftype a row number RN of 1, then assigns 2 and 3 to the secondary stafftypes. Then the final select uses the RN to organize the data, and the MAX() and GROUP BY combine to roll the data up into single rows.

You can select only the inner portion of the CTE, then remove the MAX() and GROUP BY in the bottom query and run that to see what is happening step by step.

ie. Run this:

select a.case_id
       ,casedate
       ,caselocation
       ,b.staff
       ,ROW_NUMBER() OVER (PARTITION By a.case_id ORDER BY stafftype DESC) RN
from cases a
inner join staff b on a.case_id = b.case_id

Then this:

with cte as (
select a.case_id
       ,casedate
       ,caselocation
       ,b.staff
       ,ROW_NUMBER() OVER (PARTITION By a.case_id ORDER BY stafftype DESC) RN
from cases a
inner join staff b on a.case_id = b.case_id
  )
 SELECT case_id
        ,casedate
        ,caselocation
        ,CASE WHEN RN = 1 THEN staff END PrimaryStaff
        ,CASE WHEN RN = 2 THEN staff END AdditionalStaff1
        ,CASE WHEN RN = 3 THEN staff END AdditionalStaff2
 FROM cte

答案2

得分: 1

提供的只需要2个任意的 'Additional' 员工类型,一个简单的分组查询就足够了。

select 
   a.case_id
   ,casedate
   ,caselocation
   ,b.staff as PrimaryStaff
   ,AdditionalStaff1, AdditionalStaff2
from cases a
inner join staff b on a.case_id = b.case_id and b.stafftype = 'Primary'
inner join (
  select case_id, min([staff]) AdditionalStaff1
     , case when count(*) > 1 then max([staff]) end AdditionalStaff2
  from staff
  where stafftype = 'Additional'
  group by case_id
) c on c.case_id = a.case_id
英文:

Provided only 2 arbitrary 'Additional' stafftypes are required a plain group by will do.

select 
   a.case_id
   ,casedate
   ,caselocation
   ,b.staff as PrimaryStaff
   ,AdditionalStaff1, AdditionalStaff2
from cases a
inner join staff b on a.case_id = b.case_id and b.stafftype = 'Primary'
inner join (
  select case_id, min([staff]) AdditionalStaff1
     , case when count(*) > 1 then max([staff]) end AdditionalStaff2
  from staff
  where stafftype = 'Additional'
  group by case_id
) c on c.case_id = a.case_id  

答案3

得分: 1

以下是代码部分的翻译:

Option 1:

/* 使用 STRING_AGG() 和 "substring parsing" 解决方案 */
SELECT 
c.case_id,
c.caselocation,
f1.PrimaryStaff,
f1.AdditionalStaffList,
f2.*,
IIF(ISNULL(f2.AdditionalListLength,0) > 0, LEFT(f1.AdditionalStaffList,ISNULL(NULLIF(f2.AdditionalListSubIndex,0)-1,f2.AdditionalListLength)) , NULL) [AdditionalStaff1],
IIF(ISNULL(f2.AdditionalListSubIndex,0) > 0, RIGHT(f1.AdditionalStaffList,f2.AdditionalListLength-f2.AdditionalListSubIndex) , NULL) [AdditionalStaff2]
FROM
[cases] c
CROSS APPLY (
    SELECT STRING_AGG(s.staff,';') WITHIN GROUP (ORDER BY IIF(s.stafftype='Primary',1,2) ASC) StaffList,
    MAX(IIF(s.stafftype='Primary',LEN(s.staff)+1,0)) AdditionalStaffSubIndex
    FROM [staff] s WHERE s.[case_id] = c.[case_id] ) s12
OUTER APPLY (
    SELECT LEFT(s12.StaffList,s12.AdditionalStaffSubIndex-1) PrimaryStaff,
    SUBSTRING(s12.StaffList,s12.AdditionalStaffSubIndex+1,LEN(s12.StaffList)) AdditionalStaffList
) f1
OUTER APPLY (
    SELECT PATINDEX('%;%',f1.AdditionalStaffList) AdditionalListSubIndex,
    LEN(f1.AdditionalStaffList) AdditionalListLength
) f2

Option 2:

/* 基于 XML 的查询解决方案 */
SELECT 
c.case_id,
c.caselocation,
s1.staff [PrimaryStaff],
s2.staffList.value('(/s/staff)[1]','varchar(11)') [AdditionalStaff1],
s2.staffList.value('(/s/staff)[2]','varchar(11)') [AdditionalStaff2]
FROM
[cases] c
LEFT JOIN [staff] s1 ON s1.[case_id] = c.[case_id] AND s1.[stafftype] = 'Primary'
OUTER APPLY (SELECT CAST((SELECT s.staff FROM [staff] s WHERE s.[case_id] = c.[case_id] AND s.[stafftype] = 'Additional' FOR XML AUTO,ELEMENTS) AS XML) ) s2 (staffList)

感谢你的理解,如果有其他需要,请随时告诉我。

英文:

There are other great answers in this thread, and in many engineering decisions what is the "best" solution will often go by an "it depends" rule. However, I also think this is a small enough an example for a common enough architectural decision point that I think it's worth showing some other "ways" it can be solved. I had considered using STRING_AGG and then splitting back out as a sample of one way to do this. This requires some SUBSTRING+PATINDEX logic:

Option 1:

/* Use a STRING_AGG() and "substring parsing" solution */
SELECT 
c.case_id,
c.caselocation,
f1.PrimaryStaff,
f1.AdditionalStaffList,
f2.*
, IIF(ISNULL(f2.AdditionalListLength,0)   >0,  LEFT(f1.AdditionalStaffList,ISNULL(NULLIF(f2.AdditionalListSubIndex,0)-1,f2.AdditionalListLength)) , NULL) [AdditionalStaff1]
, IIF(ISNULL(f2.AdditionalListSubIndex,0) >0,  RIGHT(f1.AdditionalStaffList,f2.AdditionalListLength-f2.AdditionalListSubIndex) , NULL) [AdditionalStaff2]
FROM
[cases] c
CROSS APPLY (
    SELECT STRING_AGG(s.staff,';') WITHIN GROUP (ORDER BY IIF(s.stafftype='Primary',1,2) ASC) StaffList ,
    MAX(IIF(s.stafftype='Primary',LEN(s.staff)+1,0)) AdditionalStaffSubIndex
    FROM [staff] s WHERE s.[case_id] = c.[case_id] ) s12
OUTER APPLY (
    SELECT LEFT(s12.StaffList,s12.AdditionalStaffSubIndex-1) PrimaryStaff,
    SUBSTRING(s12.StaffList,s12.AdditionalStaffSubIndex+1,LEN(s12.StaffList)) AdditionalStaffList
) f1
OUTER APPLY (
    SELECT PATINDEX('%;%',f1.AdditionalStaffList) AdditionalListSubIndex,
    LEN(f1.AdditionalStaffList) AdditionalListLength
) f2

But I also realized this is a great example to show some of the power of built in XML processing that is available on all supported versions of SQL server. What this can do is abstract the need for a "row numbering" scheme away from the data query itself. XML ,and - JSON, both provide ways to take "a subset of data from a join or subquery" that may contain "0 or more rows" and turn that subset into a "logically single entity/expression" that can be parsed to get specific information "back out" at the end of (or later in) the query pipeline. This can be sometimes be useful in more complex query scenarios when the underlying "data architecture" is 'fixed'.

A snippet showing the basic premise of what the XML-based query will do "per case id":

--APPLY this:
SELECT s.staff FROM [staff] s WHERE s.[case_id] = 1 AND s.[stafftype] = 'Additional' FOR XML AUTO,ELEMENTS

It's also possible that your data constraints may also look something like this as a way to force "only one primary staff per case" and ensuring that "each staff can only be on a given case_id once":

--Each staff is only ONCE per case_id:
CREATE UNIQUE NONCLUSTERED INDEX UQ_StaffCase ON [staff] ([case_id],[stafftype],[staff])
--Only ONE primary staff per case_id:
CREATE UNIQUE NONCLUSTERED INDEX UQ_StaffCase_Primary ON [staff] ([case_id]) INCLUDE ([staff]) WHERE [stafftype] = 'Primary'

And so, the following XML-based query is straightforward to read, and could be adjusted to only join once. This gives you the option to change to an INNER join on the primary staff:

Option 2:

/* An XML-based query solution */
SELECT 
c.case_id,
c.caselocation,
s1.staff [PrimaryStaff],
s2.staffList.value('(/s/staff)[1]','varchar(11)') [AdditionalStaff1],
s2.staffList.value('(/s/staff)[2]','varchar(11)') [AdditionalStaff2]
FROM
[cases] c
LEFT JOIN [staff] s1 ON s1.[case_id] = c.[case_id] AND s1.[stafftype] = 'Primary'
OUTER APPLY (SELECT CAST((SELECT s.staff FROM [staff] s WHERE s.[case_id] = c.[case_id] AND s.[stafftype] = 'Additional' FOR XML AUTO,ELEMENTS) AS XML) ) s2 (staffList)

Further Reading:
Earlier above, I mentioned cases where you need to query a "fixed data architecture". If the architecture is still being designed/developed, you may have options to provide input or adjust how the schema is represented so that queries may be able to be constructed differently in the first place, or to take advantage of additional features of the storage and processing engines of sql server. This shows one possibility of initial schema design (where i have added "zz" and "yy" to names to :a,: prevent naming conflicts against the OP schema if testing on the same database, and :b,: clearly show these objects as testing and development schema items).

As additional context not directly presenting a solution to OP, i'll leave the below without further explanation:

/* Alternative Schema Scenario:*/
CREATE SCHEMA yyStaff
GO
CREATE SCHEMA yyCases
GO

CREATE TABLE yyStaff.zz_staff
(
    [staffid]    INT IDENTITY(1,1) NOT NULL CONSTRAINT CK_Staff UNIQUE      CLUSTERED    
  , [staff_name]       VARCHAR(11) NOT NULL CONSTRAINT PK_Staff PRIMARY KEY NONCLUSTERED WITH (DATA_COMPRESSION = ROW) --eg. 'Daffy, Duck'
) WITH(DATA_COMPRESSION = ROW);

CREATE TABLE yyCases.zz_locationtype
(
    [locationtypeid] INT IDENTITY(1,1) NOT NULL CONSTRAINT CK_CaseLocationType UNIQUE      CLUSTERED    
  , [location]              VARCHAR(4) NOT NULL CONSTRAINT PK_CaseLocationType PRIMARY KEY NONCLUSTERED WITH (DATA_COMPRESSION = ROW) --eg. ('Home'),('Away')
) WITH(DATA_COMPRESSION = ROW);

CREATE TABLE yyCases.zz_case
(
    [caseid]            INT IDENTITY(1,1) NOT NULL   CONSTRAINT CK_Case UNIQUE CLUSTERED  
  , [casedate]          DATETIMEOFFSET(0) NOT NULL                                         
  , [reflocationtype]                 INT NOT NULL   CONSTRAINT FK_Case_CaseLocationType FOREIGN KEY REFERENCES yyCases.zz_locationtype (locationtypeid)
);

CREATE NONCLUSTERED INDEX IX_Case__CaseLocationType_Date ON yyCases.zz_case (reflocationtype,casedate)           WITH (DATA_COMPRESSION  = ROW);
CREATE NONCLUSTERED INDEX IX_Case__DateICaseLocationType ON yyCases.zz_case (casedate) INCLUDE (reflocationtype) WITH (DATA_COMPRESSION  = ROW);

/* Alternatively, since there are only two, just have an [IsPrimary] column on zz_casestaff ? */
--CREATE TABLE yyCases.zz_stafftype
--(
--    [stafftypeid] INT IDENTITY(1,1) NOT NULL CONSTRAINT CK_CaseStaffType UNIQUE      CLUSTERED    
--  , [stafftype]         VARCHAR(10) NOT NULL CONSTRAINT PK_CaseStaffType PRIMARY KEY NONCLUSTERED WITH (DATA_COMPRESSION = ROW) --eg. ('Primary'),('Additional')
--) WITH(DATA_COMPRESSION = ROW);
/* yes, see below */

CREATE TABLE yyCases.zz_casestaff
(
    [casestaffid] INT IDENTITY(1,1) NOT NULL CONSTRAINT CK_CaseStaff        UNIQUE      CLUSTERED 
  , [refcase]                   INT NOT NULL CONSTRAINT FK_CaseStaff_Case   FOREIGN KEY REFERENCES yyCases.zz_case  (caseid)
  , [refstaff]                  INT NOT NULL CONSTRAINT FK_CaseStaff_Staff  FOREIGN KEY REFERENCES yyStaff.zz_staff (staffid)
  , [isprimary]                 BIT NOT NULL
);

--Each staff is only ONCE per case_id:
CREATE UNIQUE NONCLUSTERED INDEX UQ_CaseStaff__Case_Staff    ON yyCases.zz_casestaff (refcase,refstaff) INCLUDE (IsPrimary);
--Only ONE primary staff per case_id:
CREATE UNIQUE NONCLUSTERED INDEX UQ_CaseStaff__CaseFPrimary  ON yyCases.zz_casestaff (refcase) INCLUDE (refstaff) WHERE IsPrimary = 1;
--For query processing: Sort cases by staff, and if primary:
CREATE NONCLUSTERED INDEX IX_CaseStaff_Staff ON yyCases.zz_casestaff (refstaff,isprimary) INCLUDE (refcase)

答案4

得分: 0

以下是代码的中文翻译部分:

WITH
  staff_sorted
AS
(
  SELECT
    *,
    ROW_NUMBER()
      OVER (
        PARTITION BY case_id, stafftype
            ORDER BY staff
    )
      AS ordinal
  FROM
    staff
),
  staff_pivotted
AS
(
  SELECT
    case_id,
    MAX(CASE WHEN stafftype = 'Primary'                    THEN staff END) AS staff_primary,
    MAX(CASE WHEN stafftype = 'Additional' AND ordinal = 1 THEN staff END) AS staff_additional_1,
    MAX(CASE WHEN stafftype = 'Additional' AND ordinal = 2 THEN staff END) AS staff_additional_2
  FROM
    staff_sorted
  GROUP BY
    case_id
)  
SELECT
  *
FROM
  cases            AS c
INNER JOIN
  staff_pivotted   AS s
    ON c.case_id = s.case_id
英文:
WITH
staff_sorted
AS
(
SELECT
*,
ROW_NUMBER()
OVER (
PARTITION BY case_id, stafftype
ORDER BY staff
)
AS ordinal
FROM
staff
),
staff_pivotted
AS
(
SELECT
case_id,
MAX(CASE WHEN stafftype = 'Primary'                    THEN staff END) AS staff_primary,
MAX(CASE WHEN stafftype = 'Additional' AND ordinal = 1 THEN staff END) AS staff_additional_1,
MAX(CASE WHEN stafftype = 'Additional' AND ordinal = 2 THEN staff END) AS staff_additional_2
FROM
staff_sorted
GROUP BY
case_id
)  
SELECT
*
FROM
cases            AS c
INNER JOIN
staff_pivotted   AS s
ON c.case_id = s.case_id

huangapple
  • 本文由 发表于 2023年3月7日 23:46:04
  • 转载请务必保留本文链接:https://go.coder-hub.com/75664177.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定