英文:
Return 1 row from multiple joins to same table - SQL Server
问题
我有这些表格:
CREATE TABLE staff
(
[case_id] int,
[staff] varchar(11),
[stafftype] varchar(10)
);
INSERT INTO staff
([case_id], [staff], [stafftype])
VALUES
(1, 'Daffy, Duck', 'Primary'),
(1, 'Bugs, Bunny', 'Additional'),
(1, 'Elmer, Fudd', 'Additional'),
(2, 'Daffy, Duck', 'Primary'),
(2, 'Bugs, Bunny', 'Additional');
CREATE TABLE cases
(
[case_id] int,
[casedate] datetime,
[caselocation] varchar(4)
);
INSERT INTO cases
([case_id], [casedate], [caselocation])
VALUES
(1, '2023-01-01 00:00:00', 'Home'),
(2, '2023-01-03 00:00:00', 'Away');
我想要为每个案例返回一行,每个案例只能有1个 Primary
stafftype 和最多2个 Additional
stafftypes。
例如,对于 case_id = 1 的结果集:
case_id | caselocation | PrimaryStaff | AdditionalStaff1 | AdditionalStaff2 |
---|---|---|---|---|
1 | Home | Daffy, Duck | Bugs, Bunny | Elmer, Fudd |
SQL Fiddle 演示: http://sqlfiddle.com/#!18/83bbc9/6
英文:
I have these tables:
CREATE TABLE staff
(
[case_id] int,
[staff] varchar(11),
[stafftype] varchar(10)
);
INSERT INTO staff
([case_id], [staff], [stafftype])
VALUES
(1, 'Daffy, Duck', 'Primary'),
(1, 'Bugs, Bunny', 'Additional'),
(1, 'Elmer, Fudd', 'Additional'),
(2, 'Daffy, Duck', 'Primary'),
(2, 'Bugs, Bunny', 'Additional');
CREATE TABLE cases
(
[case_id] int,
[casedate] datetime,
[caselocation] varchar(4)
);
INSERT INTO cases
([case_id], [casedate], [caselocation])
VALUES
(1, '2023-01-01 00:00:00', 'Home'),
(2, '2023-01-03 00:00:00', 'Away');
And I want to return a single row per case, there can only be 1 Primary
stafftype and a max of 2 Additional
stafftypes.
e.g. Result set for case_id = 1
case_id | caselocation | PrimaryStaff | AdditionalStaff1 | AdditionalStaff2 |
---|---|---|---|---|
1 | Home | Daffy, Duck | Bugs, Bunny | Elmer, Fudd |
SQL Fiddle Demo: http://sqlfiddle.com/#!18/83bbc9/6
答案1
得分: 1
这种方法是使用条件聚合来完成的。
Fiddle链接: http://sqlfiddle.com/#!18/83bbc9/10/0
with cte as (
select a.case_id
,casedate
,caselocation
,b.staff
,ROW_NUMBER() OVER (PARTITION By a.case_id ORDER BY stafftype DESC) RN
from cases a
inner join staff b on a.case_id = b.case_id
)
SELECT case_id
,casedate
,caselocation
,MAX(CASE WHEN RN = 1 THEN staff END) PrimaryStaff
,MAX(CASE WHEN RN = 2 THEN staff END) AdditionalStaff1
,MAX(CASE WHEN RN = 3 THEN staff END) AdditionalStaff2
FROM cte
GROUP BY case_id
,casedate
,caselocation
这个CTE为“主要”stafftype分配了行号“RN”为1,然后将2和3分配给次要stafftypes。然后,最终的选择使用“RN”来组织数据,而“MAX()”和“GROUP BY”组合在一起将数据合并为单行。
您可以选择CTE的内部部分,然后在底部的查询中去掉“MAX()”和“GROUP BY”,然后运行它,以逐步查看发生的情况。
例如,首先运行这个:
select a.case_id
,casedate
,caselocation
,b.staff
,ROW_NUMBER() OVER (PARTITION By a.case_id ORDER BY stafftype DESC) RN
from cases a
inner join staff b on a.case_id = b.case_id
然后运行这个:
with cte as (
select a.case_id
,casedate
,caselocation
,b.staff
,ROW_NUMBER() OVER (PARTITION By a.case_id ORDER BY stafftype DESC) RN
from cases a
inner join staff b on a.case_id = b.case_id
)
SELECT case_id
,casedate
,caselocation
,CASE WHEN RN = 1 THEN staff END PrimaryStaff
,CASE WHEN RN = 2 THEN staff END AdditionalStaff1
,CASE WHEN RN = 3 THEN staff END AdditionalStaff2
FROM cte
英文:
One way to do this is conditional aggregation.
Fiddle: http://sqlfiddle.com/#!18/83bbc9/10/0
with cte as (
select a.case_id
,casedate
,caselocation
,b.staff
,ROW_NUMBER() OVER (PARTITION By a.case_id ORDER BY stafftype DESC) RN
from cases a
inner join staff b on a.case_id = b.case_id
)
SELECT case_id
,casedate
,caselocation
,MAX(CASE WHEN RN = 1 THEN staff END) PrimaryStaff
,MAX(CASE WHEN RN = 2 THEN staff END) AdditionalStaff1
,MAX(CASE WHEN RN = 3 THEN staff END) AdditionalStaff2
FROM cte
GROUP BY case_id
,casedate
,caselocation
The cte assigns the 'Primary' stafftype a row number RN
of 1, then assigns 2 and 3 to the secondary stafftypes. Then the final select uses the RN
to organize the data, and the MAX()
and GROUP BY
combine to roll the data up into single rows.
You can select only the inner portion of the CTE, then remove the MAX()
and GROUP BY
in the bottom query and run that to see what is happening step by step.
ie. Run this:
select a.case_id
,casedate
,caselocation
,b.staff
,ROW_NUMBER() OVER (PARTITION By a.case_id ORDER BY stafftype DESC) RN
from cases a
inner join staff b on a.case_id = b.case_id
Then this:
with cte as (
select a.case_id
,casedate
,caselocation
,b.staff
,ROW_NUMBER() OVER (PARTITION By a.case_id ORDER BY stafftype DESC) RN
from cases a
inner join staff b on a.case_id = b.case_id
)
SELECT case_id
,casedate
,caselocation
,CASE WHEN RN = 1 THEN staff END PrimaryStaff
,CASE WHEN RN = 2 THEN staff END AdditionalStaff1
,CASE WHEN RN = 3 THEN staff END AdditionalStaff2
FROM cte
答案2
得分: 1
提供的只需要2个任意的 'Additional' 员工类型,一个简单的分组查询就足够了。
select
a.case_id
,casedate
,caselocation
,b.staff as PrimaryStaff
,AdditionalStaff1, AdditionalStaff2
from cases a
inner join staff b on a.case_id = b.case_id and b.stafftype = 'Primary'
inner join (
select case_id, min([staff]) AdditionalStaff1
, case when count(*) > 1 then max([staff]) end AdditionalStaff2
from staff
where stafftype = 'Additional'
group by case_id
) c on c.case_id = a.case_id
英文:
Provided only 2 arbitrary 'Additional' stafftypes are required a plain group by will do.
select
a.case_id
,casedate
,caselocation
,b.staff as PrimaryStaff
,AdditionalStaff1, AdditionalStaff2
from cases a
inner join staff b on a.case_id = b.case_id and b.stafftype = 'Primary'
inner join (
select case_id, min([staff]) AdditionalStaff1
, case when count(*) > 1 then max([staff]) end AdditionalStaff2
from staff
where stafftype = 'Additional'
group by case_id
) c on c.case_id = a.case_id
答案3
得分: 1
以下是代码部分的翻译:
Option 1:
/* 使用 STRING_AGG() 和 "substring parsing" 解决方案 */
SELECT
c.case_id,
c.caselocation,
f1.PrimaryStaff,
f1.AdditionalStaffList,
f2.*,
IIF(ISNULL(f2.AdditionalListLength,0) > 0, LEFT(f1.AdditionalStaffList,ISNULL(NULLIF(f2.AdditionalListSubIndex,0)-1,f2.AdditionalListLength)) , NULL) [AdditionalStaff1],
IIF(ISNULL(f2.AdditionalListSubIndex,0) > 0, RIGHT(f1.AdditionalStaffList,f2.AdditionalListLength-f2.AdditionalListSubIndex) , NULL) [AdditionalStaff2]
FROM
[cases] c
CROSS APPLY (
SELECT STRING_AGG(s.staff,';') WITHIN GROUP (ORDER BY IIF(s.stafftype='Primary',1,2) ASC) StaffList,
MAX(IIF(s.stafftype='Primary',LEN(s.staff)+1,0)) AdditionalStaffSubIndex
FROM [staff] s WHERE s.[case_id] = c.[case_id] ) s12
OUTER APPLY (
SELECT LEFT(s12.StaffList,s12.AdditionalStaffSubIndex-1) PrimaryStaff,
SUBSTRING(s12.StaffList,s12.AdditionalStaffSubIndex+1,LEN(s12.StaffList)) AdditionalStaffList
) f1
OUTER APPLY (
SELECT PATINDEX('%;%',f1.AdditionalStaffList) AdditionalListSubIndex,
LEN(f1.AdditionalStaffList) AdditionalListLength
) f2
Option 2:
/* 基于 XML 的查询解决方案 */
SELECT
c.case_id,
c.caselocation,
s1.staff [PrimaryStaff],
s2.staffList.value('(/s/staff)[1]','varchar(11)') [AdditionalStaff1],
s2.staffList.value('(/s/staff)[2]','varchar(11)') [AdditionalStaff2]
FROM
[cases] c
LEFT JOIN [staff] s1 ON s1.[case_id] = c.[case_id] AND s1.[stafftype] = 'Primary'
OUTER APPLY (SELECT CAST((SELECT s.staff FROM [staff] s WHERE s.[case_id] = c.[case_id] AND s.[stafftype] = 'Additional' FOR XML AUTO,ELEMENTS) AS XML) ) s2 (staffList)
感谢你的理解,如果有其他需要,请随时告诉我。
英文:
There are other great answers in this thread, and in many engineering decisions what is the "best" solution will often go by an "it depends" rule. However, I also think this is a small enough an example for a common enough architectural decision point that I think it's worth showing some other "ways" it can be solved. I had considered using STRING_AGG
and then splitting back out as a sample of one way to do this. This requires some SUBSTRING
+PATINDEX
logic:
Option 1:
/* Use a STRING_AGG() and "substring parsing" solution */
SELECT
c.case_id,
c.caselocation,
f1.PrimaryStaff,
f1.AdditionalStaffList,
f2.*
, IIF(ISNULL(f2.AdditionalListLength,0) >0, LEFT(f1.AdditionalStaffList,ISNULL(NULLIF(f2.AdditionalListSubIndex,0)-1,f2.AdditionalListLength)) , NULL) [AdditionalStaff1]
, IIF(ISNULL(f2.AdditionalListSubIndex,0) >0, RIGHT(f1.AdditionalStaffList,f2.AdditionalListLength-f2.AdditionalListSubIndex) , NULL) [AdditionalStaff2]
FROM
[cases] c
CROSS APPLY (
SELECT STRING_AGG(s.staff,';') WITHIN GROUP (ORDER BY IIF(s.stafftype='Primary',1,2) ASC) StaffList ,
MAX(IIF(s.stafftype='Primary',LEN(s.staff)+1,0)) AdditionalStaffSubIndex
FROM [staff] s WHERE s.[case_id] = c.[case_id] ) s12
OUTER APPLY (
SELECT LEFT(s12.StaffList,s12.AdditionalStaffSubIndex-1) PrimaryStaff,
SUBSTRING(s12.StaffList,s12.AdditionalStaffSubIndex+1,LEN(s12.StaffList)) AdditionalStaffList
) f1
OUTER APPLY (
SELECT PATINDEX('%;%',f1.AdditionalStaffList) AdditionalListSubIndex,
LEN(f1.AdditionalStaffList) AdditionalListLength
) f2
But I also realized this is a great example to show some of the power of built in XML processing that is available on all supported versions of SQL server. What this can do is abstract the need for a "row numbering" scheme away from the data query itself. XML
,and - JSON
, both provide ways to take "a subset of data from a join or subquery" that may contain "0 or more rows" and turn that subset into a "logically single entity/expression" that can be parsed to get specific information "back out" at the end of (or later in) the query pipeline. This can be sometimes be useful in more complex query scenarios when the underlying "data architecture" is 'fixed'.
A snippet showing the basic premise of what the XML-based query will do "per case id":
--APPLY this:
SELECT s.staff FROM [staff] s WHERE s.[case_id] = 1 AND s.[stafftype] = 'Additional' FOR XML AUTO,ELEMENTS
It's also possible that your data constraints may also look something like this as a way to force "only one primary staff per case" and ensuring that "each staff can only be on a given case_id once":
--Each staff is only ONCE per case_id:
CREATE UNIQUE NONCLUSTERED INDEX UQ_StaffCase ON [staff] ([case_id],[stafftype],[staff])
--Only ONE primary staff per case_id:
CREATE UNIQUE NONCLUSTERED INDEX UQ_StaffCase_Primary ON [staff] ([case_id]) INCLUDE ([staff]) WHERE [stafftype] = 'Primary'
And so, the following XML-based query is straightforward to read, and could be adjusted to only join once. This gives you the option to change to an INNER
join on the primary staff:
Option 2:
/* An XML-based query solution */
SELECT
c.case_id,
c.caselocation,
s1.staff [PrimaryStaff],
s2.staffList.value('(/s/staff)[1]','varchar(11)') [AdditionalStaff1],
s2.staffList.value('(/s/staff)[2]','varchar(11)') [AdditionalStaff2]
FROM
[cases] c
LEFT JOIN [staff] s1 ON s1.[case_id] = c.[case_id] AND s1.[stafftype] = 'Primary'
OUTER APPLY (SELECT CAST((SELECT s.staff FROM [staff] s WHERE s.[case_id] = c.[case_id] AND s.[stafftype] = 'Additional' FOR XML AUTO,ELEMENTS) AS XML) ) s2 (staffList)
Further Reading:
Earlier above, I mentioned cases where you need to query a "fixed data architecture". If the architecture is still being designed/developed, you may have options to provide input or adjust how the schema is represented so that queries may be able to be constructed differently in the first place, or to take advantage of additional features of the storage and processing engines of sql server
. This shows one possibility of initial schema design (where i have added "zz
" and "yy
" to names to :a,: prevent naming conflicts against the OP schema if testing on the same database, and :b,: clearly show these objects as testing and development schema items).
As additional context not directly presenting a solution to OP, i'll leave the below without further explanation:
/* Alternative Schema Scenario:*/
CREATE SCHEMA yyStaff
GO
CREATE SCHEMA yyCases
GO
CREATE TABLE yyStaff.zz_staff
(
[staffid] INT IDENTITY(1,1) NOT NULL CONSTRAINT CK_Staff UNIQUE CLUSTERED
, [staff_name] VARCHAR(11) NOT NULL CONSTRAINT PK_Staff PRIMARY KEY NONCLUSTERED WITH (DATA_COMPRESSION = ROW) --eg. 'Daffy, Duck'
) WITH(DATA_COMPRESSION = ROW);
CREATE TABLE yyCases.zz_locationtype
(
[locationtypeid] INT IDENTITY(1,1) NOT NULL CONSTRAINT CK_CaseLocationType UNIQUE CLUSTERED
, [location] VARCHAR(4) NOT NULL CONSTRAINT PK_CaseLocationType PRIMARY KEY NONCLUSTERED WITH (DATA_COMPRESSION = ROW) --eg. ('Home'),('Away')
) WITH(DATA_COMPRESSION = ROW);
CREATE TABLE yyCases.zz_case
(
[caseid] INT IDENTITY(1,1) NOT NULL CONSTRAINT CK_Case UNIQUE CLUSTERED
, [casedate] DATETIMEOFFSET(0) NOT NULL
, [reflocationtype] INT NOT NULL CONSTRAINT FK_Case_CaseLocationType FOREIGN KEY REFERENCES yyCases.zz_locationtype (locationtypeid)
);
CREATE NONCLUSTERED INDEX IX_Case__CaseLocationType_Date ON yyCases.zz_case (reflocationtype,casedate) WITH (DATA_COMPRESSION = ROW);
CREATE NONCLUSTERED INDEX IX_Case__DateICaseLocationType ON yyCases.zz_case (casedate) INCLUDE (reflocationtype) WITH (DATA_COMPRESSION = ROW);
/* Alternatively, since there are only two, just have an [IsPrimary] column on zz_casestaff ? */
--CREATE TABLE yyCases.zz_stafftype
--(
-- [stafftypeid] INT IDENTITY(1,1) NOT NULL CONSTRAINT CK_CaseStaffType UNIQUE CLUSTERED
-- , [stafftype] VARCHAR(10) NOT NULL CONSTRAINT PK_CaseStaffType PRIMARY KEY NONCLUSTERED WITH (DATA_COMPRESSION = ROW) --eg. ('Primary'),('Additional')
--) WITH(DATA_COMPRESSION = ROW);
/* yes, see below */
CREATE TABLE yyCases.zz_casestaff
(
[casestaffid] INT IDENTITY(1,1) NOT NULL CONSTRAINT CK_CaseStaff UNIQUE CLUSTERED
, [refcase] INT NOT NULL CONSTRAINT FK_CaseStaff_Case FOREIGN KEY REFERENCES yyCases.zz_case (caseid)
, [refstaff] INT NOT NULL CONSTRAINT FK_CaseStaff_Staff FOREIGN KEY REFERENCES yyStaff.zz_staff (staffid)
, [isprimary] BIT NOT NULL
);
--Each staff is only ONCE per case_id:
CREATE UNIQUE NONCLUSTERED INDEX UQ_CaseStaff__Case_Staff ON yyCases.zz_casestaff (refcase,refstaff) INCLUDE (IsPrimary);
--Only ONE primary staff per case_id:
CREATE UNIQUE NONCLUSTERED INDEX UQ_CaseStaff__CaseFPrimary ON yyCases.zz_casestaff (refcase) INCLUDE (refstaff) WHERE IsPrimary = 1;
--For query processing: Sort cases by staff, and if primary:
CREATE NONCLUSTERED INDEX IX_CaseStaff_Staff ON yyCases.zz_casestaff (refstaff,isprimary) INCLUDE (refcase)
答案4
得分: 0
以下是代码的中文翻译部分:
WITH
staff_sorted
AS
(
SELECT
*,
ROW_NUMBER()
OVER (
PARTITION BY case_id, stafftype
ORDER BY staff
)
AS ordinal
FROM
staff
),
staff_pivotted
AS
(
SELECT
case_id,
MAX(CASE WHEN stafftype = 'Primary' THEN staff END) AS staff_primary,
MAX(CASE WHEN stafftype = 'Additional' AND ordinal = 1 THEN staff END) AS staff_additional_1,
MAX(CASE WHEN stafftype = 'Additional' AND ordinal = 2 THEN staff END) AS staff_additional_2
FROM
staff_sorted
GROUP BY
case_id
)
SELECT
*
FROM
cases AS c
INNER JOIN
staff_pivotted AS s
ON c.case_id = s.case_id
英文:
WITH
staff_sorted
AS
(
SELECT
*,
ROW_NUMBER()
OVER (
PARTITION BY case_id, stafftype
ORDER BY staff
)
AS ordinal
FROM
staff
),
staff_pivotted
AS
(
SELECT
case_id,
MAX(CASE WHEN stafftype = 'Primary' THEN staff END) AS staff_primary,
MAX(CASE WHEN stafftype = 'Additional' AND ordinal = 1 THEN staff END) AS staff_additional_1,
MAX(CASE WHEN stafftype = 'Additional' AND ordinal = 2 THEN staff END) AS staff_additional_2
FROM
staff_sorted
GROUP BY
case_id
)
SELECT
*
FROM
cases AS c
INNER JOIN
staff_pivotted AS s
ON c.case_id = s.case_id
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论