How to avoid duplicate rows in a join?

Question

I'm getting duplicate rows in a join result that I don't want.

Hi!
I have two tables, tab1 and tab2, and I want to join tab2 to tab1 like this:

SELECT t1.column_A1, t2.column_B2 
FROM tab1 t1
JOIN
tab2 t2
ON t1.column_A1=t2.column_A2 

tab1

| Column A1 | Column B1 | Column C1 |
|  -------- |  -------- |  -------- |
|  Z1       |  Cell 2   |  Cell 3   |
|  Z2       |  Cell 5   |  Cell 6   |

tab2

| Column A2 | Column B2 | Column C2 |
|  -------- |  -------- |  -------- |
|  Z1       |  PW       |  Cell 3   |
|  Z1       |  RW       |  Cell 6   |

For some rows in tab1 there is more than one matching row in tab2.

The result will be:

| Column A2 | Column B2 | Column C2 |
|  -------- |  -------- |  -------- |
|  Z1       |  PW       |  RE       |
|  Z1       |  RW       |  KS       |

I want to get:
if PW - show only one row with PW;
if not PW - show only one row with RW

The result should be:

| Column A2 | Column B2 | Column C2 |
|  -------- |  -------- |  -------- |
|  Z1       |  PW       |  RE       |

Answer 1

Score: 1

One option is to rank the rows for each column_a1 by the value stored in column_b2 and return only the top-ranked row. With this data, a plain ascending sort is enough because 'PW' sorts before 'RW'.

Sample data:

WITH
   tab1 (column_a1, column_b1, column_c1)
   AS
      (SELECT 'Z1', 'cell 2', 'cell 3' FROM DUAL
       UNION ALL
       SELECT 'Z2', 'cell 5', 'cell 6' FROM DUAL),
   tab2 (column_a2, column_b2, column_c2)
   AS
      (SELECT 'Z1', 'PW', 'cell 3' FROM DUAL
       UNION ALL
       SELECT 'Z1', 'RW', 'cell 6' FROM DUAL
       UNION ALL
       SELECT 'Z2', 'RW', 'cell 8' FROM DUAL),

Query begins here:

   temp
   AS
      (SELECT t1.column_A1,
              t2.column_B2,
              ROW_NUMBER () OVER (PARTITION BY t1.column_a1 ORDER BY t2.column_b2) rn
         FROM tab1 t1 JOIN tab2 t2 ON t1.column_A1 = t2.column_A2)
SELECT column_a1, column_b2
  FROM temp
 WHERE rn = 1;

COLUMN_A1    COLUMN_B2
------------ ------------
Z1           PW
Z2           RW
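The query above returns only column_a1 and column_b2, while the expected result in the question also shows a third column. Because ROW_NUMBER keeps whole rows, the other columns of the chosen row can simply be selected alongside. Here is a minimal sketch of that. It assumes an in-memory SQLite database (via Python's sqlite3, SQLite 3.25 or later for window functions) as a stand-in for Oracle, and it reuses the sample data from this answer.

```python
import sqlite3

# In-memory SQLite database standing in for the Oracle tables (an assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tab1 (column_a1 TEXT, column_b1 TEXT, column_c1 TEXT);
    CREATE TABLE tab2 (column_a2 TEXT, column_b2 TEXT, column_c2 TEXT);
    INSERT INTO tab1 VALUES ('Z1', 'cell 2', 'cell 3'), ('Z2', 'cell 5', 'cell 6');
    INSERT INTO tab2 VALUES ('Z1', 'PW', 'cell 3'), ('Z1', 'RW', 'cell 6'),
                            ('Z2', 'RW', 'cell 8');
""")

# Same ranking as in the answer, with column_c2 carried along.
query = """
    WITH temp AS (
        SELECT t1.column_a1,
               t2.column_b2,
               t2.column_c2,
               ROW_NUMBER() OVER (PARTITION BY t1.column_a1
                                  ORDER BY t2.column_b2) AS rn
          FROM tab1 t1
          JOIN tab2 t2 ON t1.column_a1 = t2.column_a2
    )
    SELECT column_a1, column_b2, column_c2
      FROM temp
     WHERE rn = 1
     ORDER BY column_a1
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
# ('Z1', 'PW', 'cell 3')
# ('Z2', 'RW', 'cell 8')
```

The SQL inside the string should carry over to Oracle largely unchanged, since only the table setup is SQLite-specific.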

Answer 2

Score: 0

This is a typical task for tables without a primary key, i.e. tables that contain duplicate keys, where some rule determines which single row to keep for each key.

In your case the rule is:

> if PW - show only one row with PW; if not PW - show only one row with RW

You can implement it with the row_number function: use partition by on your (duplicated) key column and an order by that encodes your rule (using decode), so that the required row sorts first.

Example

select 
   COLUMN_A2, COLUMN_B2, COLUMN_C2,
   row_number() over (partition by COLUMN_A2
                      order by decode (COLUMN_B2,'PW',1,'RW',2,3),COLUMN_B2) as rn 
from tab2;        


CO CO COLUMN         RN
-- -- ------ ----------
Z1 PW cell 3          1
Z1 RW cell 6          2

The join is the same as the one you used, except that you join to this ranked result instead of tab2 directly and add the rn = 1 predicate to the on clause.
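To make the join step concrete, here is a minimal sketch of the ranked subquery joined back to tab1 with rn = 1 in the ON clause. Several things in it are assumptions and not part of the answer: it runs on an in-memory SQLite database through Python's sqlite3 (SQLite 3.25 or later), it uses a CASE expression because SQLite has no DECODE, the alias r is made up for illustration, and the ('Z2', 'AA', ...) row is hypothetical, added only to show a case where plain alphabetical order would pick the wrong row.

```python
import sqlite3

# In-memory SQLite database standing in for the Oracle tables (an assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tab1 (column_a1 TEXT, column_b1 TEXT, column_c1 TEXT);
    CREATE TABLE tab2 (column_a2 TEXT, column_b2 TEXT, column_c2 TEXT);
    INSERT INTO tab1 VALUES ('Z1', 'cell 2', 'cell 3'), ('Z2', 'cell 5', 'cell 6');
    -- The ('Z2', 'AA', ...) row is hypothetical, not from the question.
    INSERT INTO tab2 VALUES ('Z1', 'PW', 'cell 3'), ('Z1', 'RW', 'cell 6'),
                            ('Z2', 'RW', 'cell 8'), ('Z2', 'AA', 'cell 9');
""")

# Rank tab2 rows per key (PW first, then RW, then anything else),
# and keep only the top-ranked row when joining.
query = """
    SELECT t1.column_a1, r.column_b2, r.column_c2
      FROM tab1 t1
      JOIN (SELECT column_a2, column_b2, column_c2,
                   ROW_NUMBER() OVER (
                       PARTITION BY column_a2
                       ORDER BY CASE column_b2 WHEN 'PW' THEN 1
                                               WHEN 'RW' THEN 2
                                               ELSE 3 END,
                                column_b2) AS rn
              FROM tab2) r
        ON t1.column_a1 = r.column_a2
       AND r.rn = 1
     ORDER BY t1.column_a1
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
# ('Z1', 'PW', 'cell 3')
# ('Z2', 'RW', 'cell 8')
```

For Z2 the RW row wins over AA even though 'AA' sorts first alphabetically, which is the point of encoding the rule in the ORDER BY. In Oracle the CASE expression could be swapped back for the decode call shown in the answer.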

Note that I added COLUMN_B2 as a second order by column; this covers the case where neither of your two strings is present, so that the row with the lowest value is used.

You should always choose the order by column list so that, together with the partition by column(s), it forms a unique key. Then the query produces a deterministic result.
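One way to check that advice is to look for keys where the partition and order by columns do not identify a single row, and then extend the order by list until they do. The sketch below is only an illustration under assumptions: it uses an in-memory SQLite database through Python's sqlite3 (SQLite 3.25 or later), a CASE expression in place of decode, and hypothetical data with two PW rows for Z1 that does not appear in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tab2 (column_a2 TEXT, column_b2 TEXT, column_c2 TEXT);
    -- Hypothetical data: two PW rows for Z1, so (column_a2, column_b2) is not unique.
    INSERT INTO tab2 VALUES ('Z1', 'PW', 'cell 7'), ('Z1', 'PW', 'cell 3'),
                            ('Z1', 'RW', 'cell 6');
""")

# Groups where partition + order-by columns do not identify a single row.
ties = conn.execute("""
    SELECT column_a2, column_b2, COUNT(*) AS cnt
      FROM tab2
     GROUP BY column_a2, column_b2
    HAVING COUNT(*) > 1
""").fetchall()
print(ties)    # [('Z1', 'PW', 2)]

# Adding column_c2 as a final tie-breaker makes the choice of rn = 1 repeatable.
picked = conn.execute("""
    SELECT column_a2, column_b2, column_c2
      FROM (SELECT column_a2, column_b2, column_c2,
                   ROW_NUMBER() OVER (
                       PARTITION BY column_a2
                       ORDER BY CASE column_b2 WHEN 'PW' THEN 1
                                               WHEN 'RW' THEN 2
                                               ELSE 3 END,
                                column_b2, column_c2) AS rn
              FROM tab2)
     WHERE rn = 1
""").fetchall()
print(picked)  # [('Z1', 'PW', 'cell 3')]
```

Without column_c2 in the order by list, either of the two PW rows could legitimately be numbered 1, so the returned column_c2 value would not be guaranteed.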

huangapple
  • Posted on June 1, 2023 at 20:20:39
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76381796.html