使用DISTINCT和RANK()函数一起在SELECT语句中用于唯一值是否正确?

huangapple go评论65阅读模式
英文:

Is it right to use DISTINCT and RANK() function together for distinct values in a select statement?

问题

在一个SELECT语句中,我们可以同时使用DISTINCT和RANK()函数来获取不同的值吗?让我们考虑两种情况。

  1. 使用RANK()和DISTINCT
  2. 仅使用RANK()。
    在这两种情况下,我们是否会获得相同的不同值?
SELECT * FROM (
SELECT DISTINCT
 col1,
 col2,
 col3,
 col4,
 row_number() OVER (PARTITION BY col1, col2 ORDER BY col3) RN
FROM table)
WHERE RN = 1;

SELECT * FROM (
SELECT
 col1,
 col2,
 col3,
 col4,
 row_number() OVER (PARTITION BY col1, col2 ORDER BY col3) RN
FROM table) 
WHERE RN = 1;
英文:

Can we use DISTINCT and RANK() function together in a select statement for distinct values? Lets take 2 scenarios
1.Using RANK() and DISTINCT
2.Using only RANK() .
In these both scenarios will we get same distinct values?

SELECT * FROM (
SELECT distinct
 col1,
 col2,
 col3,
 col4,
 row_number () over (partition by col1, col2 ORDER BY col3) RN
FROM table)
WHERE rn = 1;


SELECT * FROM (
SELECT
 col1,
 col2,
 col3,
 col4,
 row_number () over (partition by col1, col2 ORDER BY col3) RN
FROM table) 
WHERE rn = 1;

答案1

得分: 0

窗口函数(这里使用ROW_NUMBER)在任何GROUP BY之后计算,但在DISTINCT之前计算。以下是我在我的数据库中选择的随机对象使用类似SQL的执行计划示例:

使用DISTINCT和RANK()函数一起在SELECT语句中用于唯一值是否正确?

执行顺序(从最缩进到最少缩进的顺序阅读)是:(1)扫描表,(2)计算ROW_NUMBER (WINDOW),(3)应用DISTINCT (HASH UNIQUE)

这意味着DISTINCT不会影响同一查询块中的ROW_NUMBER的结果。如果将其放在外部查询块中,它可能会改变外部查询块的结果,但在您的情况下,由于您的rn = 1谓词,它不会这样做。这将仅选择每个(col1,col2)的一行。虽然它必须经过另一个(不必要的)排序操作来满足请求,但没有其他事情可以让DISTINCT来做。

智慧之言:避免频繁使用DISTINCT。我经常看到开发人员一直在滥用它。在大多数情况下,当您希望降低粒度时,使用GROUP BY是更好的选择。通常,当我在开发人员的代码中看到DISTINCT时,它是试图隐藏查询中由于多对多问题而导致的重复行,而不是修复多对多本身。即使它不是在隐藏问题,当它没有任何目的时,对其他人来说也是令人困惑的。

英文:

The windowing function (ROW_NUMBER here) is computed after any GROUP BY but before DISTINCT. Here's what an execution plan looks like for a similar SQL using a random object I selected on one of my databases:

使用DISTINCT和RANK()函数一起在SELECT语句中用于唯一值是否正确?

Order of execution (read from most-indented upwards toward least-indented) is: (1) scan the table, (2) compute ROW_NUMBER (WINDOW), (3) apply DISTINCT (HASH UNIQUE).

This means that DISTINCT will not impact the result of ROW_NUMBER in the same query block. It could change the results of the outer query block if placed there, but in your case it won't because of your rn = 1 predicate. That will select only one row per (col1,col2). There's nothing left for DISTINCT to do, though it will have to pass through another (unnecessary) sort operation to honor the request.

Word of wisdom: avoid frequent use of DISTINCT. I see developers overusing it all the time. In most cases when you want to reduce granularity, GROUP BY is a better choice. Usually when I see DISTINCT in developers' code it's an attempt to hide duplicate rows caused by a many-to-many problem in their query, rather than fixing the many-to-many itself. And even if it's not hiding a problem, it is confusing to others when it's there for no purpose.

huangapple
  • 本文由 发表于 2023年3月7日 21:56:41
  • 转载请务必保留本文链接:https://go.coder-hub.com/75662914.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定