英文:
Is it right to use DISTINCT and RANK() function together for distinct values in a select statement?
问题
在一个SELECT语句中,我们可以同时使用DISTINCT和RANK()函数来获取不同的值吗?让我们考虑两种情况。
- 使用RANK()和DISTINCT
- 仅使用RANK()。
在这两种情况下,我们是否会获得相同的不同值?
SELECT * FROM (
SELECT DISTINCT
col1,
col2,
col3,
col4,
row_number() OVER (PARTITION BY col1, col2 ORDER BY col3) RN
FROM table)
WHERE RN = 1;
SELECT * FROM (
SELECT
col1,
col2,
col3,
col4,
row_number() OVER (PARTITION BY col1, col2 ORDER BY col3) RN
FROM table)
WHERE RN = 1;
英文:
Can we use DISTINCT and RANK() function together in a select statement for distinct values? Lets take 2 scenarios
1.Using RANK() and DISTINCT
2.Using only RANK() .
In these both scenarios will we get same distinct values?
SELECT * FROM (
SELECT distinct
col1,
col2,
col3,
col4,
row_number () over (partition by col1, col2 ORDER BY col3) RN
FROM table)
WHERE rn = 1;
SELECT * FROM (
SELECT
col1,
col2,
col3,
col4,
row_number () over (partition by col1, col2 ORDER BY col3) RN
FROM table)
WHERE rn = 1;
答案1
得分: 0
窗口函数(这里使用ROW_NUMBER
)在任何GROUP BY
之后计算,但在DISTINCT
之前计算。以下是我在我的数据库中选择的随机对象使用类似SQL的执行计划示例:
执行顺序(从最缩进到最少缩进的顺序阅读)是:(1)扫描表,(2)计算ROW_NUMBER (WINDOW)
,(3)应用DISTINCT (HASH UNIQUE)
。
这意味着DISTINCT
不会影响同一查询块中的ROW_NUMBER
的结果。如果将其放在外部查询块中,它可能会改变外部查询块的结果,但在您的情况下,由于您的rn = 1
谓词,它不会这样做。这将仅选择每个(col1,col2)的一行。虽然它必须经过另一个(不必要的)排序操作来满足请求,但没有其他事情可以让DISTINCT
来做。
智慧之言:避免频繁使用DISTINCT
。我经常看到开发人员一直在滥用它。在大多数情况下,当您希望降低粒度时,使用GROUP BY
是更好的选择。通常,当我在开发人员的代码中看到DISTINCT
时,它是试图隐藏查询中由于多对多问题而导致的重复行,而不是修复多对多本身。即使它不是在隐藏问题,当它没有任何目的时,对其他人来说也是令人困惑的。
英文:
The windowing function (ROW_NUMBER
here) is computed after any GROUP BY
but before DISTINCT
. Here's what an execution plan looks like for a similar SQL using a random object I selected on one of my databases:
Order of execution (read from most-indented upwards toward least-indented) is: (1) scan the table, (2) compute ROW_NUMBER (WINDOW)
, (3) apply DISTINCT (HASH UNIQUE)
.
This means that DISTINCT
will not impact the result of ROW_NUMBER
in the same query block. It could change the results of the outer query block if placed there, but in your case it won't because of your rn = 1
predicate. That will select only one row per (col1,col2)
. There's nothing left for DISTINCT
to do, though it will have to pass through another (unnecessary) sort operation to honor the request.
Word of wisdom: avoid frequent use of DISTINCT
. I see developers overusing it all the time. In most cases when you want to reduce granularity, GROUP BY
is a better choice. Usually when I see DISTINCT
in developers' code it's an attempt to hide duplicate rows caused by a many-to-many problem in their query, rather than fixing the many-to-many itself. And even if it's not hiding a problem, it is confusing to others when it's there for no purpose.
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论