2023年3月7日 21:56:41go评论71阅读模式

英文:

Is it right to use DISTINCT and RANK() function together for distinct values in a select statement?

问题

在一个SELECT语句中，我们可以同时使用DISTINCT和RANK()函数来获取不同的值吗？让我们考虑两种情况。

使用RANK()和DISTINCT
仅使用RANK()。
在这两种情况下，我们是否会获得相同的不同值？

SELECT * FROM (
SELECT DISTINCT
 col1,
 col2,
 col3,
 col4,
 row_number() OVER (PARTITION BY col1, col2 ORDER BY col3) RN
FROM table)
WHERE RN = 1;

SELECT * FROM (
SELECT
 col1,
 col2,
 col3,
 col4,
 row_number() OVER (PARTITION BY col1, col2 ORDER BY col3) RN
FROM table) 
WHERE RN = 1;

英文:

Can we use DISTINCT and RANK() function together in a select statement for distinct values? Lets take 2 scenarios
1.Using RANK() and DISTINCT
2.Using only RANK() .
In these both scenarios will we get same distinct values?

SELECT * FROM (
SELECT distinct
 col1,
 col2,
 col3,
 col4,
 row_number () over (partition by col1, col2 ORDER BY col3) RN
FROM table)
WHERE rn = 1;


SELECT * FROM (
SELECT
 col1,
 col2,
 col3,
 col4,
 row_number () over (partition by col1, col2 ORDER BY col3) RN
FROM table) 
WHERE rn = 1;

答案1

得分: 0

窗口函数（这里使用ROW_NUMBER）在任何GROUP BY之后计算，但在DISTINCT之前计算。以下是我在我的数据库中选择的随机对象使用类似SQL的执行计划示例：

执行顺序（从最缩进到最少缩进的顺序阅读）是：（1）扫描表，（2）计算ROW_NUMBER (WINDOW)，（3）应用DISTINCT (HASH UNIQUE)。

这意味着DISTINCT不会影响同一查询块中的ROW_NUMBER的结果。如果将其放在外部查询块中，它可能会改变外部查询块的结果，但在您的情况下，由于您的rn = 1谓词，它不会这样做。这将仅选择每个（col1，col2）的一行。虽然它必须经过另一个（不必要的）排序操作来满足请求，但没有其他事情可以让DISTINCT来做。

智慧之言：避免频繁使用DISTINCT。我经常看到开发人员一直在滥用它。在大多数情况下，当您希望降低粒度时，使用GROUP BY是更好的选择。通常，当我在开发人员的代码中看到DISTINCT时，它是试图隐藏查询中由于多对多问题而导致的重复行，而不是修复多对多本身。即使它不是在隐藏问题，当它没有任何目的时，对其他人来说也是令人困惑的。

英文:

The windowing function (ROW_NUMBER here) is computed after any GROUP BY but before DISTINCT. Here's what an execution plan looks like for a similar SQL using a random object I selected on one of my databases:

Order of execution (read from most-indented upwards toward least-indented) is: (1) scan the table, (2) compute ROW_NUMBER (WINDOW), (3) apply DISTINCT (HASH UNIQUE).

This means that DISTINCT will not impact the result of ROW_NUMBER in the same query block. It could change the results of the outer query block if placed there, but in your case it won't because of your rn = 1 predicate. That will select only one row per (col1,col2). There's nothing left for DISTINCT to do, though it will have to pass through another (unnecessary) sort operation to honor the request.

Word of wisdom: avoid frequent use of DISTINCT. I see developers overusing it all the time. In most cases when you want to reduce granularity, GROUP BY is a better choice. Usually when I see DISTINCT in developers' code it's an attempt to hide duplicate rows caused by a many-to-many problem in their query, rather than fixing the many-to-many itself. And even if it's not hiding a problem, it is confusing to others when it's there for no purpose.

通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库，让每个人都能够通过互相帮助和分享经验来进步。

使用DISTINCT和RANK()函数一起在SELECT语句中用于唯一值是否正确？

问题

答案1

Oracle SQL命令用于HAVING COUNT。

如何删除/更新 JSONB 数组中键等于某个值的元素？

EXECUTE IMMEDIATE在BigQuery中与INSERT一起使用时不起作用。

创建外键时出现语法错误

What's the correct way to type hint an empty list as a literal in python?

如何在Highcharts Gantt中更改本地化的星期名称

如何在同一个流中使用多个过滤器和映射函数？

如何使用Map/Set来将代码优化到O(n)？

.NET MAUI Android在GitHub Actions上构建失败，错误代码为1。

如何在Playwright视觉比较中屏蔽多个定位器？

在C++中，可以使用可变模板参数来检索类型的内部类型。

selenium.common.exceptions.StaleElementReferenceException: Message: stale element reference: stale element not found

Creating and opening a URL to log in to Website via Basic Auth with Robot Framework/Selenium (Python)

AG Grid 在上下文菜单中以大文本形式打开

发表评论