2023年7月23日 18:35:27go评论90阅读模式

英文:

Please explain the execution of this simple SQL query

问题

以下是翻译好的部分：

DDL和DML用于创建此表：
创建表Delivery（如果不存在）（delivery_id int，customer_id int，order_date date，customer_pref_delivery_date date）;
截断表Delivery;
插入到Delivery（delivery_id，customer_id，order_date，customer_pref_delivery_date）中的值（'1'，'1'，'2019-08-01'，'2019-08-02'）;
插入到Delivery（delivery_id，customer_id，order_date，customer_pref_delivery_date）中的值（'2'，'2'，'2019-08-02'，'2019-08-02'）;
插入到Delivery（delivery_id，customer_id，order_date，customer_pref_delivery_date）中的值（'3'，'1'，'2019-08-11'，'2019-08-12'）;
插入到Delivery（delivery_id，customer_id，order_date，customer_pref_delivery_date）中的值（'4'，'3'，'2019-08-24'，'2019-08-24'）;
插入到Delivery（delivery_id，customer_id，order_date，customer_pref_delivery_date）中的值（'5'，'3'，'2019-08-21'，'2019-08-22'）;
插入到Delivery（delivery_id，customer_id，order_date，customer_pref_delivery_date）中的值（'6'，'2'，'2019-08-11'，'2019-08-13'）;
插入到Delivery（delivery_id，customer_id，order_date，customer_pref_delivery_date）中的值（'7'，'4'，'2019-08-09'，'2019-08-09'）;
我想按customer_id升序重新排列此表的行，并对相同customer_id的多行按order_date升序排列。
为了实现这一点，我正在编写以下查询：
with t1 as（select *
            from delivery
            order by customer_id, order_date），
t2 as（select * from t1 group by customer_id）
select * from t2;

在第三行，我期望的是2019-08-21而不是2019-08-24。请解释在此解决方案中我做错了什么？请解释为什么重新排列不按照指定的CTE进行？

英文:

DDL & DMLs for creating this table:

  Create table If Not Exists Delivery (delivery_id int, customer_id int, order_date date, customer_pref_delivery_date date);
Truncate table Delivery;
insert into Delivery (delivery_id, customer_id, order_date, customer_pref_delivery_date) values (&#39;1&#39;, &#39;1&#39;, &#39;2019-08-01&#39;, &#39;2019-08-02&#39;);
insert into Delivery (delivery_id, customer_id, order_date, customer_pref_delivery_date) values (&#39;2&#39;, &#39;2&#39;, &#39;2019-08-02&#39;, &#39;2019-08-02&#39;);
insert into Delivery (delivery_id, customer_id, order_date, customer_pref_delivery_date) values (&#39;3&#39;, &#39;1&#39;, &#39;2019-08-11&#39;, &#39;2019-08-12&#39;);
insert into Delivery (delivery_id, customer_id, order_date, customer_pref_delivery_date) values (&#39;4&#39;, &#39;3&#39;, &#39;2019-08-24&#39;, &#39;2019-08-24&#39;);
insert into Delivery (delivery_id, customer_id, order_date, customer_pref_delivery_date) values (&#39;5&#39;, &#39;3&#39;, &#39;2019-08-21&#39;, &#39;2019-08-22&#39;);
insert into Delivery (delivery_id, customer_id, order_date, customer_pref_delivery_date) values (&#39;6&#39;, &#39;2&#39;, &#39;2019-08-11&#39;, &#39;2019-08-13&#39;);
insert into Delivery (delivery_id, customer_id, order_date, customer_pref_delivery_date) values (&#39;7&#39;, &#39;4&#39;, &#39;2019-08-09&#39;, &#39;2019-08-09&#39;);

I want to rearrange the rows of this table in ascending order of customer_id and for multiple rows of same customer_id ascending order of order_date.
For getting that I was writing query as:

with t1 as (select *
from delivery
order by customer_id, order_date),
t2 as (select * from t1 group by customer_id)
select * from t2;

and I was getting this table:

Here in third row instead of 2019-08-24 I was expecting 2019-08-21. Please explain what am I doing wrong in THIS SOLUTION? Please explain why rearrangement is not happening as per the CTE specified?

答案1

得分: 1

Optimizing Derived Tables, View References, and Common Table Expressions with Merging or Materialization states:

> 优化器在派生表或视图引用中传播ORDER BY子句到外部查询块，如果以下条件都为真：

>
> * 外部查询没有分组或聚合。
> * 外部查询没有指定DISTINCT、HAVING或ORDER BY。
> * 外部查询在FROM子句中只有这个派生表或视图引用作为唯一的数据源。

> 否则，优化器会忽略ORDER BY子句。

因为在t2中t1被分组，所以优化器会忽略ORDER BY子句。

你的查询和SickerDude43提出的查询都是nondeterministic的。它们之所以能返回结果，是因为ONLY_FULL_GROUP_BY被禁用。所选的所有列（和表达式）应该要么在一个聚合函数中，要么在GROUP BY子句中有函数依赖。

你应该阅读MySQL对GROUP BY的处理和ONLY_FULL_GROUP_BY。

以下是greatest-n-per-group的一些典型解决方案：

-- 使用聚合子查询的每组最大（或最小）值
-- 如果min_order_date上有多个订单，则会为customer_id返回多行
SELECT d.*
FROM (
    SELECT customer_id, MIN(order_date) AS min_order_date
    FROM delivery
    GROUP BY customer_id
) d_min
JOIN delivery d
    ON d_min.customer_id = d.customer_id
    AND d_min.min_order_date = d.order_date;

-- 使用相关子查询的每组最大（或最小）值
SELECT d1.*
FROM delivery d1
WHERE delivery_id = (
    SELECT delivery_id
    FROM delivery d2
    WHERE d1.customer_id = d2.customer_id
    ORDER BY customer_id, order_date, delivery_id
    LIMIT 1
);

-- 使用ROW_NUMBER()窗口函数的每组最大（或最小）值（MySQL >= 8.0）
SELECT *
FROM (
    SELECT *, ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY order_date, delivery_id) AS rn
    FROM delivery
) d
WHERE rn = 1;

聚合和相关子查询将受益于(customer_id, order_date)上的索引。

哪个查询速度最快将取决于您的数据分布。可以尝试它们。

这里有一个db<>fiddle。

英文:

Optimizing Derived Tables, View References, and Common Table Expressions with Merging or Materialization states:

> The optimizer propagates an ORDER BY clause in a derived table or view reference to the outer query block if these conditions are all true:
>
> * The outer query is not grouped or aggregated.
> * The outer query does not specify DISTINCT, HAVING, or ORDER BY.
> * The outer query has this derived table or view reference as the only source in the FROM clause.
>
> Otherwise, the optimizer ignores the ORDER BY clause.

Because t1 is then being grouped in t2, the optimizer ignores the ORDER BY clause.

Your query, and that proposed by SickerDude43, are both nondeterministic. They only return at all because ONLY_FULL_GROUP_BY is disabled. All selected columns (and expressions) should either be in an aggregate function or functionally dependant on the GROUP BY clause.

You should read MySQL Handling of GROUP BY and ONLY_FULL_GROUP_BY.

Here are some typical solutions for greatest-n-per-group:

-- greatest (or least) per group using aggregate subquery
-- this will return multiple rows for a customer_id if there are multiple orders on min_order_date
SELECT d.*
FROM (
    SELECT customer_id, MIN(order_date) AS min_order_date
    FROM delivery
    GROUP BY customer_id
) d_min
JOIN delivery d
    ON d_min.customer_id = d.customer_id
    AND d_min.min_order_date = d.order_date;

-- greatest (or least) per group using correlated subquery
SELECT d1.*
FROM delivery d1
WHERE delivery_id = (
    SELECT delivery_id
    FROM delivery d2
    WHERE d1.customer_id = d2.customer_id
    ORDER BY customer_id, order_date, delivery_id
    LIMIT 1
);

-- greatest (or least) per group using ROW_NUMBER() window function (MySQL &gt;= 8.0)
SELECT *
FROM (
    SELECT *, ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY order_date, delivery_id) AS rn
    FROM delivery
) d
WHERE rn = 1;

The aggregate and correlated subqueries will benefit from an index on (customer_id, order_date).

Which one is fastest will depend on the distribution of your data. Give them a try.

Here's a db<>fiddle.

答案2

得分: 0

你的查询过于复杂。

只需使用MIN()函数选择订单日期的最小值。

select delivery_id, customer_id, MIN(order_date), 
customer_pref_delivery_date from delivery
group by customer_id

这应该可以正常工作。

Fiddle链接：http://sqlfiddle.com/#!9/4a8a51/1

你的解决方案为什么不起作用，在评论中有解释。

英文:

Your query is unnecessary complicated.

Just select the lowest value for the order date with the MIN() function.

select delivery_id, customer_id, MIN(order_date), 
customer_pref_delivery_date from delivery
group by customer_id

This should work.

Fiddle: http://sqlfiddle.com/#!9/4a8a51/1

The reason why your solution doesn't work is explained in the comments.

通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库，让每个人都能够通过互相帮助和分享经验来进步。

请解释这个简单的SQL查询的执行。

问题

答案1

答案2

使用MySQL数据库进行Revel开发，但不使用GORM。

是不是可以将MyISAM表文件的路径更改为不同的挂载点？

char_length()和character_length()之间有什么区别？

SQL：选择TEXT字段的子字符串是否比整个值更快

如何在Playwright视觉比较中屏蔽多个定位器？

在C++中，可以使用可变模板参数来检索类型的内部类型。

selenium.common.exceptions.StaleElementReferenceException: Message: stale element reference: stale element not found

Creating and opening a URL to log in to Website via Basic Auth with Robot Framework/Selenium (Python)

AG Grid 在上下文菜单中以大文本形式打开

What's the correct way to type hint an empty list as a literal in python?

如何在Highcharts Gantt中更改本地化的星期名称

如何在同一个流中使用多个过滤器和映射函数？

如何使用Map/Set来将代码优化到O(n)？

.NET MAUI Android在GitHub Actions上构建失败，错误代码为1。