2023年6月6日 03:56:02go评论74阅读模式

英文:

How to optimize this sql query on + 2 million rows

问题

我有一个包含200多万行数据并且在快速增长的SQL数据库。列不多，只有code, price, date和stationID。

目标是按code和stationID获取最新的价格。
查询功能正常，但需要超过10秒。

有没有优化查询的方法？

$statement = $this->pdo->prepare(
    'WITH cte AS 
    (
        SELECT stationID AS ind, code, CAST(price AS DOUBLE) AS price, date,
        ROW_NUMBER() OVER (
            PARTITION BY code, stationID
            ORDER BY date DESC
        ) AS latest
        FROM price
    )
    SELECT *
    FROM cte
    WHERE latest = 1'
);

$statement->execute();
$results = $statement->fetchAll(PDO::FETCH_GROUP | PDO::FETCH_ASSOC);

编辑：
第一列上有一个名为'id'的索引。我不知道这是否有帮助。

数据库（InnoDB）如下所示：

id primary - int
stationID - int
code - int
price - decimal(10,5)
date - datetime

编辑2：
结果需要按stationID分组，每个stationID需要显示多行。每个code都显示最新的日期，如下所示：

22456:
    code: 1
    price: 3
    date: 2023-06-21

    code: 2
    price: 2
    date: 2023-06-21

    code: 3
    price: 5
    date: 2023-06-21

22457:
    code: 1
    price: 10
    date: 2023-06-21

    code: 2
    price: 1
    date: 2023-06-21

    code: 3
    price: 33
    date: 2023-06-21

JSON输出应如下所示：

{"1000001":[{"code":1,"price":1.661,"date":"2023-06-06 12:46:32","latest":1},{"code":2,"price":1.867,"date":"2023-06-06 12:46:32","latest":1},{"code":3,"price":1.05,"date":"2023-06-06 12:46:32","latest":1},{"code":5,"price":1.818,"date":"2023-06-06 12:46:32","latest":1},{"code":6,"price":1.879,"date":"2023-06-06 12:46:32","latest":1}],
"1000002":[{"code":1,"price":1.65,"date":"2023-06-03 08:53:26","latest":1},{"code":2,"price":1.868,"date":"2023-06-03 08:53:26","latest":1},{"code":6,"price":1.889,"date":"2023-06-03 08:53:27","latest":1}],...}

英文:

I have a Sql database with +2 million rows and growing fast. There aren't many columns, only code, price, date and stationID.

The aim is to get the latest price by code and stationID.
The query works well, but takes more than 10s.

Is there a way to optimize the query?

 $statement = $this-&gt;pdo-&gt;prepare(

        &#39;WITH cte AS 
        (

        SELECT stationID AS ind, code, CAST(price AS DOUBLE ) AS price, date

        ,ROW_NUMBER() OVER(
             PARTITION BY code, stationID
             ORDER BY date DESC
         ) AS latest

        FROM price
        )
        
        SELECT *
        FROM cte
        WHERE latest  = 1
        &#39;
    );

    $statement-&gt;execute();
    $results = $statement-&gt;fetchAll(PDO::FETCH_GROUP | PDO::FETCH_ASSOC);

Edit:
There is an index on the first column called 'id'. I don't know if this helps.

The database (InnoDB) look like this:

id primary - int
stationID - int
code - int
price - decimal(10,5)
date - datetime

Edit 2:

The result need to be grouped by stationID and each stationID need to show many line. One line for every code with latest date.
Like this:

22456:
      code:  1
      price: 3
      date:  2023-06-21

      code:  2
      price: 2
      date:  2023-06-21

      code:  3
      price: 5
      date:  2023-06-21

22457:
      code:  1
      price: 10
      date:  2023-06-21

      code:  2
      price: 1
      date:  2023-06-21

      code:  3
      price: 33
      date:  2023-06-21

The json output should be like this:

{&quot;1000001&quot;:[{&quot;code&quot;:1,&quot;price&quot;:1.661,&quot;date&quot;:&quot;2023-06-06 12:46:32&quot;,&quot;latest&quot;:1},{&quot;code&quot;:2,&quot;price&quot;:1.867,&quot;date&quot;:&quot;2023-06-06 12:46:32&quot;,&quot;latest&quot;:1},{&quot;code&quot;:3,&quot;price&quot;:1.05,&quot;date&quot;:&quot;2023-06-06 12:46:32&quot;,&quot;latest&quot;:1},{&quot;code&quot;:5,&quot;price&quot;:1.818,&quot;date&quot;:&quot;2023-06-06 12:46:32&quot;,&quot;latest&quot;:1},{&quot;code&quot;:6,&quot;price&quot;:1.879,&quot;date&quot;:&quot;2023-06-06 12:46:32&quot;,&quot;latest&quot;:1}],&quot;1000002&quot;:[{&quot;code&quot;:1,&quot;price&quot;:1.65,&quot;date&quot;:&quot;2023-06-03 08:53:26&quot;,&quot;latest&quot;:1},{&quot;code&quot;:2,&quot;price&quot;:1.868,&quot;date&quot;:&quot;2023-06-03 08:53:26&quot;,&quot;latest&quot;:1},{&quot;code&quot;:6,&quot;price&quot;:1.889,&quot;date&quot;:&quot;2023-06-03 08:53:27&quot;,&quot;latest&quot;:1}],…

答案1

得分: 1

只要在相同的 code, stationID 对中不能有相同日期的两行，使用窗口函数就有点像用大锤来砸坚果。

select p.stationID, p.code, p.price, p.date
from (
    select code, stationID, max(date) as max_date
    from price
    group by code, stationID
) max
join price p
    on max.code = p.code
   and max.stationID = p.stationID
   and max.max_date = p.date;

需要以下索引：

alter table price add index (code, stationID, date desc);

此查询应该在不到1毫秒内完成，因为派生表可以从索引中构建，然后只需从表中读取所需的行。

或者，如果您知道每个 code, stationID 对都会在特定时间段内（1小时、1天、1周）收到更新的价格，那么您可以通过向窗口函数添加一个 where 子句来显著减少窗口函数所需的工作量：

with cte as 
(
    select stationID as ind, code, price, date, row_number() over(partition by code, stationID order by date desc) as latest
    from price
    where date >= now() - interval 1 week
)
select * from cte where latest  = 1;

英文:

As long as you cannot have two rows with the same datetime for the same code, stationID pair, using the window function is a bit like using a sledgehammer to crack a nut.

select p.stationID, p.code, p.price, p.date
from (
    select code, stationID, max(date) as max_date
    from price
    group by code, stationID
) max
join price p
    on max.code = p.code
   and max.stationID = p.stationID
   and max.max_date = p.date;

It requires the following index:

alter table price add index (code, stationID, date desc);

This query should take less than 1 ms, as the derived table can be built from the index, and then it is reading just the required rows from the table.

Alternatively, if you know that every code, stationID pair will have received an updated price within a specific time period (1 hr, 1 day, 1 week), then you could significantly reduce the work required by the window function by adding a where clause to it:

with cte as 
(
    select stationID as ind, code, price, date, row_number() over(partition by code, stationID order by date desc) as latest
    from price
    where date &gt;= now() - interval 1 week
)
select * from cte where latest  = 1;

答案2

得分: 0

我会想象你需要以下索引，以使你的查询性能良好（你只需要在数据库设计的一部分中执行一次）。

CREATE INDEX IX ON price
  (code, stationID, date DESC, price)

前两列可以以任何顺序排列。

英文:

I would imagine you need the following index for your query to perform well (you only need to do this once, as part of your database design).

CREATE INDEX IX ON price
  (code, stationID, date DESC, price)

The first two columns can be in either order.

通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库，让每个人都能够通过互相帮助和分享经验来进步。

如何优化这个SQL查询，涉及200万行数据。

问题

答案1

答案2

如何在使用 Workbench 时添加 mysqldump 配置标志？

安排 Hibernate 在特定时间保存，直到另行通知才同步。

How to fix the used key part isn't a string, doesn't support unique prefix keys from prisma migrtion?

如何将状态从数字转换为文本

What's the correct way to type hint an empty list as a literal in python?

如何在Highcharts Gantt中更改本地化的星期名称

如何在同一个流中使用多个过滤器和映射函数？

如何使用Map/Set来将代码优化到O(n)？

.NET MAUI Android在GitHub Actions上构建失败，错误代码为1。

如何在Playwright视觉比较中屏蔽多个定位器？

在C++中，可以使用可变模板参数来检索类型的内部类型。

selenium.common.exceptions.StaleElementReferenceException: Message: stale element reference: stale element not found

Creating and opening a URL to log in to Website via Basic Auth with Robot Framework/Selenium (Python)

AG Grid 在上下文菜单中以大文本形式打开

发表评论