May 29, 2023 15:51:18


How to improve the performance of my SQL query?

Question

I am a beginner in SQL, and I want to query data from an Oracle database. I have a table that records the positions of many cars. Each record has a mileage and a time. The relevant columns are ‘SENDTIME’, ‘MILEAGE’, and ‘PLATENO’, which hold the time, the mileage, and the plate number of the car, respectively. The values in the ‘SENDTIME’ column are in a datetime format. I want to find out when a specific car did not move for more than 30 seconds and less than 300 seconds, so I wrote this SQL query:

WITH gpsinfo_cte AS (
  SELECT plateno, sendtime, longitude, latitude, mileage, createdate,
    FIRST_VALUE(sendtime) OVER (PARTITION BY plateno, mileage ORDER BY sendtime) AS first_sendtime,
    LAST_VALUE(sendtime) OVER (PARTITION BY plateno, mileage ORDER BY sendtime) AS last_sendtime
  FROM GPSINFO 
  WHERE plateno = '京AEW302'
)
SELECT /*+ NO_MERGE(gpsinfo_cte) */ plateno, sendtime, longitude, latitude, mileage, createdate 
FROM gpsinfo_cte
WHERE (last_sendtime - first_sendtime) * 24 * 60 * 60 < 300
AND (last_sendtime - first_sendtime) * 24 * 60 * 60 > 30;

However, it runs rather slowly in the Oracle database. Based on web search results, I used the EXPLAIN PLAN statement to generate the execution plan for my query and store it in a table called PLAN_TABLE. Here's the output:

I still have no idea how to improve the performance. Can anyone help? Many thanks!

Answer 1

Score: 5

This is probably all that matters:

 FROM GPSINFO 
  WHERE plateno = '京AEW302'

Assuming you have lots of cars (thousands, maybe millions) and are asking for only one license plate number, you want to use an index to access only the table rows for that particular car. Your execution plan shows that this isn't happening:

TABLE ACCESS FULL GPSINFO

Simply create an index on the plateno column and that should resolve your issue:

CREATE INDEX gpsinfo_plateno_idx1 ON gpsinfo (plateno) COMPRESS 1;
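To see the effect of such an index without an Oracle instance, here is a rough sketch in Python that uses the standard-library sqlite3 module as a stand-in. It assumes a cut-down gpsinfo table (only plateno, sendtime and mileage) and compares SQLite's EXPLAIN QUERY PLAN output before and after creating the index. SQLite has no COMPRESS option and words its plans differently from Oracle, but the switch from a full scan to an index search is the same idea as getting rid of TABLE ACCESS FULL.

```python
import sqlite3

# SQLite stand-in for the Oracle GPSINFO table (assumption: only the columns
# needed for this demo).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE gpsinfo (plateno TEXT, sendtime TEXT, mileage REAL)")

QUERY = "SELECT * FROM gpsinfo WHERE plateno = ?"


def plan(sql, params):
    """Return the detail text of each EXPLAIN QUERY PLAN row, joined."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[-1] for row in rows)


before = plan(QUERY, ("京AEW302",))

# Plain equivalent of the Oracle index; SQLite has no COMPRESS option.
conn.execute("CREATE INDEX gpsinfo_plateno_idx1 ON gpsinfo (plateno)")
after = plan(QUERY, ("京AEW302",))

# Exact wording varies by SQLite version.
print(before)  # a full scan, e.g. "SCAN gpsinfo"
print(after)   # an index lookup, e.g. "SEARCH gpsinfo USING INDEX gpsinfo_plateno_idx1 (plateno=?)"
```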

As for the rest of your query, you'll have to work on the logic to get the right results, but it is unlikely to cause a noticeable performance problem as long as you are working with only one car. In terms of the logic, if I understand correctly, you want to know when there is a 30-300 second gap between position recordings with no change in mileage between them. So you need to compare each row with its neighboring row. For that, use LAG (or LEAD):

SELECT x.*,
       (sendtime - last_sendtime) * 86400 seconds_elapsed  -- DATE difference is in days
  FROM (SELECT x.*,
               LAG(sendtime) OVER (ORDER BY sendtime) last_sendtime,  -- previous record's time
               LAG(mileage) OVER (ORDER BY sendtime) last_mileage     -- previous record's mileage
          FROM gpsinfo x
         WHERE plateno = '京AEW302') x
 WHERE (sendtime - last_sendtime) * 86400 BETWEEN 30 AND 300
   AND mileage = last_mileage;
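Here is a minimal, runnable sketch of the same LAG logic, again using SQLite through Python's sqlite3 module rather than Oracle. The sample rows and the second plate number 京B00001 are made up for illustration. SQLite cannot subtract dates the way Oracle subtracts DATE values, so the sketch converts sendtime to whole epoch seconds with strftime('%s', ...) instead of multiplying a day difference by 86400.

```python
import sqlite3

# Hypothetical sample data: sendtime stored as 'YYYY-MM-DD HH:MM:SS' text.
ROWS = [
    ("京AEW302", "2023-05-29 10:00:00", 100.0),
    ("京AEW302", "2023-05-29 10:00:10", 100.0),  # 10 s gap, too short
    ("京AEW302", "2023-05-29 10:01:10", 100.0),  # 60 s, no movement -> match
    ("京AEW302", "2023-05-29 10:02:10", 100.5),  # 60 s, but the car moved
    ("京AEW302", "2023-05-29 10:09:00", 100.5),  # 410 s gap, too long
    ("京AEW302", "2023-05-29 10:11:00", 100.5),  # 120 s, no movement -> match
    ("京B00001", "2023-05-29 10:00:00", 5.0),    # another car, filtered out
    ("京B00001", "2023-05-29 10:01:00", 5.0),
]

# Same shape as the Oracle query; epoch seconds replace DATE subtraction.
QUERY = """
SELECT plateno, sendtime, mileage, t - last_t AS seconds_elapsed
  FROM (SELECT x.*,
               CAST(strftime('%s', sendtime) AS INTEGER) AS t,
               LAG(CAST(strftime('%s', sendtime) AS INTEGER))
                 OVER (ORDER BY sendtime) AS last_t,
               LAG(mileage) OVER (ORDER BY sendtime) AS last_mileage
          FROM gpsinfo x
         WHERE plateno = ?) x
 WHERE t - last_t BETWEEN 30 AND 300   -- NULL on the first row, so it drops out
   AND mileage = last_mileage
 ORDER BY sendtime
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE gpsinfo (plateno TEXT, sendtime TEXT, mileage REAL)")
conn.executemany("INSERT INTO gpsinfo VALUES (?, ?, ?)", ROWS)
stops = conn.execute(QUERY, ("京AEW302",)).fetchall()

for row in stops:
    print(row)
# ('京AEW302', '2023-05-29 10:01:10', 100.0, 60)
# ('京AEW302', '2023-05-29 10:11:00', 100.5, 120)
```

Only the 60 s and 120 s gaps with unchanged mileage are reported: the 10 s and 410 s gaps fall outside the range, and the 60 s gap ending at 10:02:10 is excluded because the mileage changed.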

Note: If you are asking for only one plateno value, there is no need to include plateno in a PARTITION BY clause; it's redundant because there is only one. You can certainly use PARTITION BY mileage as an alternative to the mileage = last_mileage logic shown here, but mileage is likely to have many different values, and internally grouping by that many values means a lot of tiny (single-row) groups, which is not very memory- or temp-efficient. But as I said, the indexing is really your only serious issue.
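For comparison, here is a sketch of the PARTITION BY mileage variant mentioned in the note, under the same SQLite stand-in with made-up data. It assumes mileage never decreases, so all records with a given mileage are consecutive in time; under that assumption, looking back within a mileage partition finds the same previous row that the mileage = last_mileage test accepts.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE gpsinfo (plateno TEXT, sendtime TEXT, mileage REAL)")
conn.executemany(
    "INSERT INTO gpsinfo VALUES (?, ?, ?)",
    [  # hypothetical sample data
        ("京AEW302", "2023-05-29 10:00:00", 100.0),
        ("京AEW302", "2023-05-29 10:01:00", 100.0),  # 60 s at 100.0 -> match
        ("京AEW302", "2023-05-29 10:02:00", 100.5),  # first row of its partition
        ("京AEW302", "2023-05-29 10:12:00", 100.5),  # 600 s at 100.5, too long
    ],
)

# LAG looks back only among rows with the same mileage, so no separate
# mileage = last_mileage test is needed.
QUERY = """
SELECT plateno, sendtime, mileage, t - last_t AS seconds_elapsed
  FROM (SELECT x.*,
               CAST(strftime('%s', sendtime) AS INTEGER) AS t,
               LAG(CAST(strftime('%s', sendtime) AS INTEGER))
                 OVER (PARTITION BY mileage ORDER BY sendtime) AS last_t
          FROM gpsinfo x
         WHERE plateno = ?) x
 WHERE t - last_t BETWEEN 30 AND 300
 ORDER BY sendtime
"""
stops = conn.execute(QUERY, ("京AEW302",)).fetchall()
print(stops)  # [('京AEW302', '2023-05-29 10:01:00', 100.0, 60)]
```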

Answer 2

Score: -2

To get the fastest possible results, you need to pre-calculate these values (the first and last send times). This can be achieved with:

the routine responsible for adding the record (in the context of a single plateno, it is faster to calculate and save the first and last times there)
a trigger on the table
a materialized view and a process for refreshing its data

All of these require some development time, but none of them should be difficult.
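As an illustration of the trigger option, here is a sketch in SQLite (via Python's sqlite3 module, standing in for Oracle, where the trigger would be written in PL/SQL). The summary table gps_stop_summary and the trigger gpsinfo_summary_ai are hypothetical names. The sketch assumes the first and last send times are wanted per (plateno, mileage), as in the question's PARTITION BY, and that sendtime is stored as 'YYYY-MM-DD HH:MM:SS' text so that MIN and MAX order it chronologically.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE gpsinfo (plateno TEXT, sendtime TEXT, mileage REAL);

-- Hypothetical summary table: one row per (plateno, mileage) with the first
-- and last sendtime recorded at that mileage.
CREATE TABLE gps_stop_summary (
  plateno        TEXT,
  mileage        REAL,
  first_sendtime TEXT,
  last_sendtime  TEXT,
  PRIMARY KEY (plateno, mileage)
);

-- Maintain the summary on every insert; the scalar MIN/MAX keep the bounds
-- correct even if GPS rows arrive out of order.
CREATE TRIGGER gpsinfo_summary_ai AFTER INSERT ON gpsinfo
BEGIN
  INSERT OR IGNORE INTO gps_stop_summary
    VALUES (NEW.plateno, NEW.mileage, NEW.sendtime, NEW.sendtime);
  UPDATE gps_stop_summary
     SET first_sendtime = MIN(first_sendtime, NEW.sendtime),
         last_sendtime  = MAX(last_sendtime, NEW.sendtime)
   WHERE plateno = NEW.plateno AND mileage = NEW.mileage;
END;
""")

conn.executemany(
    "INSERT INTO gpsinfo VALUES (?, ?, ?)",
    [  # hypothetical sample data
        ("京AEW302", "2023-05-29 10:01:10", 100.0),  # arrives out of order
        ("京AEW302", "2023-05-29 10:00:00", 100.0),
        ("京AEW302", "2023-05-29 10:02:10", 100.5),
        ("京AEW302", "2023-05-29 10:11:00", 100.5),
        ("京AEW302", "2023-05-29 10:12:00", 101.0),
    ],
)

# Reading the stops is now a lookup on the small summary table.
stops = conn.execute(
    """
    SELECT plateno, mileage, first_sendtime, last_sendtime, seconds_stopped
      FROM (SELECT s.*,
                   CAST(strftime('%s', last_sendtime) AS INTEGER)
                 - CAST(strftime('%s', first_sendtime) AS INTEGER) AS seconds_stopped
              FROM gps_stop_summary s
             WHERE plateno = ?)
     WHERE seconds_stopped > 30 AND seconds_stopped < 300
    """,
    ("京AEW302",),
).fetchall()
print(stops)
# [('京AEW302', 100.0, '2023-05-29 10:00:00', '2023-05-29 10:01:10', 70)]
```

The trade-off is extra work on every insert in exchange for reads that touch one summary row per stop instead of every GPS record.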

You could also try to first compute the qualifying plateno and then fetch its details. For sendtime, we are interested in the first (smallest) and last (largest) values, so you can use:

WITH gpsinfo_cte AS (
  SELECT plateno
  FROM GPSINFO 
  WHERE plateno = '京AEW302'
  GROUP BY plateno
  HAVING (MAX(sendtime) - MIN(sendtime)) * 24 * 60 * 60 < 300
    AND (MAX(sendtime) - MIN(sendtime)) * 24 * 60 * 60 > 30
)
SELECT *
FROM GPSINFO
WHERE plateno IN (SELECT plateno FROM gpsinfo_cte);
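A rough sketch of running this query against made-up data in SQLite (via Python's sqlite3), with julianday() standing in for Oracle DATE subtraction. The HAVING clause measures the span across all of a plate's records, so for a given plate the query returns either every row or none.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE gpsinfo (plateno TEXT, sendtime TEXT, mileage REAL)")
conn.executemany(
    "INSERT INTO gpsinfo VALUES (?, ?, ?)",
    [  # hypothetical sample data
        ("京AEW302", "2023-05-29 10:00:00", 100.0),
        ("京AEW302", "2023-05-29 10:01:00", 100.0),
        ("京AEW302", "2023-05-29 10:02:00", 100.4),  # overall span: 120 s
        ("京B00001", "2023-05-29 10:00:00", 5.0),
        ("京B00001", "2023-05-29 10:20:00", 5.0),    # overall span: 1200 s
    ],
)

# The answer's query; julianday() differences are in days, like Oracle DATEs.
QUERY = """
WITH gpsinfo_cte AS (
  SELECT plateno
    FROM gpsinfo
   WHERE plateno = ?
   GROUP BY plateno
  HAVING (julianday(MAX(sendtime)) - julianday(MIN(sendtime))) * 24 * 60 * 60 < 300
     AND (julianday(MAX(sendtime)) - julianday(MIN(sendtime))) * 24 * 60 * 60 > 30
)
SELECT *
  FROM gpsinfo
 WHERE plateno IN (SELECT plateno FROM gpsinfo_cte)
 ORDER BY sendtime
"""


def run(plate):
    return conn.execute(QUERY, (plate,)).fetchall()


print(len(run("京AEW302")))  # 3: every row of the plate
print(run("京B00001"))       # []: a 1200 s span is outside 30-300 s
```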

If this works, you can add an index on (plateno, sendtime) to optimize it further.
