How can I improve the performance of my SQL query?

Question

I am a beginner in SQL and I want to query data from an Oracle database. I have a table, GPSINFO, that records the positions of many cars. Each record has a mileage and a time. The three columns that matter here are 'SENDTIME', 'MILEAGE', and 'PLATENO': the time, the mileage, and the plate number of the car respectively. The values in the 'SENDTIME' column are in the datetime format. I want to find out when a specific car did not move for more than 30 seconds and less than 300 seconds. So I wrote this SQL query:

WITH gpsinfo_cte AS (
  SELECT plateno, sendtime, longitude, latitude, mileage, createdate,
    FIRST_VALUE(sendtime) OVER (PARTITION BY plateno, mileage ORDER BY sendtime) AS first_sendtime,
    LAST_VALUE(sendtime) OVER (PARTITION BY plateno, mileage ORDER BY sendtime) AS last_sendtime
  FROM GPSINFO 
  WHERE plateno = '京AEW302'
)
SELECT /*+ NO_MERGE(gpsinfo_cte) */ plateno, sendtime, longitude, latitude, mileage, createdate 
FROM gpsinfo_cte
WHERE (last_sendtime - first_sendtime) * 24 * 60 *60 < 300
AND (last_sendtime - first_sendtime) * 24 * 60 *60 > 30;

However, it runs rather slowly in Oracle. Following web search results, I used the EXPLAIN PLAN statement to generate the execution plan for my query and store it in a table called PLAN_TABLE. Here's the output:

[Image: execution plan output]

I still have no idea how to improve the performance. Can anyone help? Many thanks!

Answer 1

Score: 5

This is probably all that matters:

 FROM GPSINFO 
  WHERE plateno = '京AEW302'

Assuming you have lots (thousands, millions maybe) of cars, and are asking for only one license plate number, you want to use an index to access only table rows for that particular car. Your execution plan shows that this isn't happening:

TABLE ACCESS FULL GPSINFO

Simply create an index on the plateno column and that should resolve your issue:

CREATE INDEX gpsinfo_plateno_idx1 on gpsinfo(plateno) compress 1
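One way to confirm that the new index is actually picked up is to regenerate the plan and read it back with DBMS_XPLAN.DISPLAY. This is only a sketch: the optimizer decides based on its statistics, so the exact plan lines on your system may differ.

```sql
-- Re-create the plan for the filter that matters
EXPLAIN PLAN FOR
SELECT plateno, sendtime, mileage
  FROM gpsinfo
 WHERE plateno = '京AEW302';

-- Read the plan back from PLAN_TABLE
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);
```

If the index is used, the TABLE ACCESS FULL line should give way to an INDEX RANGE SCAN on gpsinfo_plateno_idx1.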

As for the rest of your query, you'll have to work on the logic to get the right results, but it is unlikely to cause a noticeable performance problem as long as you are working with only one car. In terms of the logic, if I understand correctly, you want to know when there is a 30-300 second gap between position recordings with no mileage between them. So you need to compare a row with a neighboring row. For that, use LAG (or LEAD):

SELECT x.*,
       (sendtime - last_sendtime) * 86400 seconds_elapsed  -- days to seconds
  FROM (SELECT x.*,
               -- time and mileage of the previous reading, in time order
               LAG(sendtime) OVER (ORDER BY sendtime) last_sendtime,
               LAG(mileage) OVER (ORDER BY sendtime) last_mileage
          FROM gpsinfo x
         WHERE plateno = '京AEW302') x
 WHERE (sendtime - last_sendtime) * 86400 BETWEEN 30 AND 300
   AND mileage = last_mileage
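To see which rows the LAG comparison keeps, here is the same logic run against a handful of made-up readings for one car. The sample_gps rows are hypothetical, and the sketch assumes SENDTIME is a DATE, so that subtracting two values yields days.

```sql
-- Hypothetical sample data for one car; the real table is GPSINFO
WITH sample_gps AS (
  SELECT TO_DATE('2023-05-29 10:00:00', 'YYYY-MM-DD HH24:MI:SS') AS sendtime, 100 AS mileage FROM dual UNION ALL
  SELECT TO_DATE('2023-05-29 10:00:10', 'YYYY-MM-DD HH24:MI:SS'), 100 FROM dual UNION ALL  -- 10 s gap: too short
  SELECT TO_DATE('2023-05-29 10:01:10', 'YYYY-MM-DD HH24:MI:SS'), 100 FROM dual UNION ALL  -- 60 s gap, same mileage
  SELECT TO_DATE('2023-05-29 10:02:10', 'YYYY-MM-DD HH24:MI:SS'), 101 FROM dual UNION ALL  -- 60 s gap, mileage changed
  SELECT TO_DATE('2023-05-29 10:12:10', 'YYYY-MM-DD HH24:MI:SS'), 101 FROM dual            -- 600 s gap: too long
)
SELECT sendtime, mileage,
       ROUND((sendtime - last_sendtime) * 86400) AS seconds_elapsed
  FROM (SELECT s.*,
               LAG(sendtime) OVER (ORDER BY sendtime) AS last_sendtime,
               LAG(mileage)  OVER (ORDER BY sendtime) AS last_mileage
          FROM sample_gps s)
 WHERE (sendtime - last_sendtime) * 86400 BETWEEN 30 AND 300
   AND mileage = last_mileage;
```

Only the 10:01:10 reading should come back. The first row has no predecessor, so its LAG values are NULL and it drops out. Note that BETWEEN includes both endpoints, whereas the question asks for strictly more than 30 and less than 300 seconds; use > and < if that difference matters.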

Note: If you are asking for only one plateno value, there is no need to include plateno in a PARTITION BY clause; it is redundant because there is only one value. You can certainly use PARTITION BY mileage as an alternative to the mileage = last_mileage logic shown here, but mileage is likely to have many different values, and internally grouping by that many values means a lot of tiny (single-row) groups, which is not very memory/temp efficient. Still, as I said, the indexing is really your only serious issue.
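If what you want is the total time spent at one odometer reading rather than the gap between two neighboring rows, a plain GROUP BY is one alternative worth trying. This is a sketch, assuming a mileage value never reappears once the car has moved on. It also sidesteps a pitfall in the question's query: with an ORDER BY and the default window frame, LAST_VALUE returns the current row's value, not the last value in the partition.

```sql
SELECT mileage,
       MIN(sendtime) AS first_sendtime,
       MAX(sendtime) AS last_sendtime,
       ROUND((MAX(sendtime) - MIN(sendtime)) * 86400) AS seconds_stationary
  FROM gpsinfo
 WHERE plateno = '京AEW302'
 GROUP BY mileage
HAVING (MAX(sendtime) - MIN(sendtime)) * 86400 > 30
   AND (MAX(sendtime) - MIN(sendtime)) * 86400 < 300;
```

This returns one row per stationary period instead of one row per GPS reading, which may or may not be the shape you need.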

Answer 2

Score: -2

To get the fastest possible results, you need to pre-calculate these values (first and last send time). This can be achieved with:

  • the routine responsible for adding the record: in the context of one plateno, it will be faster to calculate and save the first and last time there
  • a trigger on the table
  • a materialized view and a process for refreshing its data

All of these require some development time, but none should be difficult.
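As an illustration of the materialized view option, here is a minimal sketch. The view name gpsinfo_stops_mv and the choice to group by plateno and mileage are assumptions made for this example, and a complete refresh on demand is only one of several refresh strategies.

```sql
-- gpsinfo_stops_mv is a hypothetical name
CREATE MATERIALIZED VIEW gpsinfo_stops_mv
  BUILD IMMEDIATE
  REFRESH COMPLETE ON DEMAND
AS
SELECT plateno,
       mileage,
       MIN(sendtime) AS first_sendtime,
       MAX(sendtime) AS last_sendtime
  FROM gpsinfo
 GROUP BY plateno, mileage;

-- Refresh on whatever schedule suits the data ('C' = complete refresh)
BEGIN
  DBMS_MVIEW.REFRESH('GPSINFO_STOPS_MV', 'C');
END;
/

-- Query the pre-calculated values instead of the raw readings
SELECT plateno, mileage, first_sendtime, last_sendtime
  FROM gpsinfo_stops_mv
 WHERE plateno = '京AEW302'
   AND (last_sendtime - first_sendtime) * 86400 > 30
   AND (last_sendtime - first_sendtime) * 86400 < 300;
```

The trade-off is staleness: results reflect the table as of the last refresh, so this suits reporting better than live tracking.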

I can also suggest that you first find the matching plateno and then extract its details. For sendtime, we are interested in the first (smallest) and last (largest) values. So you can use:

WITH gpsinfo_cte AS (
  SELECT plateno
  FROM GPSINFO 
  WHERE plateno = '京AEW302'
  GROUP BY plateno
  HAVING (MAX(sendtime) - MIN(sendtime)) * 24 * 60 * 60 < 300
    AND (MAX(sendtime) - MIN(sendtime)) * 24 * 60 * 60 > 30
)
SELECT *
FROM GPSINFO
WHERE plateno IN (SELECT plateno FROM gpsinfo_cte);

If this works, you can add an index on plateno and sendtime to optimize it further.
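Such an index could look like the sketch below. The index name is hypothetical, and whether the optimizer uses it depends on its statistics.

```sql
-- Hypothetical index name; COMPRESS 1 stores the repeated plateno prefix once per block
CREATE INDEX gpsinfo_plateno_sendtime_idx
    ON gpsinfo (plateno, sendtime) COMPRESS 1;
```

With plateno as the leading column, the same index can also serve queries that filter on plateno alone.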

huangapple
  • Posted on May 29, 2023, 15:51:18
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76355546.html