How to improve the performance of my SQL query?
Question
I am a beginner in SQL and I want to query data from an Oracle database. I have a table that records the positions of many cars. Each record has a mileage and a time. The table has three columns: 'SENDTIME', 'MILEAGE', and 'PLATENO'. They represent the time, the mileage (in the timestamp format), and the plate number of the cars respectively. The values in the 'SENDTIME' column are in the datetime format. I want to find out when a specific car did not move for more than 30 seconds and less than 300 seconds. So I wrote a SQL query:
WITH gpsinfo_cte AS (
SELECT plateno, sendtime, longitude, latitude, mileage, createdate,
FIRST_VALUE(sendtime) OVER (PARTITION BY plateno, mileage ORDER BY sendtime) AS first_sendtime,
LAST_VALUE(sendtime) OVER (PARTITION BY plateno, mileage ORDER BY sendtime) AS last_sendtime
FROM GPSINFO
WHERE plateno = '京AEW302'
)
SELECT /*+ NO_MERGE(gpsinfo_cte) */ plateno, sendtime, longitude, latitude, mileage, createdate
FROM gpsinfo_cte
WHERE (last_sendtime - first_sendtime) * 24 * 60 *60 < 300
AND (last_sendtime - first_sendtime) * 24 * 60 *60 > 30;
However, it runs rather slowly in the Oracle database. Following web search results, I used the EXPLAIN PLAN statement to generate the execution plan for my query and store it in a table called PLAN_TABLE. Here's the output:
I still have no idea how to improve the performance. Can anyone help? Many thanks!
Answer 1
Score: 5
This is probably all that matters:
FROM GPSINFO
WHERE plateno = '京AEW302'
Assuming you have lots (thousands, millions maybe) of cars, and are asking for only one license plate number, you want to use an index to access only table rows for that particular car. Your execution plan shows that this isn't happening:
TABLE ACCESS FULL GPSINFO
Simply create an index on the plateno column and that should resolve your issue:
CREATE INDEX gpsinfo_plateno_idx1 on gpsinfo(plateno) compress 1
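To confirm the index is actually picked up, you can regenerate the plan after creating it. This is only a sketch: the column list is shortened for illustration, and the optimizer may still choose a full scan if, for example, that plate accounts for a large share of the table.

```sql
-- Store the plan for the filtered access in PLAN_TABLE.
EXPLAIN PLAN FOR
SELECT plateno, sendtime, mileage
FROM gpsinfo
WHERE plateno = '京AEW302';

-- Display the plan that was just stored.
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);
```

If the index is used, the plan should show an INDEX RANGE SCAN on gpsinfo_plateno_idx1 in place of TABLE ACCESS FULL.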
As for the rest of your query, you'll have to work on the logic to get the right results, but it is unlikely that you have a noticeable performance problem with it as long as you are working with only one car. In terms of the logic, if I understand rightly, you want to know when there is a 30-300 second gap between position recordings with no mileage between them. So you need to compare a row with a neighboring row. For that, use LAG (or LEAD):
SELECT x.*,
       (sendtime - last_sendtime) * 86400 seconds_elapsed
FROM (SELECT x.*,
             LAG(sendtime) OVER (ORDER BY sendtime) last_sendtime,
             LAG(mileage) OVER (ORDER BY sendtime) last_mileage
      FROM gpsinfo x
      WHERE plateno = '京AEW302') x
WHERE (sendtime - last_sendtime) * 86400 BETWEEN 30 AND 300
  AND mileage = last_mileage
Note: If you are asking for only one plateno value, there is no need to include plateno in a PARTITION BY clause; that's redundant, as there is only one. You can certainly use PARTITION BY mileage as an alternative to the mileage = last_mileage logic shown here, but mileage is likely to have many different values, and internally grouping by that many values means a lot of tiny (single-row) groups, which is not very memory/temp efficient. Still, as I said, the indexing is really your only serious issue.
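If a stop can span more than two consecutive records, comparing only adjacent rows will miss it, because each individual gap may be shorter than 30 seconds. One possible variation is to group consecutive rows with the same mileage into runs and measure each run as a whole. This is a sketch under assumptions not confirmed by the question: sendtime is a DATE (so subtraction yields days), and the names flagged, runs, new_run, run_id, stop_start, stop_end and seconds_stopped are made up for illustration.

```sql
WITH flagged AS (
    -- Mark a row with 1 when its mileage differs from the previous row's
    -- (the first row has no previous row, so it also gets 1).
    SELECT sendtime, mileage,
           CASE WHEN mileage = LAG(mileage) OVER (ORDER BY sendtime)
                THEN 0 ELSE 1 END AS new_run
    FROM gpsinfo
    WHERE plateno = '京AEW302'
),
runs AS (
    -- A running total of the markers gives every run of unchanged mileage one id.
    SELECT sendtime, mileage,
           SUM(new_run) OVER (ORDER BY sendtime) AS run_id
    FROM flagged
)
SELECT run_id, mileage,
       MIN(sendtime) AS stop_start,
       MAX(sendtime) AS stop_end,
       (MAX(sendtime) - MIN(sendtime)) * 86400 AS seconds_stopped
FROM runs
GROUP BY run_id, mileage
HAVING (MAX(sendtime) - MIN(sendtime)) * 86400 > 30
   AND (MAX(sendtime) - MIN(sendtime)) * 86400 < 300
ORDER BY stop_start;
```

Each output row is one period during which the mileage did not change, with its start, end, and duration in seconds. Like the LAG query, it touches only one car's rows, so it benefits from the same index on plateno.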
Answer 2
Score: -2
To get the fastest possible results, you need to precalculate these values (first and last send time). This can be achieved with:
- the routine responsible for adding the record (in the context of one plateno, it will be faster to calculate and save the first and last time)
- a trigger on the table
- a materialized view and a process for refreshing its data
All of these require some dev time, but shouldn't be difficult.
I can also suggest first checking the plateno and then extracting its details. For sendtime, we are interested in the first (smallest) and last (largest) values. So, you can use:
WITH gpsinfo_cte AS (
SELECT plateno
FROM GPSINFO
WHERE plateno = '京AEW302'
GROUP BY plateno
HAVING (MAX(sendtime) - MIN(sendtime)) * 24 * 60 * 60 < 300
AND (MAX(sendtime) - MIN(sendtime)) * 24 * 60 * 60 > 30
)
SELECT *
FROM GPSINFO
WHERE plateno IN (SELECT plateno FROM gpsinfo_cte);
If this works, you can add an index on plateno and sendtime to optimize it further.
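Such an index might look like the following. This is a sketch only: the index name is hypothetical, and whether it helps depends on your data and on which queries you run.

```sql
-- Hypothetical index name. The equality-filtered column (plateno) goes first,
-- followed by sendtime, which the MIN/MAX expressions read.
CREATE INDEX gpsinfo_plateno_sendtime_idx
    ON gpsinfo (plateno, sendtime) COMPRESS 1;
```

With both columns in the index, Oracle may be able to evaluate MIN(sendtime) and MAX(sendtime) for a single plateno from the index alone, without visiting the table rows.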