2023年6月19日 19:33:52go评论67阅读模式

英文:

Enhance Sql Server Procedure Performance

问题

我有以下的过程，当我对大数据执行长时间的操作时，性能有点慢。
我有一张表，用于存储汽车停车记录。
这张表包含数百万条记录。

我希望一切都清楚，我可以找到一些帮助来提高我的查询性能。

CREATE PROCEDURE [dbo].[Report_Parking]
@PageNumber INT = 1,
@PageSize INT = 50,
@StartTime INT = 0, --我使用的是Unix时间戳
@EndTime INT = 0,
@ColumnSort NVARCHAR(50),
@OrderSort NVARCHAR(50),
@CarIDs NVARCHAR(max)--汽车数组
AS
SET NOCOUNT ON
SET FMTONLY OFF
BEGIN

CREATE TABLE #parkingTmp
(
ID INT IDENTITY(1,1) NOT NULL,
rownum BIGINT,
CarID NVARCHAR(23) NOT NULL,
Start_Time DateTime,
End_Time DateTime,
Parking_Time NVARCHAR(50),
Address NVARCHAR(max),
Long NVARCHAR(20),
Lat NVARCHAR(20),
Plate_Number NVARCHAR(max),
PageCounts BIGINT
)
DECLARE @query NVARCHAR(MAX) = '';

SET @query += N'INSERT INTO #parkingTmp 
SELECT * FROM [dbo].CarsParking 
WHERE ((EndTimeParking BETWEEN '+CAST(@StartTime AS VARCHAR)+' AND '+CAST(@EndTime AS VARCHAR)+') AND [CarID] IN ('''+@CarIDs+'''))'
EXEC(@query)

-- 为未找到的汽车插入默认记录
DECLARE @Id NVARCHAR(23)
DECLARE @Ids TABLE (id NVARCHAR(23) PRIMARY KEY NOT NULL)
INSERT @Ids(id) SELECT * FROM [dbo].[SplitStrings_ToList](@CarIDs, ',')
DELETE FROM @Ids
WHERE Id IN (SELECT DISTINCT CarID FROM #parkingTmp)

WHILE EXISTS (SELECT * FROM @Ids)
BEGIN
SELECT @Id = MAX(id) FROM @Ids

INSERT INTO #parkingTmp 
SELECT TOP 1 *
FROM CarsParking T
WHERE CarID = @Id
ORDER BY Start_Time DESC

DELETE FROM @Ids WHERE id = @Id
END
--- 插入结束

-- 选择带分页的最终结果
DECLARE @FirstRec INT = (@PageNumber - 1) * @PageSize;
DECLARE @LastRec INT = @PageNumber * @PageSize + 1;

DECLARE @sql NVARCHAR(MAX) = ''
SET @sql = N'WITH ctepaging AS (SELECT ROW_NUMBER() OVER(ORDER BY Plate_Number,'+@ColumnSort+' '+@OrderSort+') AS rownumber,* FROM #parkingTmp)'
SET @sql += ' SELECT * FROM ctepaging '
SET @sql += ' WHERE rownumber > '+ CONVERT(NVARCHAR(12), @FirstRec) + ' AND rownumber < ' + CONVERT(NVARCHAR(12), @LastRec)

EXEC sp_executesql @sql
END

我在ID列上添加了索引，这个列默认是主键，并且在筛选条件中的CarID和ParkingTime列上创建了非聚集索引。
这些改进了存储过程的性能，现在对于10万条记录只需要1分钟，而之前需要超过5分钟。

这里是脚本生成的数据示例。

英文:

I have the following procedure and it's performance is kinda slow when I execute it for long range of time for big data
I have table and storing records of Cars parking
the table contains millions of records

I hope everything is clear and I can find some help to improve performance of my query

CREATE PROCEDURE [dbo].[Report_Parking]
@PageNumber        INT = 1, 
@PageSize          INT = 50, 
@StartTime         int = 0 , --I&#39;m using unix timestamp
@EndTime           int = 0, 
@ColumnSort       NVARCHAR(50), 
@OrderSort         NVARCHAR(50),
@CarIDs          NVARCHAR(max)--array of cars
AS 
SET nocount ON 
SET fmtonly OFF 
BEGIN 
CREATE TABLE #parkingTmp 
(
ID	          int IDENTITY(1,1) NOT NULL,
rownum            bigInt,
CarID             NVARCHAR(23) NOT NULL,
Start_Time        DateTime, 
End_Time          DateTime,
Parking_Time      NVARCHAR(50),
Address		  NVARCHAR(max),
Long		  NVARCHAR(20),
Lat		  NVARCHAR(20),
Plate_Number	  NVARCHAR(max),
PageCounts	  bigInt
)
DECLARE @query NVARCHAR(MAX)=&#39;&#39;;
SET @query+= N&#39;insert into #parkingTmp 
SELECT   * FROM [dbo].CarsParking 
WHERE	((EndTimeParking BETWEEN &#39;+cast(@StartTime as varchar)+&#39; AND &#39;+cast(@EndTime as varchar)+&#39;) and [CarID] in(&#39;&#39;&#39;+@CarIDs +&#39;&#39;&#39;)&#39;
exec(@query)
-- insert default records for not found cars 
Declare @Id nvarchar(23)
Declare @Ids Table (id nvarchar(23) primary Key not null)
Insert @Ids(id) select * from [dbo].[SplitStrings_ToList](@CarIDs,&#39;,&#39;) 
delete from @ids  
WHERE Id IN (SELECT DISTINCT CarID FROM #parkingTmp)
While exists (Select * From @Ids)
Begin
Select @Id = MAX(id) from @Ids
INSERT INTO #parkingTmp 
SELECT  top 1 *
FROM CarsParking T
WHERE CarID = @Id
order by Start_Time DESC 
Delete from @Ids Where id = @Id
End
--- end inserting 
-- select final result with paging
DECLARE @FirstRec INT= (@PageNumber -1) * @PageSize; 
DECLARE @LastRec INT= @PageNumber * @PageSize +1;
declare @sql nvarchar(Max)=&#39;&#39;	 
SET @sql = N&#39;WITH ctepaging AS (SELECT ROW_NUMBER() OVER(ORDER BY Plate_Number,&#39;+@ColumnSort+&#39; &#39;+@OrderSort+&#39;) AS rownumber,* FROM #parkingTmp)&#39;
SET @sql += &#39; SELECT * FROM ctepaging &#39;
SET @sql += &#39; WHERE  rownumber &gt; &#39;+ CONVERT(NVARCHAR(12), @FirstRec) + &#39; AND rownumber &lt; &#39; + CONVERT(NVARCHAR(12), @LastRec)
exec sp_executesql @sql 
END

I added index on ID which is primary key by default and
also non clustered index in these columns which I filter on where statement CarID,ParkingTime
it improves the performance to procedure so it's taking 1 minute for 100k records now
before it was taking more than 5 mins
taking 1 minute for 100k records
but still the performance is not that good

here's a sample of data generated by script

答案1

得分: 1

我可能会写成这样：

CREATE OR ALTER PROCEDURE [dbo].[Report_Parking]
@PageNumber INT = 1,
@PageSize INT = 50,
@StartTime int = 0, --我使用的是Unix时间戳
@EndTime int = 0,
@ColumnSort NVARCHAR(50),
@OrderSort NVARCHAR(50),
@CarIDs NVARCHAR(max)--车辆的数组

AS
SET NOCOUNT ON
SET FMTONLY OFF
BEGIN

DECLARE @query NVARCHAR(MAX) = ''
, @FirstRec INT = (@PageNumber - 1) * @PageSize
, @LastRec INT = @PageNumber * @PageSize + 1;

CREATE TABLE #IDs (id NVARCHAR(23) PRIMARY KEY)

INSERT INTO #IDs (
id
)
SELECT DISTINCT *
FROM [dbo].[SplitStrings_ToList](@CarIDs, ',')

SET @query = N'
SELECT *
FROM 
(
SELECT *, row_number() OVER(ORDER BY Plate_Number,' + @ColumnSort + ' ' + @OrderSort + ') AS rownumber2
FROM (
SELECT ROW_NUMBER() OVER(PARTITION BY CarID ORDER BY Start_Time DESC) AS rownumber1
, cp.* 
FROM [dbo].CarsParking cp
INNER JOIN #IDs ids
ON ids.ID = cp.CarID
WHERE EndTimeParking BETWEEN @StartTime AND @EndTime 
) x
WHERE x.rownumber1 = 1 -- 取最后一辆车的记录
) y
WHERE rownumber2 > @FirstRec -- 排序
AND rownumber2 < @LastRec ';

EXEC sp_executesql @query 
, N'@starttime INT, @endtime INT, @FirstRec INT, @LastRec INT'
, @StartTime = @StartTime, @EndTime = @EndTime, @FirstRec = @FirstRec, @LastRec = @LastRec
END
GO

-- 测试代码
EXEC report_parking 1, 30, 111, 222, 'test,', 'test2', '1,2,3,4'

我喜欢将ID放入具有索引的临时表中，以加快连接速度。

对于动态查询，我将变量作为参数传递，而不是像你一样构建整个SQL字符串。在某些情况下，你的版本可能更快，但通常最好参数化事务。

最后，为了处理排序和分页，我使用了两个row_numbers，第一个用于删除车辆的重复记录，第二个用于正确按页排序。这部分可能会有点慢，但希望@CarIDs列表中的行数较少，所以这不会是一个巨大的问题。

这个查询可能没有经过充分测试，但你应该能够理解基本思路。

英文:

I would probably write something like this:

CREATE OR ALTER PROCEDURE [dbo].[Report_Parking]
@PageNumber        INT = 1, 
@PageSize          INT = 50, 
@StartTime         int = 0 , --I&#39;m using unix timestamp
@EndTime           int = 0, 
@ColumnSort       NVARCHAR(50), 
@OrderSort         NVARCHAR(50),
@CarIDs          NVARCHAR(max)--array of cars

AS 
SET nocount ON 
SET fmtonly OFF 
BEGIN 
   
	DECLARE @query NVARCHAR(MAX)=&#39;&#39;
	,	@FirstRec INT= (@PageNumber -1) * @PageSize
	,	@LastRec INT= @PageNumber * @PageSize +1;
    
	create table #IDs (id nvarchar(23) primary key)
	
	insert into #IDs (
		id
	)
	SELECT	DISTINCT * 
	from	[dbo].[SplitStrings_ToList](@CarIDs,&#39;,&#39;) 

	SET @query = N&#39;
		select *
		from 
(
			SELECT *, row_number() over(order by  Plate_Number,&#39;+@ColumnSort+&#39; &#39;+@OrderSort+&#39;) as rownumber2
			FROM (
					SELECT	ROW_NUMBER() OVER(PARTITION BY CarID ORDER BY Start_Time DESC) AS rownumber1
					,	cp.* 
					FROM	[dbo].CarsParking cp
					INNER JOIN #IDs ids
						ON	ids.ID = cp.CarID
					WHERE   EndTimeParking BETWEEN @StartTime  AND @EndTime 
				) x
			WHERE x.rownumber1 = 1 -- Take last car post
			) y
		WHERE  rownumber2 &gt; @FirstRec -- Sort
		AND	rownumber2 &lt; @LastRec &#39;
    
    exec sp_executesql @query 
    ,	N&#39;@starttime INT, @endtime INT, @FirstRec INT, @LastRec INT&#39;
    , @StartTime = @StartTime, @EndTime = @EndTime, @FirstRec = @FirstRec, @LastRec = @LastRec
END
GO

-- test code
exec report_parking 1, 30,111, 222, &#39;test,&#39;, &#39;test2&#39;, &#39;1,2,3,4&#39;

I like to put the IDs into temp table with index so the join is fast.

For the dynamic query, i pass the variables as parameters instead of building whole SQL string like you do. In some cases your version is faster, but it's usually better to parametrise things.

Finally, to handle sort and paging, i do two row_numbers, first to remove duplicates from cars, and second to properly sort them by pages. This part might be slow but hopefully you have few rows in the @CarIDs list so this won't be a huge problem

The query is a bit untested but you should be able to get the general idea

答案2

得分: -2

避免使用动态SQL：应避免使用动态SQL，因为它有几个缺点，如可能发生SQL注入和性能问题。相反，使用参数化查询，它们不仅更安全，而且通常更快，因为SQL Server能够重复使用执行计划。
早期筛选：与将所有记录插入临时表然后进行筛选相比，尽量在尽早的阶段进行数据筛选，以减少处理的数据量。在第一个查询中使用WHERE子句，仅筛选所需的记录。
使用适当的索引：索引是优化查询性能的关键。在经常在WHERE子句、JOIN和ORDER BY子句中访问的列上创建索引。如果已经有索引，请确保它们被高效地使用。使用SQL Server的查询执行计划可以帮助分析哪些索引正在使用。
使用适当的数据类型：我注意到您在一些列中，如Address和Plate_Number，使用了NVARCHAR(MAX)。除非确实必要，通常最好避免使用MAX作为长度。相反，尝试为这些列中存储的数据使用适当的数据长度。
分批和分页：您已经在使用分页，这是不错的，但仍然可以优化。尽量避免使用SELECT *，而是只选择您需要的列。
优化循环：在插入默认记录的while循环可能会很慢。如果可能的话，考虑使用基于集合的方法，而不是循环。例如，您可以一次性插入所有记录，而不是在循环中插入。
避免在大数据集中使用表变量：与临时表相比，表变量在处理大数据集时可能效率较低。如果处理大量数据，可以考虑只使用临时表。
使用查询提示：有时，提供查询提示可以提高性能。例如，您可能会发现在您的情况下使用(NOLOCK)提示是有意义的。然而，要谨慎使用这些提示，只有在确定它们在您的情况下是必要且合适的情况下使用。
避免在WHERE子句中使用函数：如果[dbo].[SplitStrings_ToList]是一个函数，在WHERE子句中使用它可能会减慢查询速度。尽量避免在WHERE子句中使用函数。
查看服务器设置和资源：确保您的SQL Server拥有足够的资源（CPU、内存）并查看SQL Server的设置。有时，性能问题不仅仅是因为查询，还因为服务器资源和配置。

请记住，性能调优有时可能需要反复试验的过程。始终彻底测试您的更改，以确保它们提高性能而不影响功能或数据完整性。

英文:

Avoid Using Dynamic SQL: Dynamic SQL should be avoided, as it has several disadvantages, such as the possibility of SQL injection and performance issues. Instead, use parameterized queries which are not only safer but are often faster due to SQL Server being able to reuse the execution plans.
Filter Early: Instead of inserting all records into a temporary table and then filtering, try to filter the data as early as possible to reduce the amount of data being handled. Use the WHERE clause in the first query to filter out only the records needed.
Use Proper Indexes: Indexes are key to optimizing query performance. Create indexes on the columns that are frequently accessed in your WHERE clause, JOIN, and ORDER BY clauses. If you already have indexes, make sure they are being used efficiently. Using SQL Server's Query Execution Plan can be helpful in analyzing which indexes are being used.
Use Proper Data Types: I noticed that you are using NVARCHAR(MAX) for some columns like Address and Plate_Number. It's usually best to avoid using MAX as the length unless it is really necessary. Instead, try to use an appropriate data length for the data stored in these columns.
Batching and Pagination: You are already using paging which is good, but it can still be optimized. Try to avoid using SELECT * and instead select only the columns you need.
Optimize Loop: The while loop where you're inserting default records can be slow. Consider using a set-based approach instead of a loop if possible. For example, you might insert all the records at once instead of in a loop.
Avoid Table Variables for Large Data Sets: Table variables can be less efficient for large data sets compared to temp tables. You might consider just using temp tables if you are dealing with a lot of data.
Use Query Hints: Sometimes, providing query hints can boost performance. For instance, you might find that using the (NOLOCK) hint makes sense in your case. However, use these cautiously and only if you are sure they are needed and appropriate for your case.
Avoid Using Functions in WHERE Clause: If [dbo].[SplitStrings_ToList] is a function, using it in the WHERE clause could slow down your query. Try to avoid using functions in the WHERE clause.
Review Server Settings and Resources: Ensure your SQL Server has enough resources (CPU, memory) and review SQL Server settings. Sometimes, performance issues are not just because of the query but because of the server resources and configuration.

Remember that tuning performance can sometimes be a process of trial and error. Always thoroughly test your changes to ensure they are improving performance and not affecting functionality or data integrity.

通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库，让每个人都能够通过互相帮助和分享经验来进步。

优化 SQL Server 存储过程性能。

问题

答案1

答案2

PSQL / SQL：是否可以进一步优化此查询，而不需要对数据库进行写访问？

Find SCD-2 timerange overlaps when a sensor belongs to multiple labels simultaneously in SQL (Postgres)

空列在Databricks的S3选择中未列出

如何在MySQL中将一个列拆分为Hive表中的两个列？

What's the correct way to type hint an empty list as a literal in python?

如何在Highcharts Gantt中更改本地化的星期名称

如何在同一个流中使用多个过滤器和映射函数？

如何使用Map/Set来将代码优化到O(n)？

.NET MAUI Android在GitHub Actions上构建失败，错误代码为1。

如何在Playwright视觉比较中屏蔽多个定位器？

在C++中，可以使用可变模板参数来检索类型的内部类型。

selenium.common.exceptions.StaleElementReferenceException: Message: stale element reference: stale element not found

Creating and opening a URL to log in to Website via Basic Auth with Robot Framework/Selenium (Python)

AG Grid 在上下文菜单中以大文本形式打开

发表评论