英文:
MS SQL Server slow UPDATE query with Eager Spool despite index within SSMS on local server
问题
I'm running a SQL query within SSMS on the computer that is running Microsoft SQL Server 2019 Developer Edition. The query seems to be significantly slower than it should.
我正在在运行 Microsoft SQL Server 2019 Developer Edition 的计算机上使用 SSMS 运行 SQL 查询。查询似乎比应该的速度慢很多。
I added an index on the columns in question that the query uses, and the duration of the query improved from around 120 seconds to around 85 seconds, but I still think it should be faster, comparing it to other queries between the same tables, with similar column data types, that complete in less than a second.
我在查询中使用的相关列上添加了索引,查询的持续时间从大约120秒改善到大约85秒,但与在相同表之间执行的其他查询相比,这些查询具有类似的列数据类型,执行时间少于一秒,我仍然认为应该更快。
The query updates a summary table by aggregating data from a worktable.
该查询通过从工作表聚合数据来更新摘要表。
The summary table is approx 10,000 rows and 30 columns of mixed VARCHAR, FLOAT and INT columns.
摘要表大约有10,000行,包含混合的VARCHAR、FLOAT和INT列。
The worktable table is approx 7,500,000 rows, with a similar number of columns and a similar mix of column data types.
工作表大约有7,500,000行,具有类似数量的列和类似的列数据类型混合。
Looking at the execution plan, it seems to do an Index Spool (Eager Spool) which, from reading, often can mean that the query is missing an index.
查看执行计划,它似乎执行了索引 Spool(Eager Spool),根据阅读的内容,通常意味着查询缺少索引。
I have a non-clustered index on the three columns the query...queries.
我在查询中涉及的三列上有一个非聚集索引...索引。
The complete query is:
完整的查询如下:
UPDATE SummaryTable
SET LifeTeacherTopScore = (SELECT COUNT(*)
FROM WorkTable AS T1
WHERE T1.Z_date < SummaryTable.Z_date
AND T1.Teacher = SummaryTable.Teacher
AND T1.Score = 1)
WHERE SummaryTable.Teacher <> ''
The execution plan is here:
执行计划在这里:
https://www.brentozar.com/pastetheplan/?id=ByEo9Ru42
On the WorkTable table I have a non-clustered index on Z_date, Teacher and Score.
在 WorkTable 表上,我在 Z_date、Teacher 和 Score 上都有一个非聚集索引。
There are around 10,000 unique entries in the Z_date column, around 25,000 unique entries in the Teacher column, and around 30 unique entries in the Score column.
Z_date 列中有大约10,000个唯一条目,Teacher 列中有大约25,000个唯一条目,Score 列中有大约30个唯一条目。
The teacher column contains around 1,500,000 NULL entries, hence why I exclude NULLs within the above query by finishing it with:
Teacher 列中包含大约1,500,000个NULL条目,因此我在上面的查询中排除了NULL,通过以下方式完成它:
where SummaryTable.Teacher <> ''
Z_date format is DATE, Teacher format is VARCHAR(50), Score format is INT.
Z_date 格式为 DATE,Teacher 格式为 VARCHAR(50),Score 格式为 INT。
After doing an Index Scan (NonClustered), using the index I added, the query then does an Index Spool (Eager Spool), with a cost of 87%, which seems to be the reason for the duration of the scan.
在执行索引扫描(非聚集)并使用我添加的索引之后,查询然后执行索引 Spool(Eager Spool),成本为87%,这似乎是扫描持续时间的原因。
I have read dozens of pages re slow queries, indexes, etc, without finding any page that has helped to isolate the issue.
我已经阅读了数十页关于慢查询、索引等的内容,但没有找到任何可以帮助隔离问题的页面。
Any suggestions, recommendations, on improving the performance of this query greatly appreciated.
非常感谢任何有关改进此查询性能的建议和推荐。
英文:
I'm running a SQL query within SSMS on the computer that is running Microsoft SQL Server 2019 Developer Edition. The query seems to be significantly slower than it should.
I added an index on the columns in question that the query uses, and the duration of the query improved from around 120 seconds to around 85 seconds, but I still think it should be faster, comparing it to other queries between the same tables, with similar column data types, that complete in less than a second.
The query updates a summary table by aggregating data from a worktable.
The summary table is approx 10,000 rows and 30 columns of mixed VARCHAR, FLOAT and INT columns.
The worktable table is approx 7,500,000 rows, with a similar number of columns and a similar mix of column data types.
Looking at the execution plan, it seems to do an Index Spool (Eager Spool) which, from reading, often can mean that the query is missing an index.
I have a non-clustered index on the three columns the query...queries.
The complete query is:
UPDATE SummaryTable
SET LifeTeacherTopScore = (SELECT COUNT(*)
FROM WorkTable AS T1
WHERE T1.Z_date < SummaryTable.Z_date
AND T1.Teacher = SummaryTable.Teacher
AND T1.Score = 1)
WHERE SummaryTable.Teacher <> ''
The execution plan is here:
https://www.brentozar.com/pastetheplan/?id=ByEo9Ru42
On the WorkTable table I have a non-clustered index on Z_date, Teacher and Score.
There are around 10,000 unique entries in the Z_date column, around 25,000 unique entries in the Teacher column, and around 30 unique entries in the Score column.
The teacher column contains around 1,500,000 NULL entries, hence why I exclude NULLs within the above query by finishing it with:
where SummaryTable.Teacher <> ''
Z_date format is DATE, Teacher format is VARCHAR(50), Score format is INT.
After doing an Index Scan (NonClustered), using the index I added, the query then does an Index Spool (Eager Spool), with a cost of 87%, which seems to be the reason for the duration of the scan.
I have read dozens of pages re slow queries, indexes, etc, without finding any page that has helped to isolate the issue.
Any suggestions, recommendations, on improving the performance of this query greatly appreciated.
答案1
得分: 1
索引中列的顺序很重要。尝试使用(Score, Teacher, Z_Date)
。这是关于什么?
SQL Server和其他RDBMS系统使用所谓的B-Tree索引。把它们想象成老式电话簿。要找到某个Z_Date之后的行,特定的Score和Teacher,你访问具有Score为1的电话簿部分。然后查找老师,接着查找第一个合格的日期。然后你可以逐行查看每一行。
在索引术语中,你随机访问B-Tree索引到第一个合格的行,然后顺序扫描行。这仅在索引中Z_Date列位于其他两列之后时有效。如果Z_Date首先出现在列中,你将不得不搜索所有匹配的行,这是昂贵的。
专业提示: 在SSMS中:右键单击查询窗口,然后选择显示实际执行计划,然后运行查询。执行计划显示有时会建议创建新索引。
英文:
The order of columns in an index matters. Try (Score, Teacher, Z_Date)
. What's this about?
SQL Server and other RDBMS systems use so-called B-Tree indexes. Think of them as old-school telephone books. To find the rows after a certain Z_Date for a particular Score and Teacher, you visit the section of the book with Scores of 1. You then look for the teacher, then you look for the first eligible date. You can then look at each row in turn.
In indexing terms, you random-access the B-Tree index to the first eligible row, then scan the rows sequentially. This only works if, in the index, the Z_Date column comes after the other two columns. If Z_Date were first in the column you'd have to pick through all the rows looking for matching ones, and that is expensive.
Pro tip: In SSMS: Right-click in a query window, then select Show Actual Execution Plan, then run the query. The execution plan display sometimes recommends a new index to create.
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论