2023年5月10日 17:22:53go评论82阅读模式

英文:

MS SQL Server slow UPDATE query with Eager Spool despite index within SSMS on local server

问题

I'm running a SQL query within SSMS on the computer that is running Microsoft SQL Server 2019 Developer Edition. The query seems to be significantly slower than it should.

我正在在运行 Microsoft SQL Server 2019 Developer Edition 的计算机上使用 SSMS 运行 SQL 查询。查询似乎比应该的速度慢很多。

I added an index on the columns in question that the query uses, and the duration of the query improved from around 120 seconds to around 85 seconds, but I still think it should be faster, comparing it to other queries between the same tables, with similar column data types, that complete in less than a second.

我在查询中使用的相关列上添加了索引，查询的持续时间从大约120秒改善到大约85秒，但与在相同表之间执行的其他查询相比，这些查询具有类似的列数据类型，执行时间少于一秒，我仍然认为应该更快。

The query updates a summary table by aggregating data from a worktable.

该查询通过从工作表聚合数据来更新摘要表。

The summary table is approx 10,000 rows and 30 columns of mixed VARCHAR, FLOAT and INT columns.

摘要表大约有10,000行，包含混合的VARCHAR、FLOAT和INT列。

The worktable table is approx 7,500,000 rows, with a similar number of columns and a similar mix of column data types.

工作表大约有7,500,000行，具有类似数量的列和类似的列数据类型混合。

Looking at the execution plan, it seems to do an Index Spool (Eager Spool) which, from reading, often can mean that the query is missing an index.

查看执行计划，它似乎执行了索引 Spool（Eager Spool），根据阅读的内容，通常意味着查询缺少索引。

I have a non-clustered index on the three columns the query...queries.

我在查询中涉及的三列上有一个非聚集索引...索引。

The complete query is:

完整的查询如下：

UPDATE SummaryTable
SET LifeTeacherTopScore = (SELECT COUNT(*)
FROM WorkTable AS T1
WHERE T1.Z_date < SummaryTable.Z_date
AND T1.Teacher = SummaryTable.Teacher
AND T1.Score = 1)
WHERE SummaryTable.Teacher <> ''

The execution plan is here:

执行计划在这里：

https://www.brentozar.com/pastetheplan/?id=ByEo9Ru42

On the WorkTable table I have a non-clustered index on Z_date, Teacher and Score.

在 WorkTable 表上，我在 Z_date、Teacher 和 Score 上都有一个非聚集索引。

There are around 10,000 unique entries in the Z_date column, around 25,000 unique entries in the Teacher column, and around 30 unique entries in the Score column.

Z_date 列中有大约10,000个唯一条目，Teacher 列中有大约25,000个唯一条目，Score 列中有大约30个唯一条目。

The teacher column contains around 1,500,000 NULL entries, hence why I exclude NULLs within the above query by finishing it with:

Teacher 列中包含大约1,500,000个NULL条目，因此我在上面的查询中排除了NULL，通过以下方式完成它：

where SummaryTable.Teacher <> ''

Z_date format is DATE, Teacher format is VARCHAR(50), Score format is INT.

Z_date 格式为 DATE，Teacher 格式为 VARCHAR(50)，Score 格式为 INT。

After doing an Index Scan (NonClustered), using the index I added, the query then does an Index Spool (Eager Spool), with a cost of 87%, which seems to be the reason for the duration of the scan.

在执行索引扫描（非聚集）并使用我添加的索引之后，查询然后执行索引 Spool（Eager Spool），成本为87%，这似乎是扫描持续时间的原因。

I have read dozens of pages re slow queries, indexes, etc, without finding any page that has helped to isolate the issue.

我已经阅读了数十页关于慢查询、索引等的内容，但没有找到任何可以帮助隔离问题的页面。

Any suggestions, recommendations, on improving the performance of this query greatly appreciated.

非常感谢任何有关改进此查询性能的建议和推荐。

英文:

I'm running a SQL query within SSMS on the computer that is running Microsoft SQL Server 2019 Developer Edition. The query seems to be significantly slower than it should.

The query updates a summary table by aggregating data from a worktable.

The summary table is approx 10,000 rows and 30 columns of mixed VARCHAR, FLOAT and INT columns.

The worktable table is approx 7,500,000 rows, with a similar number of columns and a similar mix of column data types.

Looking at the execution plan, it seems to do an Index Spool (Eager Spool) which, from reading, often can mean that the query is missing an index.

I have a non-clustered index on the three columns the query...queries.

The complete query is:

UPDATE SummaryTable 
SET LifeTeacherTopScore = (SELECT COUNT(*) 
                           FROM WorkTable AS T1 
                           WHERE T1.Z_date &lt; SummaryTable.Z_date 
                             AND T1.Teacher = SummaryTable.Teacher 
                             AND T1.Score = 1)
WHERE SummaryTable.Teacher &lt;&gt; &#39;&#39;

The execution plan is here:

https://www.brentozar.com/pastetheplan/?id=ByEo9Ru42

On the WorkTable table I have a non-clustered index on Z_date, Teacher and Score.

There are around 10,000 unique entries in the Z_date column, around 25,000 unique entries in the Teacher column, and around 30 unique entries in the Score column.
The teacher column contains around 1,500,000 NULL entries, hence why I exclude NULLs within the above query by finishing it with:

where SummaryTable.Teacher &lt;&gt; &#39;&#39;

Z_date format is DATE, Teacher format is VARCHAR(50), Score format is INT.

After doing an Index Scan (NonClustered), using the index I added, the query then does an Index Spool (Eager Spool), with a cost of 87%, which seems to be the reason for the duration of the scan.

I have read dozens of pages re slow queries, indexes, etc, without finding any page that has helped to isolate the issue.

Any suggestions, recommendations, on improving the performance of this query greatly appreciated.

答案1

得分: 1

索引中列的顺序很重要。尝试使用(Score, Teacher, Z_Date)。这是关于什么？

SQL Server和其他RDBMS系统使用所谓的B-Tree索引。把它们想象成老式电话簿。要找到某个Z_Date之后的行，特定的Score和Teacher，你访问具有Score为1的电话簿部分。然后查找老师，接着查找第一个合格的日期。然后你可以逐行查看每一行。

在索引术语中，你随机访问B-Tree索引到第一个合格的行，然后顺序扫描行。这仅在索引中Z_Date列位于其他两列之后时有效。如果Z_Date首先出现在列中，你将不得不搜索所有匹配的行，这是昂贵的。

专业提示: 在SSMS中：右键单击查询窗口，然后选择显示实际执行计划，然后运行查询。执行计划显示有时会建议创建新索引。

英文:

The order of columns in an index matters. Try (Score, Teacher, Z_Date). What's this about?

SQL Server and other RDBMS systems use so-called B-Tree indexes. Think of them as old-school telephone books. To find the rows after a certain Z_Date for a particular Score and Teacher, you visit the section of the book with Scores of 1. You then look for the teacher, then you look for the first eligible date. You can then look at each row in turn.

In indexing terms, you random-access the B-Tree index to the first eligible row, then scan the rows sequentially. This only works if, in the index, the Z_Date column comes after the other two columns. If Z_Date were first in the column you'd have to pick through all the rows looking for matching ones, and that is expensive.

Pro tip: In SSMS: Right-click in a query window, then select Show Actual Execution Plan, then run the query. The execution plan display sometimes recommends a new index to create.

通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库，让每个人都能够通过互相帮助和分享经验来进步。

MS SQL Server slow UPDATE query with Eager Spool despite index within SSMS on local server

问题

答案1

Server connection problem in ASP.NET Core MVC

只匹配两个表之间第一个 “not used” 值的出现。

计算从多个日期区间开始的最后一年内过去的天数，直到达到一个限制。

Pandas同时转换多列数据类型

如何在Playwright视觉比较中屏蔽多个定位器？

在C++中，可以使用可变模板参数来检索类型的内部类型。

selenium.common.exceptions.StaleElementReferenceException: Message: stale element reference: stale element not found

Creating and opening a URL to log in to Website via Basic Auth with Robot Framework/Selenium (Python)

AG Grid 在上下文菜单中以大文本形式打开

What's the correct way to type hint an empty list as a literal in python?

如何在Highcharts Gantt中更改本地化的星期名称

如何在同一个流中使用多个过滤器和映射函数？

如何使用Map/Set来将代码优化到O(n)？

.NET MAUI Android在GitHub Actions上构建失败，错误代码为1。