Why would an Azure SQL query sometimes use 29 seconds of CPU and other times 0.1 ms on an idle 10-core pool?
Question
On an Azure SQL elastic pool with 10 cores, while the pool was completely idle (over a weekend), Query Store showed an SP/query using 29 seconds of CPU, which triggered a high average CPU usage alert set at 80%.
Early the following Monday morning, when the pool was again idle, I tried to recreate the situation: I ran the SP/query with the same parameter values and watched the results in sys.dm_exec_query_stats.
CPU was about 0.1 ms. Query Store showed the same QueryId and the same PlanId, with the load distributed evenly across the plan in the same way. The plan also did not go parallel.
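To compare slow and fast executions of the same statement, sys.dm_exec_query_stats keeps the minimum, maximum and last worker (CPU) time and physical reads per cached plan; the worker time columns are in microseconds. The Python sketch below only stores the T-SQL as a string and summarizes a row you have already fetched (for example through a driver such as pyodbc, not shown). `StatsRow`, `summarize` and the 1000x spread threshold are hypothetical helpers for illustration, not part of any library.

```python
from dataclasses import dataclass

# T-SQL to run against the database (not executed here). These are real
# sys.dm_exec_query_stats columns; *_worker_time values are in microseconds.
QUERY_STATS_SQL = """
SELECT TOP (20)
    qs.execution_count,
    qs.min_worker_time, qs.max_worker_time, qs.last_worker_time,
    qs.min_physical_reads, qs.max_physical_reads, qs.last_physical_reads,
    SUBSTRING(st.text, 1, 200) AS query_text
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
ORDER BY qs.max_worker_time DESC;
"""


@dataclass
class StatsRow:
    # Hypothetical container for one fetched row (a subset of the columns above).
    execution_count: int
    min_worker_time: int      # microseconds
    max_worker_time: int      # microseconds
    max_physical_reads: int


def summarize(row: StatsRow, spread_threshold: float = 1000.0) -> dict:
    """Convert CPU to ms and flag a large best/worst spread (threshold is an assumption)."""
    # Guard against a zero minimum so the ratio stays defined.
    spread = row.max_worker_time / max(row.min_worker_time, 1)
    return {
        "min_cpu_ms": row.min_worker_time / 1000.0,
        "max_cpu_ms": row.max_worker_time / 1000.0,
        "spread": spread,
        "outlier": spread >= spread_threshold,
        # Physical reads on the worst run hint that pages came from storage.
        "had_physical_reads": row.max_physical_reads > 0,
    }


if __name__ == "__main__":
    # Hypothetical row shaped like the question: ~0.1 ms best vs ~29 s worst.
    row = StatsRow(execution_count=2, min_worker_time=100,
                   max_worker_time=29_000_000, max_physical_reads=0)
    print(summarize(row))
```

The statistics only cover plans still in cache, so a slow run that happened days earlier may already have been aggregated away; Query Store keeps longer history.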
Because it was a weekend, the volume of data in the tables had not changed measurably. I also checked that the data returned by the SP existed months before this incident, so the results did not change between the two runs.
I'm much more of a developer than a DBA, so can anyone tell me where else to look to find out why, and how, a query could perform so differently in an apparently identical environment? Is this something specific to the serverless environment of Azure SQL?
Some more stats, as requested in the comments:
- Compile CPU time for the two runs was 77 ms and 47 ms.
- The plan was retrieved from cache in both cases.
- Wait time does not seem to be available for the short run.
- For the long run, the wait was "CPU", 6303 ms.
Many thanks for your help!
Answer 1
Score: 2
This is a standard SQL Server pattern. The first time a query runs, it has to do a lot of physical IO and allocate more memory, so it is slow. If you look at the slow executions, they may show PAGEIOLATCH_SH waits (pages being read from disk into the buffer pool) and MEMORY_ALLOCATION_EXT waits (memory being allocated as the buffer pool grows). The second time the query runs, the data is already in the buffer pool and it is fast.
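To check which kind of wait dominated a slow interval, Query Store records per-plan waits by category in sys.query_store_wait_stats; PAGEIOLATCH_* waits are reported under the "Buffer IO" category. The sketch below keeps the T-SQL as a string (the `?` is a driver parameter marker for the plan_id being investigated) and ranks rows you have already fetched. `dominant_wait`, `looks_like_cold_cache` and the 50% share threshold are assumptions for illustration.

```python
from collections import defaultdict

# T-SQL to run against the database (not executed here); the columns are
# real sys.query_store_wait_stats columns.
QUERY_STORE_WAITS_SQL = """
SELECT ws.runtime_stats_interval_id,
       ws.wait_category_desc,
       ws.total_query_wait_time_ms
FROM sys.query_store_wait_stats AS ws
WHERE ws.plan_id = ?
ORDER BY ws.total_query_wait_time_ms DESC;
"""


def dominant_wait(rows):
    """rows: iterable of (wait_category_desc, total_query_wait_time_ms).

    Sums wait time per category across intervals and returns the
    (category, total_ms) pair with the largest total, or None if empty.
    """
    totals = defaultdict(float)
    for category, wait_ms in rows:
        totals[category] += wait_ms
    if not totals:
        return None
    category = max(totals, key=totals.get)
    return category, totals[category]


def looks_like_cold_cache(rows, share=0.5):
    """True when 'Buffer IO' accounts for at least `share` of all wait time."""
    rows = list(rows)
    total = sum(ms for _, ms in rows)
    buffer_io = sum(ms for cat, ms in rows if cat == "Buffer IO")
    return total > 0 and buffer_io / total >= share


if __name__ == "__main__":
    # 6303 ms of "CPU" is from the question; the Buffer IO figure is hypothetical.
    rows = [("CPU", 6303.0), ("Buffer IO", 120.0)]
    print(dominant_wait(rows), looks_like_cold_cache(rows))
```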
After a period of inactivity, the memory allocation drops. Azure SQL Database shrinks its memory allocation when the database has not been used for some time, or after the service tier has been scaled up or down. You will see this on Azure SQL Database but not on SQL Server instances running on VMs (IaaS). If the database uses the serverless compute tier, memory is reclaimed even more frequently, as described in the Azure SQL serverless documentation here, and that has a greater impact on performance.
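The cold-then-warm behaviour and the effect of memory being reclaimed after inactivity can be illustrated with a toy buffer pool: an LRU cache of pages where a miss counts as a physical read. This is a simplified model for intuition, not how SQL Server's buffer manager is implemented; `ToyBufferPool`, its `reclaim` method and the page IDs are hypothetical.

```python
from collections import OrderedDict


class ToyBufferPool:
    """Toy LRU page cache; a miss stands in for a physical read from storage."""

    def __init__(self, capacity_pages: int):
        self.capacity = capacity_pages
        self.pages = OrderedDict()  # page_id -> None, least recently used first

    def read(self, page_ids):
        """Touch each page; return (logical_reads, physical_reads)."""
        logical = physical = 0
        for pid in page_ids:
            logical += 1
            if pid in self.pages:
                self.pages.move_to_end(pid)  # hit: mark most recently used
            else:
                physical += 1  # miss: page must come from storage
                self.pages[pid] = None
                if len(self.pages) > self.capacity:
                    self.pages.popitem(last=False)  # evict least recently used
        return logical, physical

    def reclaim(self, keep_pages: int = 0):
        """Simulate memory reclaimed after inactivity: keep only the newest pages."""
        while len(self.pages) > keep_pages:
            self.pages.popitem(last=False)


if __name__ == "__main__":
    pool = ToyBufferPool(capacity_pages=1000)
    query_pages = range(500)          # hypothetical pages the query touches
    print(pool.read(query_pages))     # (500, 500): cold cache
    print(pool.read(query_pages))     # (500, 0): warm cache
    pool.reclaim()                    # idle period: memory reclaimed
    print(pool.read(query_pages))     # (500, 500): cold again
```

The model only counts reads; it does not try to reproduce real timings.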
Another possible reason is that queries wait for a synchronous statistics update to complete before compilation, and then execution, can resume. Try enabling asynchronous statistics updates (AUTO_UPDATE_STATISTICS_ASYNC), as explained here. A maintenance job that regularly updates statistics and removes index fragmentation can also improve performance, because statistics are then already current and do not need to be updated right before a query executes.
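The difference between synchronous and asynchronous statistics updates can be reduced to a simple latency model: with synchronous updates, a query that finds stale statistics waits for the update before compiling; with AUTO_UPDATE_STATISTICS_ASYNC ON, it compiles with the existing statistics while the update runs in the background. The T-SQL strings use real catalog columns and syntax but are not executed here; `Timings`, `caller_latency_ms` and the millisecond figures are hypothetical.

```python
from dataclasses import dataclass

# T-SQL snippets (not executed here). sys.databases.is_auto_update_stats_async_on
# reports the current setting; ALTER DATABASE CURRENT targets the connected database.
CHECK_ASYNC_SQL = (
    "SELECT is_auto_update_stats_async_on "
    "FROM sys.databases WHERE name = DB_NAME();"
)
ENABLE_ASYNC_SQL = "ALTER DATABASE CURRENT SET AUTO_UPDATE_STATISTICS_ASYNC ON;"


@dataclass
class Timings:
    # Hypothetical costs in milliseconds for one call that triggers a (re)compile.
    stats_update_ms: float
    compile_ms: float
    execute_ms: float


def caller_latency_ms(t: Timings, stats_stale: bool, async_update: bool) -> float:
    """Latency the caller observes under a simplified model.

    Synchronous: a stale-statistics update blocks compilation, so the caller
    waits for it. Asynchronous: the query compiles with the existing statistics
    and the update runs in the background, so the caller does not wait for it.
    """
    blocking = t.stats_update_ms if (stats_stale and not async_update) else 0.0
    return blocking + t.compile_ms + t.execute_ms


if __name__ == "__main__":
    t = Timings(stats_update_ms=5000.0, compile_ms=50.0, execute_ms=1.0)
    print(caller_latency_ms(t, stats_stale=True, async_update=False))
    print(caller_latency_ms(t, stats_stale=True, async_update=True))
```

The trade-off is that with asynchronous updates the triggering query may compile with outdated statistics; later compilations pick up the refreshed ones.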