
Simple aggregating query on indexed table with 14M rows taking long

Question

I'm having trouble with a simple aggregating query on a relatively large table (~14M rows). The query in question, SELECT stageID, SUM(time) as totalTime FROM <table> GROUP BY stageID, returns 4052 rows and takes ~7 seconds. I'm running MariaDB 11.0.2 (client 15.2 for Win64/AMD64) on a local machine with 16 GB RAM and an i7 CPU. The data is on an SSD, so I don't understand why sending the data takes this long.

Using EXPLAIN EXTENDED SELECT ...:

id  select_type  table    type   possible_keys  key        key_len  ref   rows      filtered  Extra
1   SIMPLE       <table>  index  NULL           uniquekey  8        NULL  13744857  100.00    NULL

Using EXPLAIN <table>:

Field     Type     Null  Key  Default  Extra
stageID   int(11)  NO    PRI  NULL
playerID  int(11)  NO    PRI  NULL
time      int(11)  NO         0

Included indices:
UNIQUE KEY (stageID, playerID)
FOREIGN KEY (stageID)
FOREIGN KEY (playerID)

ENGINE=InnoDB DEFAULT CHARSET=utf8mb3 COLLATE=utf8mb3_general_ci
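For reference, the schema described above corresponds roughly to the following DDL. The table name `stage_times` and the foreign-key targets are assumptions (the question only shows `<table>`); note that when no explicit PRIMARY KEY is declared, InnoDB promotes the first NOT NULL UNIQUE key to the clustered index, which would explain why EXPLAIN reports both columns as PRI.

```sql
-- Sketch only: `stage_times`, `stages`, and `players` are assumed names.
CREATE TABLE stage_times (
  stageID  int(11) NOT NULL,
  playerID int(11) NOT NULL,
  time     int(11) NOT NULL DEFAULT 0,
  UNIQUE KEY (stageID, playerID),  -- promoted to the clustered index by InnoDB
  FOREIGN KEY (stageID)  REFERENCES stages (stageID),
  FOREIGN KEY (playerID) REFERENCES players (playerID)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb3 COLLATE=utf8mb3_general_ci;
```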

Here's the profile of that query:

Status Duration
Starting 0.000050
checking permissions 0.000005
Opening tables 0.000013
After opening tables 0.000003
System lock 0.000005
table lock 0.000005
init 0.000017
Optimizing 0.000007
Statistics 0.000021
Preparing 0.000015
Sorting result 0.000007
Executing 0.000002
Sending data 6.893190
End of update loop 0.000010
Query end 0.000003
Commit 0.000004
closing tables 0.000003
Unlocking tables 0.000002
closing tables 0.000007
Starting cleanup 0.000002
Freeing items 0.000003
Updating status 0.000039
Reset for next command 0.000004

Upgrading the resources available to MariaDB via the my-huge.ini config file etc. made no difference to the execution time.


Answer 1

Score: 2


The "Sending data" line in your profile output proves that almost all of the 7 seconds is spent copying rows of data between the storage engine layer and the SQL layer.

The EXPLAIN for your query shows an estimate of examining 13744857 rows, and there's no amount of indexes or resources that can change that.

The only ways to speed up this query are:

  • Reduce the examined rows by using an indexed condition in the WHERE clause. But that would be a different query. I assume you need the query to sum all the rows for all stageIds.

  • Reduce the stored rows in this table by deleting or archiving some.

  • Use a summary table to record the sums and keep them updated as rows change in your primary table. It may still take 7 seconds to update the summary table, but you don't have to recalculate the sums every time you read them.

  • Get a faster computer.
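The summary-table option above can be sketched as follows. All names here are assumptions, since the original table is only shown as `<table>`:

```sql
-- Hypothetical names: the original table is called `stage_times` here.
CREATE TABLE stage_totals (
  stageID   int NOT NULL PRIMARY KEY,
  totalTime bigint NOT NULL DEFAULT 0
);

-- One-time backfill (roughly the same ~7 s cost as the original query):
INSERT INTO stage_totals (stageID, totalTime)
SELECT stageID, SUM(time) FROM stage_times GROUP BY stageID;

-- On each new row, fold its time into the running total instead of
-- recomputing the whole aggregate:
INSERT INTO stage_totals (stageID, totalTime)
VALUES (?, ?)
ON DUPLICATE KEY UPDATE totalTime = totalTime + VALUES(totalTime);

-- Reads then scan only the ~4052 summary rows:
SELECT stageID, totalTime FROM stage_totals;
```

If rows in the main table can also be updated or deleted, the summary must be adjusted (or periodically rebuilt) for those cases as well.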

huangapple
  • Published on June 12, 2023 at 21:48:09
  • Please retain this link when reposting: https://go.coder-hub.com/76457317.html