在包含 1400 万行的索引表上执行简单聚合查询花费时间长

huangapple go评论53阅读模式
英文:

Simple aggregating query on indexed table with 14M rows taking long

问题

I'm having trouble running a simple aggregating query on a relatively large table (~14M rows). The query in question SELECT stageID, SUM(time) as totalTime FROM <table> GROUP BY stageID results in 4052 rows and takes ~7 seconds. Running mysql from 11.0.2-MariaDB, client 15.2 for Win64 (AMD64) on a local machine with 16GB RAM, an i7 chip. The data is on an SSD drive so I don't understand the amount of time it takes to send the data..

使用 EXPLAIN EXTENDED SELECT ...:

id select_type table type possible_keys key key_len ref rows filtered Extra
1 SIMPLE <table> index NULL uniquekey 8 NULL 13744857 100.00 NULL

使用 EXPLAIN <table>:

Field Type Null Key Default Extra
stageID int(11) NO PRI NULL
playerID int(11) NO PRI NULL
time int(11) NO 0

Included indices:
UNIQUE KEY (stageID, playerID) FOREIGN KEY (stageID) FOREIGN KEY (playerID)

ENGINE=InnoDB DEFAULT CHARSET=utf8mb3 COLLATE=utf8mb3_general_ci

Here's the profile of that query:

Status Duration
Starting 0.000050
checking permissions 0.000005
Opening tables 0.000013
After opening tables 0.000003
System lock 0.000005
table lock 0.000005
init 0.000017
Optimizing 0.000007
Statistics 0.000021
Preparing 0.000015
Sorting result 0.000007
Executing 0.000002
Sending data 6.893190
End of update loop 0.000010
Query end 0.000003
Commit 0.000004
closing tables 0.000003
Unlocking tables 0.000002
closing tables 0.000007
Starting cleanup 0.000002
Freeing items 0.000003
Updating status 0.000039
Reset for next command 0.000004

Upgrading the resources available to MariaDB by using the my-huge.ini config files etc., no changes in execution time.

英文:

I'm having trouble running a simple aggregating query on a relatively large table (~14M rows). The query in question SELECT stageID, SUM(time) as totalTime FROM <table> GROUP BY stageID
results in 4052 rows and takes ~7 seconds. Running mysql from 11.0.2-MariaDB, client 15.2 for Win64 (AMD64) on a local machine with 16GB RAM, an i7 chip. The data is on an SSD drive so I don't understand the amount of time it takes to send the data..

Using EXPLAIN EXTENDED SELECT ...:

id select_type table type possible_keys key key_len ref rows filtered Extra
1 SIMPLE <table> index NULL uniquekey 8 NULL 13744857 100.00 NULL

Using EXPLAIN <table>:

Field Type Null Key Default Extra
stageID int(11) NO PRI NULL
playerID int(11) NO PRI NULL
time int(11) NO 0

Included indices:
UNIQUE KEY (stageID, playerID)
FOREIGN KEY (stageID)
FOREIGN KEY (playerID)

ENGINE=InnoDB
DEFAULT CHARSET=utf8mb3
COLLATE=utf8mb3_general_ci

Here's the profile of that query:

Status Duration
Starting 0.000050
checking permissions 0.000005
Opening tables 0.000013
After opening tables 0.000003
System lock 0.000005
table lock 0.000005
init 0.000017
Optimizing 0.000007
Statistics 0.000021
Preparing 0.000015
Sorting result 0.000007
Executing 0.000002
Sending data 6.893190
End of update loop 0.000010
Query end 0.000003
Commit 0.000004
closing tables 0.000003
Unlocking tables 0.000002
closing tables 0.000007
Starting cleanup 0.000002
Freeing items 0.000003
Updating status 0.000039
Reset for next command 0.000004

Upgrading the resources available to MariaDB by using the my-huge.ini config files etc., no changes in execution time.

答案1

得分: 2

以下是已翻译的内容:

"The 'Sending data' line in your profile output proves that almost all of the 7 seconds is spent copying rows of data between the storage engine layer and the SQL layer."

你的资料输出中的“发送数据”行证明了几乎所有的7秒都花在在存储引擎层和SQL层之间复制数据行上。

"The EXPLAIN for your query shows an estimate of examining 13744857 rows, and there's no amount of indexes or resources that can change that."

你的查询的解释显示了估计检查了13744857行,无论索引或资源多少都无法改变这一点。

"The only ways to speed up this query are:

  • Reduce the examined rows by using an indexed condition in the WHERE clause. But that would be a different query. I assume you need the query to sum all the rows for all stageIds.

  • Reduce the stored rows in this table by deleting or archiving some.

  • Use a summary table to record the sums and keep them updated as rows change in your primary table. It may still take 7 seconds to update the summary table, but you don't have to recalculate the sums every time you read them.

  • Get a faster computer."

加速此查询的唯一方法是:

  • 通过在WHERE子句中使用索引条件来减少检查的行数。但那将是一个不同的查询。我假设你需要查询来汇总所有阶段ID的行。

  • 通过删除或存档一些内容来减少此表中存储的行。

  • 使用摘要表记录总数,并在主要表中的行发生更改时保持其更新。更新摘要表可能仍需要7秒,但您不必每次读取它们时重新计算总数。

  • 获取一台更快的计算机。

英文:

The "Sending data" line in your profile output proves that almost all of the 7 seconds is spent copying rows of data between the storage engine layer and the SQL layer.

The EXPLAIN for your query shows an estimate of examining 13744857 rows, and there's no amount of indexes or resources that can change that.

The only ways to speed up this query are:

  • Reduce the examined rows by using an indexed condition in the WHERE clause. But that would be a different query. I assume you need the query to sum all the rows for all stageIds.

  • Reduce the stored rows in this table by deleting or archiving some.

  • Use a summary table to record the sums and keep them updated as rows change in your primary table. It may still take 7 seconds to update the summary table, but you don't have to recalculate the sums every time you read them.

  • Get a faster computer.

huangapple
  • 本文由 发表于 2023年6月12日 21:48:09
  • 转载请务必保留本文链接:https://go.coder-hub.com/76457317.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定