英文:
Consecutive MySQL queries from Go become MUCH slower after some point
问题
我正在使用Go语言编写一个作业,该作业通过一些MySQL表格进行遍历,根据一些条件选择其中的一些行,从中提取电子邮件地址,并向每个地址发送电子邮件。
过滤过程查看一个相当大的表格(我们称之为storage
),大小约为6GB,结构如下:
列:
id varchar(64) PK
path varchar(64) PK
game varchar(64)
guid varchar(64)
value varchar(512)
timestamp timestamp
有两个索引:(id, path)
(如上所示的主键)和guid
。
该作业首先从一个表格中检索一长串guid,然后对storage
表格执行连续的查询,如下所示:
<!-- language: lang-sql -->
SELECT guid, timestamp FROM storage
WHERE game = 'somegame'
AND path = 'path' AND value = 'value' AND timestamp >= '2015-04-22 00:00:00.0' AND timestamp <= '2015-04-29T14:53:07+02:00'
AND guid IN ( ... )
其中IN
子句包含一系列的guid。
我需要检索时间戳以进行进一步的筛选。
在我的本地MySQL上运行时,一切都按预期工作,查询大约需要180毫秒,每批处理1000个guid。
在Amazon RDS上运行相同的数据库时,查询开始很快,但在某个点之后,突然开始花费约30秒的时间,并持续到作业结束。
我尝试了很多方法来解决这个问题,但无法找出原因。一些注意事项:
- 该作业只使用一个
sql.DB
对象。我只准备上述语句一次,并且在很大程度上重复使用它。 - 起初,我认为是因为RDS数据库运行的是MySQL 5.5,而我运行的是5.6。我复制了RDS数据库,升级到了5.6,再次运行作业。问题再次出现。
- 两个数据库中的数据量是相同的:我将生产数据库导出并导入到我的本地数据库中,然后运行作业。行为相同(在本地运行速度仍然很快)。
- AWS对RDS节点的监控没有显示任何显着的峰值。CPU使用率从1%跳至最高10%,作业似乎只打开了几个连接(约4个)。
- 我让同事在他们的电脑上运行作业,指向我的MySQL数据库,只是为了确保出色的性能不是因为连接是本地的。它的运行速度与我的电脑上一样快(尽管是通过局域网)。
- 我在RDS上运行作业,既从我的本地电脑运行,又从一个Amazon EC2节点运行,后者与RDS更接近。从EC2上运行时,性能更好,但问题仍然出现。
- 该作业非常并发,每个步骤都有输入和输出通道(缓冲区大小为1000),并且由goroutine执行工作。在步骤之间,我有其他的goroutine对前一个goroutine的输出进行批处理。
- 减速是突然发生的,一个查询只需要几毫秒,而下一个查询需要几十秒的时间。
我对此原因一无所知。欢迎提出任何建议。
英文:
I'm writing a job in go that goes through some MySQL tables, selects some of the rows based on some critera, extracts email addresses from them and sends an email to each.
The filtering process looks at a table (let's call it storage
) which is pretty big (~6gb dumped) and looks like this:
Columns:
id varchar(64) PK
path varchar(64) PK
game varchar(64)
guid varchar(64)
value varchar(512)
timestamp timestamp
There are two indices: (id, path)
(the PK as seen above) and guid
.
The job first retrieves a long list of guids from one table, then batches them and performs consecutive queries like this on the storage
table:
<!-- language: lang-sql -->
SELECT guid, timestamp FROM storage
WHERE game = 'somegame'
AND path = 'path' AND value = 'value' AND timestamp >= '2015-04-22 00:00:00.0' AND timestamp <= '2015-04-29T14:53:07+02:00'
AND guid IN ( ... )
Where the IN
clause contains a list of guids.
I need to retrieve timestamps to be able to filter further.
When running against my local MySQL, everything works as expected, the query takes about 180ms with batches of 1000 guids.
When running against the same DB on Amazon RDS, the queries begin quick, but after some point, they suddenly start taking around 30 seconds, and continue to do so until the job ends.
I have tried many many things to fix this, but can't figure out the reason. Some notes:
- The job uses only one
sql.DB
object. Also, I prepare the above statement once and reuse it quite heavily. - At first, I thought it was because the RDS DB was running MySQL 5.5, and I was running 5.6. I made a replica of the RDS DB, upgraded to 5.6, ran the job again. The problem happened again.
- The volume of data in the two databases is the same: I dumped the production database and imported it into my local database and ran the job. Same behaviour (it still ran quickly locally).
- The AWS monitoring of the RDS nodes doesn't show any significant spikes. The CPU usage jumps from 1% to up to 10%, and the job seems to open just a few connections (~4).
- I had a colleague run the job on their PC, pointing to my MySQL DB, just to make sure the great performance didn't stem from the fact that the connection was local. It ran just as quickly as on my PC (admittedly, over LAN).
- I ran the job against RDS both from my local PC and from an Amazon EC2 node, which is considerably closer to RDS. From EC2, it performed better, but the problem still appeared.
- The job is very concurrent, each step has input and output channels (with buffer sizes of 1000), and the work is performed by goroutines. Between the steps, I have other goroutines that batch the output of the previous goroutine.
- The slowdown is sudden, one query takes milliseconds, and the next one takes tens of seconds.
I haven't the faintest idea why this happens. Any suggestions would be appreciated.
答案1
得分: 0
所以,在大量的实验之后,我找到了解决方案。
我正在使用涉及RDS实例的磁性存储,它保证了大约100 IOPS的速度。这限制了我们查询数据的速度。
我测试了使用2000个预配置的IOPS,并且作业一直以很快的速度运行。
英文:
So, after lots and lots of experimentation, I found the solution.
I am using Magnetic Storage on the RDS instances involved, which guarantees approximately 100 IOPS. This limited the speed with which we could query the data.
I tested using 2000 Provisioned IOPS, and the job ran quickly all the way.
1: http://aws.amazon.com/ebs/details/ "Amazon EBS product page"
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论