
Updating using `database/sql` in Go is slow

Question


I'm running an update query against a local MySQL instance, and it seems to be taking about 8 ms to update a single row. That seemed really slow. Is there anything I can do to improve performance?

db, _ := sql.Open("mysql", "user:pass@(localhost:3306)/dbname")
db.Exec(`CREATE TABLE topics(
			topic_id                            INT AUTO_INCREMENT PRIMARY KEY,
			title                               VARCHAR(250) NOT NULL,
			content                             TEXT,
			is_sticky                           BOOL NOT NULL DEFAULT false,
			is_readonly                         BOOL NOT NULL DEFAULT false,
			num_comments                        INT NOT NULL DEFAULT 0,
			num_views                           INT NOT NULL DEFAULT 0,
			activity_at                         DATETIME DEFAULT CURRENT_TIMESTAMP,
			archived_at                         DATETIME DEFAULT CURRENT_TIMESTAMP,
			created_at                          DATETIME DEFAULT CURRENT_TIMESTAMP,
			updated_at                          DATETIME DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP
);`)
db.Exec("INSERT INTO topics(title) VALUES(?);", "Name")

since := time.Now()
for i := 0; i < 100; i++ {
    db.Exec("UPDATE topics SET num_views = ? WHERE topic_id = 1;", i)
}
fmt.Println(time.Since(since))

It takes about 850 ms to complete 100 iterations on a Core i7-4790K. I also tried the same thing with Postgres with more or less the same result. I created this test code because my webapp was very slow at about 10 ms per request (250 requests/sec throughput) and it turned out that updates were taking a lot of time.

Also, using prepared statements made no difference.

stmt, _ := db.Prepare("UPDATE topics SET num_views = ? WHERE topic_id = ?;")
since := time.Now()
for i := 0; i < 100; i++ {
	stmt.Exec(i, 1)
}
fmt.Println(time.Since(since))

Answer 1

Score: 3


(Speaking from a MySQL point of view...)

Some "Rules of Thumb":

  • Single INSERT: 10ms
  • 100 or more rows inserted by a single INSERT: 10 times as fast per row.
  • BEGIN; INSERT...; INSERT...; ... COMMIT; : Also 10x.
  • The above assumes HDD; SSD might be another 10x faster.
  • If multiple connections are each doing inserts, they may be able to run in parallel. 10 threads might be able to do 5 times the work in the same elapsed time. (Of course, this may add unwanted complexity to the app.)
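The multi-row INSERT point above is easy to sketch in Go. `bulkInsertSQL` is a name invented here, and the `topics(title)` column follows the question's schema; the point is that the server parses and commits once for n rows instead of n times:

```go
package main

import (
	"fmt"
	"strings"
)

// bulkInsertSQL builds one INSERT statement with n placeholder rows.
// Executing it with n title arguments inserts all rows in one round trip
// and one commit, instead of one per row.
func bulkInsertSQL(n int) string {
	rows := make([]string, n)
	for i := range rows {
		rows[i] = "(?)"
	}
	return "INSERT INTO topics(title) VALUES " + strings.Join(rows, ",")
}

func main() {
	fmt.Println(bulkInsertSQL(3))
	// INSERT INTO topics(title) VALUES (?),(?),(?)
}
```

With a live `*sql.DB`, this would be run as `db.Exec(bulkInsertSQL(3), "a", "b", "c")`.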

Similar figures for UPDATE, though it is not easy to do different updates on different rows with a single query.

Your test shows 8.5 ms per row when UPDATEing one row at a time. Batching with BEGIN...COMMIT will probably take about 85 ms for all 100 rows, even on HDD.

Some applications lend themselves to batching; some do not. If you want to talk about improving MySQL performance, we need to get into the details of your application.

"Like" and "View" counters may need to be moved to a 'parallel' table since they tend to be Updated one-at-a-time, with some interference with other activity. They also tend to automatically allow for multi-threading, hence much less than 850ms per 100. In really high activity (over, say, 1K views per second), such counters can be artificially batched via extra app code.
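The "artificially batched via extra app code" idea above can be sketched as an in-memory accumulator that a background flusher drains periodically. Everything here is hypothetical naming; the flush step would run one `UPDATE topics SET num_views = num_views + ? WHERE topic_id = ?` per entry inside a single transaction:

```go
package main

import (
	"fmt"
	"sync"
)

// viewBatcher accumulates per-topic view deltas in memory, so recording
// a view costs no DB round trip; deltas are flushed in bulk later.
type viewBatcher struct {
	mu     sync.Mutex
	deltas map[int]int
}

func newViewBatcher() *viewBatcher {
	return &viewBatcher{deltas: make(map[int]int)}
}

// Add records one view for a topic; safe for concurrent callers.
func (b *viewBatcher) Add(topicID int) {
	b.mu.Lock()
	b.deltas[topicID]++
	b.mu.Unlock()
}

// Drain returns the pending deltas and resets the batcher. The caller
// would apply them in one transaction.
func (b *viewBatcher) Drain() map[int]int {
	b.mu.Lock()
	defer b.mu.Unlock()
	out := b.deltas
	b.deltas = make(map[int]int)
	return out
}

func main() {
	b := newViewBatcher()
	for i := 0; i < 5; i++ {
		b.Add(1)
	}
	b.Add(2)
	fmt.Println(b.Drain()) // map[1:5 2:1]
}
```

This trades a small window of unflushed counts for far fewer UPDATEs per second, which is usually acceptable for view counters.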

Please rewrite your benchmark to reflect the activity that will happen in the real application. (I am guessing that the Updates will happen in parallel, not serial. And they will be spread out randomly across time.)

Another thing... If each "view count" comes to a web server, then there is also connect and disconnect; hence the elapsed time is likely to be more than 8.5 ms. But "elapsed" is not the critical issue; the real issue is "how many updates can be performed per second".

And another thing... If you test 'parallel', don't hit the same row by each request. That will probably be much slower than if you hit different rows. (Hitting a random row would be better. Having a bias in which row to hit would be even more realistic.)

huangapple
  • Posted on 2021-09-11 18:14:04
  • Please retain this link when reposting: https://go.coder-hub.com/69142006.html