英文:
Concurrent MySQL writing with GORM leads to an error
问题
我在Golang中实现了一个复杂的CSV导入脚本。
我使用了一个Workerpool实现。在这个Workerpool中,工作线程会处理成千上万个小的CSV文件,对产品进行分类、标记和品牌化。
它们都会写入同一个数据库表。到目前为止一切都很好。
我面临的问题是,如果我选择超过2个工作线程,进程会随机崩溃,并显示以下消息:
工作流程如下:
foreach (csv) {
workerPool.submit(csv)
}
func worker(csv) {
foreach (line) {
import(line)
}
}
import(line) {
product = get(line)
product.category = determine_category(product)
product.brand = determine_brand(product)
save(brand)
product.tags = determine_tags(product)
//and after all
save(product)
}
我尝试将save()调用包装在事务中,但没有帮助。
现在我有以下问题:
- MySQL适合同时保存到一个表吗?
- 如果需要事务来完成这个操作,应该在哪里设置?
- Go SQL驱动程序(错误始终发生在packets.go:1102中)适合这样做吗?
- 有人可以帮助我吗(也许雇佣几个小时)?
我完全陷入了困境。我也可以分享源代码,如果有帮助的话。但我首先想知道你是否认为这是我的代码问题还是一个普遍性的问题。
英文:
I have implemented a complex csv import script in Golang.
I use a Workerpool implementation for it. Inside that workerpool, workers run through 1000s of small csv files, categorizing, tagging and branding the products.
And they all write to the same database table. So far so good.
The problem i'm facing is, that if i chose more than 2 workers, the process crashes with the following message randomly
The workflow is
foreach (csv) {
workerPool.submit(csv)
}
func worker(csv) {
foreach (line) {
import(line)
}
}
import(line) {
product = get(line)
product.category = determine_category(product)
product.brand = determine_brand(product)
save(brand)
product.tags = determine_tags(product)
//and after all
save(product)
}
I tried to wrap the save() calls in transactions, but it didn't help.
Now i have the following questions:
- Is MySQL suited to save concurrently to 1 table?
- If transactions are need to accomplish this, where should they be set?
- Is the Go SQL Driver (where the error ALWAYS happens in packets.go:1102) suited to do this ?
- Could anyone help me (maybe by hiring for a few hours)?
I'm completely stuck. I can also share the sourcecode if that helps. But I first wanted to know i you guess that it's rather my code or a general issue.
答案1
得分: 0
在每个goroutine(或对于使用线程的语言,每个线程)中打开一个新的数据库连接。
MySQL的协议是有状态的,这意味着如果多个goroutine尝试使用同一个连接,请求和响应会变得非常混乱。
你会在尝试在goroutine之间共享任何其他类型的有状态协议连接时遇到相同的问题。
例如,FTP也是一种有状态的协议,这可能更容易理解。一个客户端goroutine可能会发送一条消息,比如“获取文件x”,而响应应该是一系列包含该文件内容的消息。如果另一个goroutine在该请求/响应正在进行时尝试使用同一个连接,两个客户端都会感到困惑。第二个goroutine将读取属于它没有请求的文件的数据包。第一个请求文件的goroutine将发现一些它期望的数据包已经被读取。
类似地,MySQL的协议不支持多个客户端goroutine共享单个连接。
英文:
Open a new db connection in each goroutine (or thread, for languages that use threads).
MySQL's protocol is stateful, which means if multiple goroutines attempt to use the same connection, the requests and responses get very confused.
You would have the same problem trying to share any other kind of stateful protocol connection between goroutines.
For example ftp is also a stateful protocol, and that may be easier to understand. A client goroutine might send a message like "get file x" and the response should be a series of messages containing the content of that file. If another goroutine tries to use the same connection while that request/response is inprogress, both clients will be confused. The second goroutine will read packets that belong to a file it didn't request. The first goroutine who requested the file will find some packets it was expecting have already been read.
Similarly, MySQL's protocol does not support multiple client goroutines sharing a single connection.
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论