Producer-consumer in Golang: concurrency vs. parallelism?
Question
I am working on a backend architecture written purely in Golang. I have an API that uploads a file to the Golang server, and the server then transfers the file to cloud storage. I want the two transfers to be independent, so that the end user does not have to wait for the cloud transfer before getting a response to the upload.
End User -> Golang Server -> [Concurrency/Parallelism] -> Cloud Storage
Now, I thought of two ways:
- Create a goroutine as soon as the user finishes the upload and transfer the file to cloud.
- Insert the file handler into a queue, and a different process would read this queue and transfer the file to cloud storage (Multiple producers - Single Consumer model).
I found examples that do this with goroutines and channels, but I think they would create as many goroutines as there are uploads. I want to use the second option, but I don't understand how to go about it in Golang.
Also, please tell me if this is the wrong approach and there is a more efficient way of doing this.
Update
Details about the requirement and constraint:
- I am using AWS S3 as cloud storage. If the upload from the Go server to Amazon S3 fails at some point, the file handler should be kept as is, to keep a record of the failed upload. (I am not prioritising this; I might change it based on client feedback.)
- The file will be deleted from the Go server as soon as the upload to Amazon S3 completes successfully, so as to avoid repeated uploads. Also, if a file is uploaded with the same name, it will replace the existing file on Amazon S3.
- As pointed out in the comments, I can use a channel as the queue. Is it possible to design the above architecture using Go's channels and goroutines?
Answer 1
Score: 2
A user uploading a file could tolerate an error and try again. The danger arises when an uploaded file exists only on the machine it was uploaded to and something goes wrong before it reaches cloud storage. In that case the file would be lost, which would be a bummer for the user.
Good architecture solves this. It's a first-in, first-out queue pattern.
A favorite Go implementation of this pattern is go-workers, perhaps backed by a Redis database.
Assume there are n servers running your service at any given time. Assume that your backend code compiles to two separate binaries, a server binary and a worker binary.
Ideally, the machines accepting file uploads would all mount a shared Network File System such that:
- The user uploads a file to a server
  a. The server adds a record to the work queue; the record contains a unique ID from the Redis storage.
  b. The unique ID is used to create the filename, and the file is piped directly from the user upload to temporary storage on the NFS server. Note that the file never resides on the storage of the machine running the server.
- A worker uploads the file to cloud storage
  a. The worker picks up the next to-do record from the work queue, which has a unique ID.
  b. Using the unique ID to find the file on the NFS server, the worker uploads the file to cloud storage.
  c. On success, the worker updates the record in the work queue to reflect success.
  d. The worker deletes the file from the NFS server.
By monitoring server traffic and work queue size as two separate metrics, you can determine how many machines ought to run the server and worker services, respectively.
Answer 2
Score: 2
Marcio Castilho has written a good article on a similar problem: "Handling one million requests per minute with golang".
He shows the mistakes he made and the steps he took to correct them. It is a good source for learning how to use channels, goroutines, and concurrency in general.
go-workers, mentioned by charneykaye, is also an excellent resource.