Use a Go worker pool implementation to write files in parallel?

Question

I have a slice clientFiles that I am iterating over sequentially, writing each file to S3 one by one, as shown below:

    for _, v := range clientFiles {
        err := writeToS3(v.FileContent, s3Connection, v.FileName, bucketName, v.FolderName)
        if err != nil {
            fmt.Println(err)
        }
    }

The above code works fine, but I want to write to S3 in parallel to speed things up. Is a worker pool implementation a better fit here, or is there some other, better option?

I found the code below, which uses a wait group, but I am not sure whether it is the better option to work with here:

    wg := sync.WaitGroup{}
    for _, v := range clientFiles {
        wg.Add(1)
        go func(v ClientMapFile) {
            err := writeToS3(v.FileContent, s3Connection, v.FileName, bucketName, v.FolderName)
            if err != nil {
                fmt.Println(err)
            }
        }(v)
    }

Answer 1

Score: 2

Yes, parallelizing should help.

Your code should work well after two changes to its WaitGroup usage: you need to mark each piece of work as done, and wait for all goroutines to finish after the for-loop.

    var wg sync.WaitGroup
    for _, v := range clientFiles {
        wg.Add(1)
        go func(v ClientMapFile) {
            defer wg.Done()
            err := writeToS3(v.FileContent, s3Connection, v.FileName, bucketName, v.FolderName)
            if err != nil {
                fmt.Println(err)
            }
        }(v)
    }
    wg.Wait()

Be aware that this solution creates N goroutines for N files, which can be suboptimal if the number of files is very large. In that case, use the worker pool pattern (https://gobyexample.com/worker-pools) and try different numbers of workers to find which works best for you in terms of performance.
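To make the worker pool idea concrete, here is a minimal, self-contained sketch of that pattern: a fixed number of workers drain a jobs channel, so at most `numWorkers` uploads run at once regardless of how many files there are. The `ClientMapFile` fields and the `writeToS3` stub are assumptions standing in for the real types and the real S3 upload from the question.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// ClientMapFile mirrors the struct in the question; the fields are assumptions.
type ClientMapFile struct {
	FileName    string
	FolderName  string
	FileContent []byte
}

// writeToS3 is a stand-in for the real upload function in the question;
// real code would call the AWS SDK here.
func writeToS3(f ClientMapFile) error {
	return nil
}

// uploadAll fans the files out to numWorkers goroutines and returns
// the number of failed uploads.
func uploadAll(files []ClientMapFile, numWorkers int) int {
	jobs := make(chan ClientMapFile)
	var failed int64

	var wg sync.WaitGroup
	for w := 0; w < numWorkers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			// Each worker pulls jobs until the channel is closed.
			for f := range jobs {
				if err := writeToS3(f); err != nil {
					fmt.Println(err)
					atomic.AddInt64(&failed, 1)
				}
			}
		}()
	}

	for _, v := range files {
		jobs <- v
	}
	close(jobs) // lets the workers' range loops finish
	wg.Wait()
	return int(failed)
}

func main() {
	files := []ClientMapFile{{FileName: "a.txt"}, {FileName: "b.txt"}}
	fmt.Println("failed uploads:", uploadAll(files, 2))
}
```

Because the workers are long-lived and the channel is unbuffered, memory stays bounded even with millions of files; tune `numWorkers` experimentally, as the answer suggests.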

huangapple
  • Published 2022-03-16 14:07:46
  • Please retain this link when reposting: https://go.coder-hub.com/71492455.html