Use a Go worker pool implementation to write files in parallel?


Question


I have a slice clientFiles which I iterate over sequentially, writing each file to S3 one by one as shown below:

for _, v := range clientFiles {
	err := writeToS3(v.FileContent, s3Connection, v.FileName, bucketName, v.FolderName)
	if err != nil {
		fmt.Println(err)
	}
}

The above code works fine, but I want to write to S3 in parallel to speed things up. Would a worker pool implementation work better here, or is there another, better option?
I found the code below, which uses a wait group, but I am not sure whether it is the better option to work with here:

wg := sync.WaitGroup{}
for _, v := range clientFiles {
	wg.Add(1)
	go func(v ClientMapFile) {
		err := writeToS3(v.FileContent, s3Connection, v.FileName, bucketName, v.FolderName)
		if err != nil {
			fmt.Println(err)
		}
	}(v)
}

Answer 1

Score: 2

Yes, parallelising should help.

Your code will work correctly after the following changes to the WaitGroup usage: each goroutine must mark its work as done, and the for loop must be followed by a wait for all goroutines to finish.

var wg sync.WaitGroup
for _, v := range clientFiles {
	wg.Add(1)
	go func(v ClientMapFile) {
		defer wg.Done() // mark this file's upload as finished
		err := writeToS3(v.FileContent, s3Connection, v.FileName, bucketName, v.FolderName)
		if err != nil {
			fmt.Println(err)
		}
	}(v)
}
wg.Wait() // block until every upload has completed

Be aware that your solution creates N goroutines for N files, which may not be optimal if the number of files is very large. In that case, use the worker-pool pattern (https://gobyexample.com/worker-pools) and experiment with different numbers of workers to find what performs best for you.

huangapple
  • Published on 2022-03-16 14:07:46
  • Please retain this link when reposting: https://go.coder-hub.com/71492455.html