How to make an API call faster in Golang?
Question
I am trying to upload a bunch of files to the storage service the company provides (basically to my account), using the company's API. I have a lot of files, around 40-50.

I get the full path of each file and call `os.Open` so that I can pass an `io.Reader`. I first tried `client.Files.Upload()` without goroutines, but the uploads took a very long time, so I decided to use goroutines. Here is the implementation I tried. When I run the program, it uploads just one file, which seems to be the smallest one, or else it waits for a long time. What is wrong with it? Doesn't every iteration of the for loop start a goroutine, continue its cycle, and so create one goroutine per file? How can I make this as fast as possible with goroutines?

    var filePaths []string
    var wg sync.WaitGroup

    // fills the slice of strings with the full paths of the files.
    func fill() {
        filepath.Walk(rootpath, func(path string, info os.FileInfo, err error) error {
            if !info.IsDir() {
                filePaths = append(filePaths, path)
            }
            if err != nil {
                fmt.Println("ERROR:", err)
            }
            return nil
        })
    }

    func main() {
        fill()

        tokenSource := oauth2.StaticTokenSource(&oauth2.Token{AccessToken: token})
        oauthClient := oauth2.NewClient(context.TODO(), tokenSource)
        client := putio.NewClient(oauthClient)

        for _, path := range filePaths {
            wg.Add(1)
            go func() {
                defer wg.Done()
                f, err := os.Open(path)
                if err != nil {
                    log.Println("err:OPEN", err)
                }
                upload, err := client.Files.Upload(context.TODO(), f, path, 0)
                if err != nil {
                    log.Println("error uploading file:", err)
                }
                fmt.Println(upload)
            }()
        }
        wg.Wait()
    }

Answer 1
Score: 1
Consider a worker pool pattern like this: https://go.dev/play/p/p6SErj3L6Yc

In this example application, I've taken out the API call and just list the file names. That makes it work on the playground.

- A fixed number of worker goroutines are started. We'll use a channel to distribute their work and we'll close the channel to communicate the end of the work. This number could be 1 or 1000 goroutines, or more. The number should be chosen based on how many concurrent API operations your putio API can reasonably be expected to support.
- `paths` is a `chan string` we'll use for this purpose.
- Workers `range` over the `paths` channel to receive new file paths to upload.

    package main

    import (
        "fmt"
        "os"
        "path/filepath"
        "sync"
    )

    func main() {
        paths := make(chan string)
        var wg = new(sync.WaitGroup)
        for i := 0; i < 10; i++ {
            wg.Add(1)
            go worker(paths, wg)
        }
        if err := filepath.Walk("/usr", func(path string, info os.FileInfo, err error) error {
            if err != nil {
                return fmt.Errorf("Failed to walk directory: %T %w", err, err)
            }
            if info.IsDir() {
                return nil
            }
            paths <- path
            return nil
        }); err != nil {
            panic(fmt.Errorf("failed Walk: %w", err))
        }
        close(paths)
        wg.Wait()
    }

    func worker(paths <-chan string, wg *sync.WaitGroup) {
        defer wg.Done()
        for path := range paths {
            // do upload.
            fmt.Println(path)
        }
    }

This pattern can handle an arbitrarily large number of files without having to load the entire list into memory before processing it. As you can see, this doesn't make the code more complicated; if anything, it's simpler.
> When I run the program it just uploads one file which is the one
Function literals inherit the scope in which they were defined. This is why the code only handled one path: the `path` variable of the for loop was shared by every goroutine, so when that variable changed, all goroutines picked up the change.
Avoid function literals unless you actually want to inherit scope. Functions defined at the global scope don't inherit any scope, and you must pass all relevant variables to those functions instead. This is a good thing - it makes the functions more straightforward to understand and makes variable "ownership" transitions more explicit.
An appropriate case for a function literal is the callback passed to `filepath.Walk`: its parameters are dictated by `filepath.Walk`, so the defining scope is one way to reach other values, such as the `paths` channel in our case.
Speaking of scope, global variables should be avoided unless their scope of usage is truly global. Prefer passing variables between functions to sharing global variables. Again, this makes variable ownership explicit and makes it easy to understand which functions do and don't access which variables. Neither your wait group nor your `filePaths` has any cause to be global.

    f, err := os.Open(path)

Don't forget to close any files you open. When you're dealing with 40 or 50 files, letting all those open file handles pile up until the program ends isn't so bad, but it's a time bomb in your program that will go off when the number of files exceeds the `ulimit` of allowed open files. Because the function execution greatly exceeds the part where the file needs to be open, `defer` doesn't make sense in this case. I would use an explicit `f.Close()` after uploading the file.