golang sync.WaitGroup never completes

Question

The code below fetches a list of URLs, conditionally downloads each file, and saves it to the filesystem. The files are fetched concurrently, and the main goroutine waits for all of them to be fetched. However, the program never exits (and reports no errors) after completing all the requests.

What I think is happening is that the number of goroutines in the WaitGroup is either incremented too many times to begin with (via Add) or not decremented enough times (a Done call is not happening).

Is there something I am obviously doing wrong? How can I inspect how many goroutines are currently in the WaitGroup so I can better debug what is happening?

package main

import (
	"fmt"
	"io"
	"io/ioutil"
	"net/http"
	"os"
	"strings"
	"sync"
)

func main() {
	links := parseLinks()

	var wg sync.WaitGroup

	for _, url := range links {
		if isExcelDocument(url) {
			wg.Add(1)
			go downloadFromURL(url, wg)
		} else {
			fmt.Printf("Skipping: %v \n", url)
		}
	}
	wg.Wait()
}

func downloadFromURL(url string, wg sync.WaitGroup) error {
	tokens := strings.Split(url, "/")
	fileName := tokens[len(tokens)-1]
	fmt.Printf("Downloading %v to %v \n", url, fileName)

	content, err := os.Create("temp_docs/" + fileName)
	if err != nil {
		fmt.Printf("Error while creating %v because of %v", fileName, err)
		return err
	}

	resp, err := http.Get(url)
	if err != nil {
		fmt.Printf("Could not fetch %v because %v", url, err)
		return err
	}
	defer resp.Body.Close()

	_, err = io.Copy(content, resp.Body)
	if err != nil {
		fmt.Printf("Error while saving %v from %v", fileName, url)
		return err
	}

	fmt.Printf("Download complete for %v \n", fileName)

	defer wg.Done()
	return nil
}

func isExcelDocument(url string) bool {
	return strings.HasSuffix(url, ".xlsx") || strings.HasSuffix(url, ".xls")
}

func parseLinks() []string {
	linksData, err := ioutil.ReadFile("links.txt")
	if err != nil {
		fmt.Printf("Trouble reading file: %v", err)
	}

	links := strings.Split(string(linksData), ", ")

	return links
}

Answer 1

Score: 43

There are two problems with this code. First, you have to pass a pointer to the WaitGroup to downloadFromURL(); otherwise the WaitGroup is copied, and the Done() call acts on the copy and is never seen by main().

See:

func main() {
	...
	go downloadFromURL(url, &wg)
	...
}
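
The first point can be illustrated with a minimal sketch. The names worker and runWorkers are hypothetical stand-ins, not part of the original program; the only thing that matters is that the goroutine receives &wg, so Done() decrements the same counter that Wait() is blocked on. Running go vet on the by-value version from the question should also report the copy through its copylocks check.

```go
package main

import (
	"fmt"
	"sync"
)

// worker is a hypothetical stand-in for downloadFromURL. It receives a
// pointer, so Done() decrements the same counter that the caller waits on.
func worker(id int, wg *sync.WaitGroup, results []int) {
	defer wg.Done()
	results[id] = id * 2 // each goroutine writes only its own slot
}

// runWorkers starts n workers and blocks until all of them have finished.
func runWorkers(n int) []int {
	var wg sync.WaitGroup
	results := make([]int, n)
	for i := 0; i < n; i++ {
		wg.Add(1)
		go worker(i, &wg, results) // &wg, not wg: no copy is made
	}
	wg.Wait()
	return results
}

func main() {
	fmt.Println(runWorkers(4)) // prints [0 2 4 6]
}
```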

Second, defer wg.Done() should be one of the first statements in downloadFromURL(). A defer only takes effect once the statement itself executes, so if the function returns before reaching it, the call is never "registered" and never runs.

func downloadFromURL(url string, wg *sync.WaitGroup) error {
	defer wg.Done()
	...
}
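
The second point can be checked in isolation. The sketch below uses hypothetical helpers (lateDefer, earlyDefer, countCalls) and a plain callback in place of wg.Done() so that nothing blocks; it only shows that a defer placed after an early return is skipped on that path.

```go
package main

import "fmt"

// lateDefer mimics the layout in the question: the defer statement sits after
// an early return, so on the failure path it is never reached or registered.
func lateDefer(fail bool, done func()) {
	if fail {
		return // done is never called on this path
	}
	defer done()
}

// earlyDefer registers the call first, so it runs on every return path.
func earlyDefer(fail bool, done func()) {
	defer done()
	if fail {
		return
	}
}

// countCalls reports how many times f invoked its done callback.
func countCalls(f func(bool, func()), fail bool) int {
	calls := 0
	f(fail, func() { calls++ })
	return calls
}

func main() {
	fmt.Println(countCalls(lateDefer, true))  // prints 0
	fmt.Println(countCalls(earlyDefer, true)) // prints 1
}
```

With a WaitGroup in place of the callback, the lateDefer path leaves the counter above zero, which is why Wait() never returns when any download fails early.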

Answer 2

Score: 4

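Arguments in Go are always passed by value. Use a pointer when an argument may be modified. Also, make sure that you always execute wg.Done(). For example: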

package main

import (
	"fmt"
	"io"
	"io/ioutil"
	"net/http"
	"os"
	"strings"
	"sync"
)

func main() {
	links := parseLinks()

	wg := new(sync.WaitGroup)

	for _, url := range links {
		if isExcelDocument(url) {
			wg.Add(1)
			go downloadFromURL(url, wg)
		} else {
			fmt.Printf("Skipping: %v \n", url)
		}
	}
	wg.Wait()
}

func downloadFromURL(url string, wg *sync.WaitGroup) error {
	defer wg.Done()
	tokens := strings.Split(url, "/")
	fileName := tokens[len(tokens)-1]
	fmt.Printf("Downloading %v to %v \n", url, fileName)

	content, err := os.Create("temp_docs/" + fileName)
	if err != nil {
		fmt.Printf("Error while creating %v because of %v", fileName, err)
		return err
	}

	resp, err := http.Get(url)
	if err != nil {
		fmt.Printf("Could not fetch %v because %v", url, err)
		return err
	}
	defer resp.Body.Close()

	_, err = io.Copy(content, resp.Body)
	if err != nil {
		fmt.Printf("Error while saving %v from %v", fileName, url)
		return err
	}

	fmt.Printf("Download complete for %v \n", fileName)

	return nil
}

func isExcelDocument(url string) bool {
	return strings.HasSuffix(url, ".xlsx") || strings.HasSuffix(url, ".xls")
}

func parseLinks() []string {
	linksData, err := ioutil.ReadFile("links.txt")
	if err != nil {
		fmt.Printf("Trouble reading file: %v", err)
	}

	links := strings.Split(string(linksData), ", ")

	return links
}
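
The question also asks how to inspect how many goroutines are still outstanding. sync.WaitGroup does not expose its counter, so one option is a small wrapper that mirrors every Add and Done into a readable counter. The type countingGroup and its Pending method below are hypothetical names for illustration, not part of the sync package; treat this as a debugging sketch rather than a recommended pattern.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// countingGroup is a hypothetical wrapper, not part of the sync package.
// It mirrors every Add and Done into a counter that can be read while debugging.
type countingGroup struct {
	pending int64 // kept first so 64-bit atomic access stays aligned
	wg      sync.WaitGroup
}

func (g *countingGroup) Add(n int) {
	atomic.AddInt64(&g.pending, int64(n))
	g.wg.Add(n)
}

func (g *countingGroup) Done() {
	atomic.AddInt64(&g.pending, -1)
	g.wg.Done()
}

func (g *countingGroup) Wait() { g.wg.Wait() }

// Pending returns the number of Add units not yet matched by a Done call.
func (g *countingGroup) Pending() int64 { return atomic.LoadInt64(&g.pending) }

func main() {
	var g countingGroup
	g.Add(3)
	fmt.Println("pending before:", g.Pending()) // prints pending before: 3
	for i := 0; i < 3; i++ {
		go func() {
			defer g.Done()
		}()
	}
	g.Wait()
	fmt.Println("pending after:", g.Pending()) // prints pending after: 0
}
```

Printing Pending() from a separate goroutine or on a timer while the program appears stuck would show whether some Done calls are missing.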

Answer 3

Score: 1

As @Bartosz mentioned, you need to pass a pointer to your WaitGroup. He did a great job discussing the importance of defer wg.Done().

I like WaitGroup's simplicity. However, I do not like that we need to pass the pointer into the goroutine, because that mixes the concurrency logic with your business logic.

So I came up with this generic function to solve this problem for me:

// Parallelize parallelizes the function calls
func Parallelize(functions ...func()) {
	var waitGroup sync.WaitGroup
	waitGroup.Add(len(functions))

	defer waitGroup.Wait()

	for _, function := range functions {
		go func(copy func()) {
			defer waitGroup.Done()
			copy()
		}(function)
	}
}

So your example could be solved this way:

func main() {
	links := parseLinks()

	functions := []func(){}
	for _, url := range links {
		if isExcelDocument(url) {
			// The outer literal must declare its func() return type to compile.
			function := func(url string) func() {
				return func() { downloadFromURL(url) }
			}(url)
			functions = append(functions, function)
		} else {
			fmt.Printf("Skipping: %v \n", url)
		}
	}
	Parallelize(functions...)
}

func downloadFromURL(url string) {
	...
}

If you would like to use it, you can find it here https://github.com/shomali11/util
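
For a quick check without pulling in the library, here is a self-contained sketch of the same helper with a hypothetical caller, squares, standing in for the download logic. The Parallelize body follows the answer above (with the parameter renamed to fn); everything else is an assumption for illustration.

```go
package main

import (
	"fmt"
	"sync"
)

// Parallelize runs every function in its own goroutine and returns once all
// of them have finished.
func Parallelize(functions ...func()) {
	var waitGroup sync.WaitGroup
	waitGroup.Add(len(functions))
	defer waitGroup.Wait()

	for _, function := range functions {
		go func(fn func()) {
			defer waitGroup.Done()
			fn()
		}(function)
	}
}

// squares is a hypothetical caller: each closure writes only its own slot,
// so no extra locking is needed.
func squares(inputs []int) []int {
	results := make([]int, len(inputs))
	functions := make([]func(), 0, len(inputs))
	for i, v := range inputs {
		i, v := i, v // per-iteration copies (needed before Go 1.22)
		functions = append(functions, func() { results[i] = v * v })
	}
	Parallelize(functions...)
	return results
}

func main() {
	fmt.Println(squares([]int{1, 2, 3})) // prints [1 4 9]
}
```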

huangapple
  • Posted on January 12, 2015 at 07:26:25
  • When reposting, please keep the link to this article: https://go.coder-hub.com/27893304.html