Processing a channel concurrently results in unexpected output
Question
I have an unbuffered channel from which `i` workers each take a value (a filesystem path) and process it (send the file contents over HTTP). I'm running into a problem when I increase `i`.
When I run this:
    paths := make(chan string)
    for i := 0; i < 5; i++ {
        go func() {
            for path := range paths {
                fmt.Println(path)
            }
        }()
    }
    walkFn := func(path string, info os.FileInfo, err error) error {
        if !info.IsDir() {
            paths <- path
        }
        return nil
    }
    filepath.Walk("/tmp/foo", walkFn)
    close(paths)
It works as expected and prints all the contents of /tmp/foo:
/tmp/foo/2
/tmp/foo/file9
/tmp/foo/file91
/tmp/foo/file90
/tmp/foo/file900
/tmp/foo/file901
/tmp/foo/file902
/tmp/foo/file92
/tmp/foo/file97
/tmp/foo/file93
/tmp/foo/file94
/tmp/foo/file95
/tmp/foo/file96
/tmp/foo/file98
/tmp/foo/file99
But when I send the file contents over HTTP, the number of affected files suddenly goes down:
    for i := 0; i < 5; i++ {
        go func() {
            for path := range paths {
                resp, err := http.Head("https://example.com/" + strings.TrimPrefix(path, rootDir+"/"))
                if err != nil {
                    fmt.Printf("Error: %s\n", err)
                    return
                }
                fmt.Printf("%s: %s\n", path, resp.Status)
            }
        }()
    }
The number of affected files drops from 15 (the number of files in the directory) to 10:
/tmp/foo/2: 404 Not Found
/tmp/foo/file901: 404 Not Found
/tmp/foo/file900: 404 Not Found
/tmp/foo/file9: 404 Not Found
/tmp/foo/file90: 404 Not Found
/tmp/foo/file902: 404 Not Found
/tmp/foo/file91: 404 Not Found
/tmp/foo/file92: 404 Not Found
/tmp/foo/file93: 404 Not Found
/tmp/foo/file94: 404 Not Found
Here's a table that relates the value of `i` to the number of output lines:
+-----+-------+
| `i` | lines |
+-----+-------+
| 1 | 15 |
| 5 | 10 |
| 6 | 9 |
| 15 | 0 |
+-----+-------+
Why does this happen, and how can I process all the channel entries concurrently? Is it a problem with the HTTP requests?
Answer 1
Score: 1
The problem is that after this line:
    filepath.Walk("/tmp/foo", walkFn)
all paths have been sent through the `paths` channel, which implies that some goroutine has received each of them. However, it does not imply that the receiving goroutines have finished processing them.
So when your program exits after `close(paths)`, there are still goroutines working, and they get killed because `main` has returned. An HTTP request takes far longer than a call to `fmt.Println`, which is why the in-flight requests are the ones that go missing.
https://golang.org/ref/spec#Program_execution
> Program execution begins by initializing the main package and then invoking the function main. When that function invocation returns, the program exits. It does not wait for other (non-main) goroutines to complete.
One simple solution is to add
    select{}
at the end of your program. This makes `main` block forever, so the workers get time to finish, but the program no longer exits on its own.