golang: goroutine with select doesn't stop unless I added a fmt.Print()

Question
I tried the Go Tour exercise #71. If it is run like go run 71_hang.go ok, it works fine. However, if you use go run 71_hang.go nogood, it runs forever. The only difference is the extra fmt.Print("") in the default case of the select statement.
I'm not sure, but I suspect some sort of infinite loop or race condition. Here is my solution.

Note: it's not a deadlock, since Go didn't throw "all goroutines are asleep - deadlock!".
package main

import (
	"fmt"
	"os"
)

type Fetcher interface {
	// Fetch returns the body of URL and
	// a slice of URLs found on that page.
	Fetch(url string) (body string, urls []string, err error)
}

func crawl(todo Todo, fetcher Fetcher, todoList chan Todo, done chan bool) {
	body, urls, err := fetcher.Fetch(todo.url)
	if err != nil {
		fmt.Println(err)
	} else {
		fmt.Printf("found: %s %q\n", todo.url, body)
		for _, u := range urls {
			todoList <- Todo{u, todo.depth - 1}
		}
	}
	done <- true
	return
}

type Todo struct {
	url   string
	depth int
}

// Crawl uses fetcher to recursively crawl
// pages starting with url, to a maximum of depth.
func Crawl(url string, depth int, fetcher Fetcher) {
	visited := make(map[string]bool)
	doneCrawling := make(chan bool, 100)
	toDoList := make(chan Todo, 100)
	toDoList <- Todo{url, depth}

	crawling := 0
	for {
		select {
		case todo := <-toDoList:
			if todo.depth > 0 && !visited[todo.url] {
				crawling++
				visited[todo.url] = true
				go crawl(todo, fetcher, toDoList, doneCrawling)
			}
		case <-doneCrawling:
			crawling--
		default:
			if os.Args[1] == "ok" { // *
				fmt.Print("")
			}
			if crawling == 0 {
				goto END
			}
		}
	}
END:
	return
}

func main() {
	Crawl("http://golang.org/", 4, fetcher)
}

// fakeFetcher is a Fetcher that returns canned results.
type fakeFetcher map[string]*fakeResult

type fakeResult struct {
	body string
	urls []string
}

func (f *fakeFetcher) Fetch(url string) (string, []string, error) {
	if res, ok := (*f)[url]; ok {
		return res.body, res.urls, nil
	}
	return "", nil, fmt.Errorf("not found: %s", url)
}

// fetcher is a populated fakeFetcher.
var fetcher = &fakeFetcher{
	"http://golang.org/": &fakeResult{
		"The Go Programming Language",
		[]string{
			"http://golang.org/pkg/",
			"http://golang.org/cmd/",
		},
	},
	"http://golang.org/pkg/": &fakeResult{
		"Packages",
		[]string{
			"http://golang.org/",
			"http://golang.org/cmd/",
			"http://golang.org/pkg/fmt/",
			"http://golang.org/pkg/os/",
		},
	},
	"http://golang.org/pkg/fmt/": &fakeResult{
		"Package fmt",
		[]string{
			"http://golang.org/",
			"http://golang.org/pkg/",
		},
	},
	"http://golang.org/pkg/os/": &fakeResult{
		"Package os",
		[]string{
			"http://golang.org/",
			"http://golang.org/pkg/",
		},
	},
}
Answer 1
Score: 16
Putting a default statement in your select changes the way select works. Without a default statement, select blocks waiting for any message on the channels. With a default statement, select runs the default statement every time there is nothing to read from the channels. In your code I think this makes an infinite loop. Putting the fmt.Print statement in allows the scheduler to schedule other goroutines.

If you change your code like this then it works properly, using select in a blocking way, which allows the other goroutines to run properly:
for {
	select {
	case todo := <-toDoList:
		if todo.depth > 0 && !visited[todo.url] {
			crawling++
			visited[todo.url] = true
			go crawl(todo, fetcher, toDoList, doneCrawling)
		}
	case <-doneCrawling:
		crawling--
	}
	if crawling == 0 {
		break
	}
}
You can make your original code work if you use GOMAXPROCS=2, which is another hint that the scheduler is busy in an infinite loop.
Note that goroutines are co-operatively scheduled. What I don't fully understand about your problem is that select is a point where the goroutine should yield - I hope someone else can explain why it isn't in your example.
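To see the blocking pattern in isolation, here is a minimal, self-contained sketch (the function and channel names are illustrative, not from the original program): with no default case, the select simply parks the loop until a worker signals, so the worker goroutines always get scheduled.

```go
package main

import "fmt"

// drain starts n workers and collects their completion signals
// with a blocking select (no default case), mirroring the fix above.
func drain(n int) int {
	done := make(chan bool)
	for i := 0; i < n; i++ {
		go func() { done <- true }()
	}
	finished := 0
	for finished < n {
		select {
		case <-done: // blocks until some worker finishes
			finished++
		}
	}
	return finished
}

func main() {
	fmt.Println(drain(3)) // prints 3
}
```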
Answer 2
Score: 5
You have 100% CPU load because the default case is executed almost every time, effectively resulting in an infinite loop because it runs over and over again. In this situation the Go scheduler does not hand control to another goroutine, by design. So the other goroutines never get the opportunity to run and bring crawling back to 0, and you have your infinite loop.
In my opinion you should remove the default case and instead create another channel if you want to play with the select statement.
Otherwise the runtime package helps you go the dirty way:

- runtime.GOMAXPROCS(2) will work (or export GOMAXPROCS=2); this way you will have more than one OS thread of execution.
- Call runtime.Gosched() inside Crawl from time to time. Even though CPU load is 100%, this explicitly passes control to another goroutine.
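A sketch of the Gosched workaround with illustrative names; note that since Go 1.14 the runtime preempts long-running goroutines asynchronously, so on a modern toolchain the original hang is unlikely to reproduce even without the yield.

```go
package main

import (
	"fmt"
	"runtime"
)

// spin busy-waits on done with a default case, but explicitly
// yields to the scheduler on every idle pass so the worker
// goroutine can run even on a single OS thread.
func spin() string {
	done := make(chan string, 1)
	go func() { done <- "finished" }()

	for {
		select {
		case msg := <-done:
			return msg
		default:
			runtime.Gosched() // hand control to another goroutine
		}
	}
}

func main() {
	fmt.Println(spin()) // prints "finished"
}
```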
Edit: Yes, and the reason why fmt.Printf makes a difference: because it explicitely passes control to some syscall stuff...