英文:
Deadlock in book <The Go Programming Language>, how it would happen and why it happen?
问题
在这本书中有几次谈到了关于goroutine
和channel
的错误使用导致的死锁
问题,我没有理解为什么会发生死锁。
首先,我想说的是,我知道通道的发送和接收操作会阻塞,直到有数据可读或有空间可以发送数据,也许这就是一些死锁
的深层原因。但是请根据书中的以下摘录给我一些解释:
第240页
这段代码是以BFS方式并发爬取URL:
func main() {
worklist := make(chan []string)
// 从命令行参数开始。
go func() { worklist <- os.Args[1:] }()
// 并发地爬取网页。
seen := make(map[string]bool)
for list := range worklist {
for _, link := range list {
if !seen[link] {
seen[link] = true
go func(link string) {
worklist <- crawl(link)
}(link)
}
}
}
}
引用书中的第二段:
> ...将命令行参数发送到工作列表的初始发送必须在其自己的goroutine中运行,以避免死锁,即主goroutine和爬虫goroutine都试图相互发送而没有接收的情况...
假设初始发送到worklist
的操作不在自己的goroutine中,我想象它的工作方式如下:
- 主goroutine发送初始数据到
worklist
,阻塞直到接收到数据 for range
循环接收初始数据,因此解除worklist
通道的阻塞- 爬虫goroutine将其数据发送到
worklist
,循环...
所以我理解,它不会阻塞和死锁。我错在哪里?
更新:@mkopriva帮助我意识到由于步骤1被阻塞,步骤2和3是无法到达的。所以我对这个问题很清楚了。
第243页
这个死锁情况可能与第240页相同:
func main() {
worklist := make(chan []string) // URL列表,可能有重复
unseenLinks := make(chan string) // 去重后的URL
// 将命令行参数添加到工作列表。
go func() { worklist <- os.Args[1:] }()
// 创建20个爬虫goroutine来获取每个未见过的链接。
for i := 0; i < 20; i++ {
go func() {
for link := range unseenLinks {
foundLinks := crawl(link)
go func() { worklist <- foundLinks }()
}
}()
}
// 主goroutine对工作列表项进行去重
// 并将未见过的项发送给爬虫。
seen := make(map[string]bool)
for list := range worklist {
for _, link := range list {
if !seen[link] {
seen[link] = true
unseenLinks <- link
}
}
}
}
所以如果我在for-range
循环中省略go
关键字,死锁会发生吗?
英文:
There are several times in this book talk about deadlock
regarding to incorrect usage of goroutine
and channel
, and I failed to grasp why the deadlock happen.
First thing I want to say is that, I know channel send&receive will block until there is items to be read or rooms to send items into, and maybe that's the deep reason of some deadlock
. But please enlighten me with some explanation according to the following excerpt from the book:
Page 240
This code is to concurrently crawl urls, BFS style:
func main() {
worklist := make(chan []string)
// Start with the command-line arguments.
go func() { worklist <- os.Args[1:] }()
// Crawl the web concurrently.
seen := make(map[string]bool)
for list := range worklist {
for _, link := range list {
if !seen[link] {
seen[link] = true
go func(link string) {
worklist <- crawl(link)
}(link)
}
}
}
}
quoting book's second paragraph:
> ...the initial send of the command-line arguments to the worklist must run in its own goroutine to avoid deadlock, a stuck situation in which both the main goroutine and a crawler goroutine attempt to send to each other while neither is receiving...
suppose the initial send to worklist
is not in its own goroutin, I imagine it works like this:
- main goroutine send initial to
worklist
, block until received - the
for range
receive the initial item, so unblock theworklist
channel - the crawler goroutine send its items into
worklist
, loop...
So to my understanding, it won't block and deadlock. Where am I wrong?
UPDATE: @mkopriva helped me realized since step 1 is blocked, step 2,3 is unreachable. So I'm clear on this one.
Page 243
This deadlock situation might be the same as in page 240:
func main() {
worklist := make(chan []string) // list of URLs, may have duplicates
unseenLinks := make(chan string) // de-duplicated URLs
// Add command-lin arguments to worklist.
go func() { worklist <- os.Args[1:] }()
// Create 20 crawler goroutines to fetch each unseen link.
for i := 0; i < 20; i++ {
go func() {
for link := range unseenLinks {
foundLinks := crawl(link)
go func() { worklist <- foundLinks }()
}
}()
}
// The main goroutine de-duplicates worklist items
// and sends the unseen ones to the crawlers.
seen := make(map[string]bool)
for list := range worklist {
for _, link := range list {
if !seen[link] {
seen[link] = true
unseenLinks <- link
}
}
}
}
So if I put omit go
inside the for-range
loop, how the deadlock happen?
答案1
得分: 3
在第一个代码片段中,初始通道发送需要在goroutine中完成,因为如果没有goroutine,该语句将无限期地阻塞,并且执行永远不会到达从该通道接收的循环。也就是说,要从1.
到达2.
,需要在goroutine中完成1.
。然而,如果1.
阻塞,那么永远不会到达2.
。
注释开始的地方是执行停止的地方:
func main() {
worklist := make(chan []string)
worklist <- os.Args[1:]
//
// seen := make(map[string]bool)
// for list := range worklist {
// for _, link := range list {
// if !seen {
// seen = true
// go func(link string) {
// worklist <- crawl(link)
// }(link)
// }
// }
// }
// }
在第二个代码片段中,你有一个对通道的for-range
循环,这样的循环直到遍历的通道关闭才会退出。这意味着,虽然这样的循环的主体可能继续执行,但是循环未关闭的通道之后的代码将永远不会被执行到。
https://golang.org/ref/spec#For_range
> 4. 对于通道,产生的迭代值是连续发送到通道上的值,直到通道被关闭。如果通道为nil,则range表达式将永远阻塞。
注释开始的地方是执行停止的地方:
func main() {
worklist := make(chan []string)
unseenLinks := make(chan string)
go func() { worklist <- os.Args[1:] }()
for i := 0; i < 20; i++ {
for link := range unseenLinks {
// foundLinks := crawl(link)
// go func() { worklist <- foundLinks }()
// }
// }
//
// seen := make(map[string]bool)
// for list := range worklist {
// for _, link := range list {
// if !seen {
// seen = true
// unseenLinks <- link
// }
// }
// }
// }
英文:
In the first snippet, the initial channel send needs to be done in a goroutine because without a goroutine the statement would block indefinitely and the execution would never reach the loop that receives from that channel. i.e. To get from 1.
to 2.
, 1.
needs to be done in a goroutine. If 1.
blocks however, then 2.
is never reached.
Where the comments start is where the execution stops:
func main() {
worklist := make(chan []string)
worklist <- os.Args[1:]
//
// seen := make(map[string]bool)
// for list := range worklist {
// for _, link := range list {
// if !seen {
// seen = true
// go func(link string) {
// worklist <- crawl(link)
// }(link)
// }
// }
// }
// }
In the second snippet you have a for-range
loop over a channel, such a loop will NOT exit until the ranged-over channel is closed. That means that, while the body of such a loop may continue to get executed, the code after the loop-with-unclosed-channel will never be reached.
https://golang.org/ref/spec#For_range
> 4. For channels, the iteration values produced are the successive values sent on the channel until the channel is closed. If the channel
> is nil, the range expression blocks forever.
Where the comments start is where the execution stops:
func main() {
worklist := make(chan []string)
unseenLinks := make(chan string)
go func() { worklist <- os.Args[1:] }()
for i := 0; i < 20; i++ {
for link := range unseenLinks {
// foundLinks := crawl(link)
// go func() { worklist <- foundLinks }()
// }
// }
//
// seen := make(map[string]bool)
// for list := range worklist {
// for _, link := range list {
// if !seen {
// seen = true
// unseenLinks <- link
// }
// }
// }
// }
答案2
得分: 0
在第二个代码片段中,我认为作者在谈论下面的go例程:
go func() { worklist <- foundLinks }()
通过专用的goroutine将crawl发现的链接发送到worklist,以避免死锁。
为什么必须存在这个go例程?
英文:
In the second snippet, I think the author is talking about the go routine in below:
go func() { worklist <- foundLinks }()
> Links found by crawl are sent to the worklist from a dedicated
> goroutine to avoid deadlock.
Why must this go exist?
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论