Is blocking on a channel send a bad synchronization paradigm and why
Question
Effective Go gives this example of how to emulate a semaphore with channels:
var sem = make(chan int, MaxOutstanding)

func handle(r *Request) {
	<-sem
	process(r)
	sem <- 1
}

func init() {
	for i := 0; i < MaxOutstanding; i++ {
		sem <- 1
	}
}

func Serve(queue chan *Request) {
	for {
		req := <-queue
		go handle(req)
	}
}
It also says: Because data synchronization occurs on a receive from a channel (that is, the send "happens before" the receive; see The Go Memory Model), acquisition of the semaphore must be on a channel receive, not a send.
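To make the quoted pattern concrete, here is a minimal runnable sketch of the receive-to-acquire semaphore. The names maxOutstanding and peakWithReceiveAcquire, the limit of 3, and the one-millisecond sleep standing in for process(r) are assumptions made for illustration; they are not part of Effective Go.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// maxOutstanding is a hypothetical limit chosen for this sketch.
const maxOutstanding = 3

// peakWithReceiveAcquire starts n goroutines that each take a token from a
// pre-filled channel before working, and reports the highest number of
// goroutines that held a token at the same time.
func peakWithReceiveAcquire(n int) int {
	sem := make(chan int, maxOutstanding)
	for i := 0; i < maxOutstanding; i++ {
		sem <- 1 // prime the channel with tokens, as init() does above
	}

	var (
		mu            sync.Mutex
		running, peak int
		wg            sync.WaitGroup
	)
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			<-sem // acquire: blocks while no token is available

			mu.Lock()
			running++
			if running > peak {
				peak = running
			}
			mu.Unlock()

			time.Sleep(time.Millisecond) // stand-in for process(r)

			mu.Lock()
			running--
			mu.Unlock()

			sem <- 1 // release: return the token
		}()
	}
	wg.Wait()
	return peak
}

func main() {
	fmt.Println("peak concurrency:", peakWithReceiveAcquire(20))
}
```

Because the counter is incremented only after a token is received and decremented before the token is returned, the reported peak can never exceed maxOutstanding.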
Now, I think I understand the Go Memory Model and the definition of "happens before." But I fail to see what the problem is with blocking on a channel send:
func handle(r *Request) {
	sem <- 1
	process(r)
	<-sem
}

func init() {}
This code (with sem and Serve unchanged from above) uses the buffered channel in the opposite way. The channel starts empty. On entering handle, the send will block if there are already MaxOutstanding goroutines doing their processing. As soon as one of them finishes its processing and "frees" a slot in the channel by receiving one int, our send will be unblocked and the goroutine will start its own processing.
Why is this a bad way to do synchronization, as the textbook seems to imply?
Does a receive operation that frees a channel slot not "happen before" the send that will use that same slot? How is this possible?
In other words, the Language Reference says that "a send on a buffered channel [blocks until] there is room in the buffer."
But the Memory Model only says that "A receive from an unbuffered channel happens before the send on that channel completes." In particular, it does not say that a receive from a buffered channel that is full happens before a send on that channel completes.
Is this a corner case that cannot be trusted to do the Right Thing (which would be to actually synchronize a blocked send with the receive that unblocks it)?
If so, it looks like a nasty race condition in a language designed to minimize sneaky race conditions:
var c = make(chan int, 1)
var a string

func f() {
	a = "hello, world"
	<-c // unblock main, which will hopefully see the updated 'a'
}

func main() {
	c <- 0 // fill up the buffered channel
	go f()
	c <- 0 // this blocks because the channel is full
	print(a)
}
Answer 1
Score: 5
This bit of the Effective Go document threw me as well. In fact, until relatively recently, the code in question acquired the semaphore on a channel send (rather than on a channel receive, as the current version does, using init() to "prime" the channel).
There has apparently been a good deal of discussion on the topic. I won't bother trying to summarize everything, but the discussion can all be found from here:
https://code.google.com/p/go/issues/detail?id=5023
It does strike me as unfortunate, but quoting the filer of that issue, the short story appears to be that unless the semaphore is acquired on the channel receive...:
The following code:
func handle(r *Request) {
	sem <- 1   // Wait for active queue to drain.
	process(r) // May take a long time.
	<-sem      // Done; enable next request to run.
}
...could legally be "optimized" into:
func handle(r *Request) {
	process(r) // May take a long time.
	sem <- 1   // Wait for active queue to drain.
	<-sem      // Done; enable next request to run.
}
...or into:
func handle(r *Request) {
	sem <- 1   // Wait for active queue to drain.
	<-sem      // Done; enable next request to run.
	process(r) // May take a long time.
}
Answer 2
Score: 1
If I understand it correctly (and I may well not), the problem is simply that the language does not guarantee the order in which some of these operations happen, so the channel cannot be relied on when used that way.
When I have run into something like this, I've usually figured out (sometimes after an embarrassing amount of trial and error) that it wasn't that the language was "missing something" but that I was trying to paint with a hammer.
In the specific example at the top of your question, I'd solve it by structuring things a little differently:
Instead of acquiring the semaphore in the sender (and unblocking in the receiver), just spawn the desired number of goroutines up front and then send them work over a channel. No semaphores needed. I understand this was just a condensed example, but if you describe your actual use case or issues in more detail, it's likely someone will chime in with a clean, Go-like solution for it.
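One way the suggestion above might look is sketched here. The Request type, the process function (which just doubles an ID), the serve helper, and the worker count of 3 are all hypothetical stand-ins chosen so the program is self-contained; the real request handling would differ.

```go
package main

import (
	"fmt"
	"sync"
)

// Request and process are hypothetical stand-ins for the types in the question.
type Request struct{ ID int }

func process(r *Request) int { return r.ID * 2 }

// serve starts a fixed number of worker goroutines that read requests from
// queue until it is closed, and returns the sum of all results. The number
// of workers bounds concurrency, so no semaphore is needed.
func serve(queue <-chan *Request, workers int) int {
	var (
		mu    sync.Mutex
		total int
		wg    sync.WaitGroup
	)
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for r := range queue { // the loop ends when queue is closed
				v := process(r)
				mu.Lock()
				total += v
				mu.Unlock()
			}
		}()
	}
	wg.Wait()
	return total
}

func main() {
	queue := make(chan *Request)
	go func() {
		for i := 1; i <= 10; i++ {
			queue <- &Request{ID: i}
		}
		close(queue) // tell the workers there is no more work
	}()
	fmt.Println(serve(queue, 3)) // prints 110
}
```

With this structure at most three requests are in flight at any time, because only three goroutines ever call process.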