Go graceful shutdown negative WaitGroup
Question
I have tried to implement graceful shutdown of a Go server, as described in this blog post: http://grisha.org/blog/2014/06/03/graceful-restart-in-golang/. The main parts are the following.
Custom listener:
var httpWg sync.WaitGroup // initialised in the other part

type gracefulListener struct {
    net.Listener
    stop    chan error
    stopped bool
}

func newGracefulListener(l net.Listener) (gl *gracefulListener) {
    gl = &gracefulListener{Listener: l, stop: make(chan error)}
    go func() {
        _ = <-gl.stop
        gl.stopped = true
        gl.stop <- gl.Listener.Close()
    }()
    return
}

func (gl *gracefulListener) Accept() (c net.Conn, err error) {
    c, err = gl.Listener.Accept()
    if err != nil {
        return
    }
    c = gracefulConn{Conn: c} // wrap using our custom connection
    httpWg.Add(1)             // increase the counter
    return
}

func (gl *gracefulListener) Close() error {
    if gl.stopped {
        return syscall.EINVAL
    }
    gl.stop <- nil
    return <-gl.stop
}

func (gl *gracefulListener) File() *os.File {
    tl := gl.Listener.(*net.TCPListener)
    fl, _ := tl.File()
    return fl
}
Custom Conn:
type gracefulConn struct {
    net.Conn
}

func (w gracefulConn) Close() error {
    httpWg.Done() // <- panics sometimes
    return w.Conn.Close()
}
The idea is that when the program receives SIGTERM, it stops accepting new connections and calls httpWg.Wait() to wait for existing connections to finish.
This approach works locally, but when I deploy it, I sometimes get a panic in gracefulConn.Close() at the httpWg.Done() line:
panic: sync: negative WaitGroup counter
The panic happens not when I stop the server but during routine serving.
How is it possible that there are more Close() calls than Accept() calls? Or am I missing something?
P.S. I have tried adding a stopped property and a mutex to gracefulConn, so that Close locks the mutex and checks stopped to ensure we stop it only once. However, I still received the same panic.
Answer 1
Score: 2
Close() can be called multiple times, so you definitely need to guard against that in func (w gracefulConn) Close() error.
> P.S. I have tried to add stopped property and a mutex to gracefulConn, so in Close it locks the mutex and checks stopped to ensure we stop it only once. However, I still received the same panic.
Keep in mind that gracefulConn is passed as a value, not a pointer, so every copy gets its own mutex and flag and they will not work as expected. Be sure to change c = gracefulConn{Conn: c} into c = &gracefulConn{Conn: c}, and give Close a pointer receiver so that it operates on the shared struct rather than on a copy.
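As a rough sketch of the fix described above, the wrapper below is handed out as a pointer and uses sync.Once so that the counter is decremented at most once per connection. The names trackedConn and closeTwice are made up for this illustration, the WaitGroup is passed in rather than global, and net.Pipe stands in for a real accepted connection; treat it as one possible shape under those assumptions, not as the blog post's implementation.

```go
package main

import (
	"fmt"
	"net"
	"sync"
)

// trackedConn is a hypothetical replacement for gracefulConn. It must be
// used through a pointer so that every caller shares the same sync.Once.
type trackedConn struct {
	net.Conn
	wg   *sync.WaitGroup
	once sync.Once
}

// Close releases the WaitGroup slot on the first call only, then always
// forwards to the underlying connection.
func (c *trackedConn) Close() error {
	c.once.Do(c.wg.Done)
	return c.Conn.Close()
}

// closeTwice wraps one end of an in-memory pipe, closes it twice and then
// waits on the WaitGroup. It reports whether that happened without a panic.
func closeTwice() (ok bool) {
	defer func() { ok = recover() == nil }()

	var wg sync.WaitGroup
	client, server := net.Pipe()
	defer client.Close()

	wg.Add(1) // what Accept would do for a new connection
	conn := &trackedConn{Conn: server, wg: &wg}
	conn.Close()
	conn.Close() // the second call must not touch the counter
	wg.Wait()
	return true
}

func main() {
	fmt.Println("double close handled:", closeTwice())
}
```

In the listener from the question, Accept would return &trackedConn{Conn: c, wg: &httpWg} after calling httpWg.Add(1). With a value receiver, or a wrapper stored by value, each Close would run against its own copy of the Once and the guard would do nothing.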
Answer 2
Score: 1
I think this is the wrong approach.
You don't really care whether a connection is open or not. What about persistent (keep-alive) connections?
What you care about is whether those connections are actively in use. I think you'd be better off putting your WaitGroup into the ServeHTTP method of a custom ServeMux. You can then wrap the WaitGroup around a single function call.
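A minimal sketch of that idea, assuming a plain handler wrapper is acceptable in place of a full custom ServeMux type: the WaitGroup slot is held only while ServeHTTP is running, so idle keep-alive connections are not counted. The names inFlight, track and serveOnce are invented for this example, and httptest is used only to exercise the wrapper without opening a socket.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"sync"
)

// inFlight counts requests that are currently being handled.
var inFlight sync.WaitGroup

// track wraps next so that each request holds a WaitGroup slot for exactly
// the duration of its ServeHTTP call.
func track(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		inFlight.Add(1)
		defer inFlight.Done()
		next.ServeHTTP(w, r)
	})
}

// serveOnce sends one request through a tracked mux, waits for in-flight
// work to drain, and returns the recorded status code and body.
func serveOnce() (int, string) {
	mux := http.NewServeMux()
	mux.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprint(w, "ok")
	})

	rec := httptest.NewRecorder()
	track(mux).ServeHTTP(rec, httptest.NewRequest("GET", "/", nil))

	inFlight.Wait() // at shutdown this would run after the listener is closed
	return rec.Code, rec.Body.String()
}

func main() {
	fmt.Println(serveOnce())
}
```

Because Add and Done are paired inside one function with defer, the counter cannot go negative however many times a connection is closed. On shutdown, the listener should stop accepting before inFlight.Wait() is called, since sync.WaitGroup requires that an Add starting from a zero counter happens before Wait.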