Go graceful shutdown negative WaitGroup

Question

I have tried to implement graceful shutdown of a Go server, as described in this blog post: <http://grisha.org/blog/2014/06/03/graceful-restart-in-golang/>. The main parts are shown below.

Custom listener:

var httpWg sync.WaitGroup  // initialised in the other part

type gracefulListener struct {
	net.Listener
	stop    chan error
	stopped bool
}

func newGracefulListener(l net.Listener) (gl *gracefulListener) {
	gl = &gracefulListener{Listener: l, stop: make(chan error)}
	go func() {
		_ = <-gl.stop
		gl.stopped = true
		gl.stop <- gl.Listener.Close()
	}()
	return
}

func (gl *gracefulListener) Accept() (c net.Conn, err error) {
	c, err = gl.Listener.Accept()
	if err != nil {
		return
	}

	c = gracefulConn{Conn: c}  // wrap using our custom connection

	httpWg.Add(1)  // increase the counter
	return
}

func (gl *gracefulListener) Close() error {
	if gl.stopped {
		return syscall.EINVAL
	}
	gl.stop <- nil
	return <-gl.stop
}

func (gl *gracefulListener) File() *os.File {
	tl := gl.Listener.(*net.TCPListener)
	fl, _ := tl.File()
	return fl
}

Custom Conn:

type gracefulConn struct {
	net.Conn
}

func (w gracefulConn) Close() error {
	httpWg.Done()  // <- panics sometimes
	return w.Conn.Close()
}

The idea is that when the program receives SIGTERM, it stops accepting new connections and blocks on httpWg.Wait() until the existing connections finish.
This approach works locally, but when I deploy it, I sometimes get a panic at the httpWg.Done() line in gracefulConn.Close():

panic: sync: negative WaitGroup counter

The panic happens not when I stop the server, but during routine serving.
How is it possible that there are more Close() calls than Accept() calls? Or am I missing something?

P.S. I have tried adding a stopped property and a mutex to gracefulConn, so that Close locks the mutex and checks stopped to ensure we stop it only once. However, I still received the same panic.

Answer 1

Score: 2

Close() can be called multiple times, so you definitely need to check for that in func (w gracefulConn) Close() error.

> P.S. I have tried adding a stopped property and a mutex to gracefulConn, so that Close locks the mutex and checks stopped to ensure we stop it only once. However, I still received the same panic.

Keep in mind that gracefulConn is passed by value, not by reference, so any mutex or flag stored in it is copied on each call and will not work as expected. Be sure to change c = gracefulConn{Conn: c} to c = &gracefulConn{Conn: c}, and declare Close on *gracefulConn so that the flag is set on the shared value rather than on a copy.

Answer 2

Score: 1

I think this is the wrong approach.

You don't really care about whether the connection is open or not - what about persistent connections?

What you care about is whether those connections are actively in use. I think you'd be better off putting your WaitGroup in the ServeHTTP method of a custom ServeMux, so that the WaitGroup simply wraps a single function call.

huangapple
  • Posted on 2015-10-29 20:04:23
  • Please keep this link when reposting: https://go.coder-hub.com/33413747.html