July 1, 2017, 07:46:37 · go

How to avoid double connection close upon regaining connectivity?

Question

Under these circumstances:

1. The client loses network connectivity to zk.
2. One minute passes.
3. The client regains network connectivity to zk.

I'm getting the following panic:

panic: close of closed channel
 
goroutine 2849 [running]:
github.com/samuel/go-zookeeper/zk.(*Conn).Close(0xc420795180)
  github.com/samuel/go-zookeeper/zk/conn.go:253 +0x47
github.com/curator-go/curator.(*handleHolder).internalClose(0xc4203058f0, 0xc420302470, 0x0)
  github.com/curator-go/curator/state.go:136 +0x8d
github.com/curator-go/curator.(*handleHolder).closeAndReset(0xc4203058f0, 0xc42587cd00, 0x1e)
  github.com/curator-go/curator/state.go:122 +0x2f
github.com/curator-go/curator.(*connectionState).reset(0xc420302420, 0x1b71d87, 0xf)
  github.com/curator-go/curator/state.go:234 +0x55
github.com/curator-go/curator.(*connectionState).handleExpiredSession(0xc420302420)
  github.com/curator-go/curator/state.go:351 +0xd9
github.com/curator-go/curator.(*connectionState).checkState(0xc420302420, 0xffffff90, 0x0, 0x0, 0xc425ed2600, 0xed0e5250a)
  github.com/curator-go/curator/state.go:318 +0x9c
github.com/curator-go/curator.(*connectionState).process(0xc420302420, 0xc425ed2680)
  github.com/curator-go/curator/state.go:299 +0x16d
created by github.com/curator-go/curator.(*Watchers).Fire
  github.com/curator-go/curator/watcher.go:64 +0x96

This is the detailed sequence of events:

1. The client loses network connectivity to zk.
2. One minute passes.
3. The client regains network connectivity to zk.
4. Goroutine A calls s.ReregisterAll() -> Conn() -> checkTimeout() -> reset (because one minute has elapsed) -> closeAndReset() -> conn.Close(), which can block for a second.
5. Goroutine B handles zk.StateExpired (the zk cluster sends this because it considers the client dead, since it did not ping during step 2) -> reset -> closeAndReset() -> conn.Close(). This panics because goroutine A's conn.Close() has already closed the connection's c.shouldQuit channel. Goroutine A is still blocked in Close() for that second, so it has not yet called s.zooKeeper.getZookeeperConnection; there is no new connection, and goroutine B closes the same one again.
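
The failure in step 5 can be reproduced in isolation. The following is a minimal sketch, not the library's code: conn, newConn, and closeTwice are hypothetical names, and the only assumption carried over from the trace is that Close() closes a quit channel (like c.shouldQuit) without guarding against a second call.

```go
package main

import "fmt"

// conn is a hypothetical stand-in for a connection whose Close
// closes a quit channel, similar in spirit to c.shouldQuit.
type conn struct {
	shouldQuit chan struct{}
}

func newConn() *conn {
	return &conn{shouldQuit: make(chan struct{})}
}

// Close is not idempotent: a second call closes an already closed channel.
func (c *conn) Close() {
	close(c.shouldQuit)
}

// closeTwice calls Close twice on the same connection and returns the
// message of the recovered panic, or "" if nothing panicked.
func closeTwice(c *conn) (msg string) {
	defer func() {
		if r := recover(); r != nil {
			msg = fmt.Sprint(r)
		}
	}()
	c.Close() // what goroutine A does
	c.Close() // what goroutine B does on the same connection
	return ""
}

func main() {
	fmt.Println(closeTwice(newConn())) // prints "close of closed channel"
}
```

Recovering the panic here is only for demonstration; it shows that the second Close() on the same object is the trigger, regardless of which goroutine issues it.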

A naive solution I tried is to put a mutex around reset, but now helper.GetConnectionString() returns an empty string. What is the best way to avoid this crash and get back into a good state when the client loses and then regains network connectivity? Should the fix instead be in github.com/samuel/go-zookeeper, so that it does not let you close an already closed connection?

(I've filed this issue here, but the project seems to be lacking in terms of discussion so I'm asking on SO.)
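
One way to think about the mutex approach is to make "close the old connection and install a new one" a single step that is skipped when another goroutine has already done it. The sketch below illustrates that idea under stated assumptions: fakeConn, holder, resetIfCurrent, and raceResets are hypothetical names, not part of curator or go-zookeeper, and it does not address the empty helper.GetConnectionString() result mentioned above.

```go
package main

import (
	"fmt"
	"sync"
)

// fakeConn is a hypothetical stand-in for *zk.Conn that counts Close calls.
type fakeConn struct {
	closes int
}

func (c *fakeConn) Close() { c.closes++ }

// holder owns the current connection (illustrative, not curator's handleHolder).
type holder struct {
	mu  sync.Mutex
	cur *fakeConn
}

// current returns the connection the holder owns right now.
func (h *holder) current() *fakeConn {
	h.mu.Lock()
	defer h.mu.Unlock()
	return h.cur
}

// resetIfCurrent closes old and installs a fresh connection, but only if
// old is still the connection the holder owns. A caller that lost the race
// gets false and must not close anything.
func (h *holder) resetIfCurrent(old *fakeConn) bool {
	h.mu.Lock()
	defer h.mu.Unlock()
	if h.cur != old {
		return false // another goroutine already reset this connection
	}
	old.Close()
	h.cur = &fakeConn{}
	return true
}

// raceResets lets n goroutines try to reset the same observed connection
// and reports how many resets succeeded and how often it was closed.
func raceResets(n int) (resets, closes int) {
	h := &holder{cur: &fakeConn{}}
	old := h.current()
	results := make(chan bool, n)
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			results <- h.resetIfCurrent(old)
		}()
	}
	wg.Wait()
	close(results)
	for ok := range results {
		if ok {
			resets++
		}
	}
	return resets, old.closes
}

func main() {
	resets, closes := raceResets(2)
	fmt.Println(resets, closes) // prints "1 1"
}
```

The trade-off in this sketch is that Close() runs while the mutex is held, so if Close() blocks for a second, other callers wait for that second instead of racing ahead.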

Answer 1

Score: 1

zk.Conn has a State() method that returns an enum "State", which is one of the following:

type State int32
const (
    StateUnknown           State = -1
    StateDisconnected      State = 0
    StateConnecting        State = 1
    StateAuthFailed        State = 4
    StateConnectedReadOnly State = 5
    StateSaslAuthenticated State = 6
    StateExpired           State = -112

    StateConnected  = State(100)
    StateHasSession = State(101)
)

What state is "conn" in when goroutine B calls conn.Close()?

A possible solution would be to add a switch in goroutine B so that it does not call conn.Close() when conn.State() returns zk.StateConnecting.
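
The suggested check might look roughly like the sketch below. To keep it compilable without the library, it declares local copies of a few of the State values listed above; shouldClose is a hypothetical helper, and with the real library the argument would be the result of conn.State().

```go
package main

import "fmt"

// State and the constants below are local copies of a few zk.State values
// from the list above, declared here so the sketch has no dependencies.
type State int32

const (
	StateDisconnected State = 0
	StateConnecting   State = 1
	StateExpired      State = -112
	StateHasSession   State = 101
)

// shouldClose is a hypothetical helper: it reports whether goroutine B
// should call Close() given the connection's current state.
func shouldClose(s State) bool {
	switch s {
	case StateConnecting:
		// A reconnect is already in progress; skip the close.
		return false
	default:
		return true
	}
}

func main() {
	fmt.Println(shouldClose(StateConnecting)) // prints "false"
	fmt.Println(shouldClose(StateExpired))    // prints "true"
}
```

Note that checking the state and then calling Close() are two separate steps, so another goroutine can still act in between; the check narrows the window but does not by itself make the close safe.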
