英文:
fasthttp websocket: panic nil: websocket.(*Conn).beginMessage()
问题
使用 Fiber v2.38.1,它使用 fasthttp/websocket v1.5.0。
应用程序崩溃并显示以下错误:
2022-10-15T04:05:42.983563+00:00 app[web.1]: time="2022-10-15T04:05:42Z" level=info msg="close msg received: &{418 close}"
2022-10-15T04:05:42.983564+00:00 app[web.1]: conn is nil: false, msg: &{418 close <nil> conn 0}
2022-10-15T04:05:42.986035+00:00 app[web.1]: panic: runtime error: invalid memory address or nil pointer dereference
2022-10-15T04:05:42.986035+00:00 app[web.1]: [signal SIGSEGV: segmentation violation code=0x1 addr=0x78 pc=0x8db967]
2022-10-15T04:05:42.986036+00:00 app[web.1]:
2022-10-15T04:05:42.986037+00:00 app[web.1]: goroutine 86 [running]:
2022-10-15T04:05:42.986037+00:00 app[web.1]: github.com/fasthttp/websocket.(*Conn).beginMessage(0x0, 0xc000348f60, 0x1)
2022-10-15T04:05:42.986037+00:00 app[web.1]: /tmp/codon/tmp/cache/go-path/pkg/mod/github.com/fasthttp/websocket@v1.5.0/conn.go:479 +0x27
2022-10-15T04:05:42.986038+00:00 app[web.1]: github.com/fasthttp/websocket.(*Conn).NextWriter(0x0, 0x1)
2022-10-15T04:05:42.986038+00:00 app[web.1]: /tmp/codon/tmp/cache/go-path/pkg/mod/github.com/fasthttp/websocket@v1.5.0/conn.go:520 +0x45
2022-10-15T04:05:42.986039+00:00 app[web.1]: github.com/fasthttp/websocket.(*Conn).WriteJSON(0xe79a80?, {0xc143a0, 0xc00041e600})
2022-10-15T04:05:42.986039+00:00 app[web.1]: /tmp/codon/tmp/cache/go-path/pkg/mod/github.com/fasthttp/websocket@v1.5.0/json.go:24 +0x45
2022-10-15T04:05:42.986039+00:00 app[web.1]: github.com/kuchaguangjie/go-fit/model.(*WsClient).WriteMsg(0xc00056a280, 0xc00041e600)
2022-10-15T04:05:42.986040+00:00 app[web.1]: /tmp/build_0b2fe533/model/ws_model.go:97 +0xf9
2022-10-15T04:05:42.986040+00:00 app[web.1]: github.com/kuchaguangjie/go-fit/controller.(*ContentController).Ws.func1.2()
2022-10-15T04:05:42.986041+00:00 app[web.1]: /tmp/build_0b2fe533/controller/content_ctl.go:623 +0x171
2022-10-15T04:05:42.986041+00:00 app[web.1]: created by github.com/kuchaguangjie/go-fit/controller.(*ContentController).Ws.func1
2022-10-15T04:05:42.986042+00:00 app[web.1]: /tmp/build_0b2fe533/controller/content_ctl.go:608 +0x10cc
2022-10-15T04:05:43.113930+00:00 heroku[web.1]: Process exited with status 2
它在 conn.go:479
处崩溃,具体是在 beginMessage()
函数中:
> if c.writer != nil {
我已经检查过 c
不是 nil(如日志所示),所以它为什么会崩溃...
顺便说一下,它部署在 Heroku 上,我在 Heroku 上看到以下日志:
> 2022-10-15T04:59:30.344791+00:00 heroku[router]: at=error code=H15 desc="Idle connection"
不确定是否相关。
更新:相关代码
- 处理关闭消息的单独 goroutine(
content_ctl.go
的一部分):
// 处理关闭消息 - 来自通道,
// TODO: 使用 goroutine 池?
go func() {
if r := recover(); r != nil { // TODO: 这样做合适吗?
fmt.Printf("recover from panic: %v\n", r)
}
closeMsg := <-(*client).CloseChan // 处理关闭消息
log.Infof("close msg received: %v", closeMsg)
// 发送关闭消息
closeResp := &model.WsResp{
Status: closeMsg.HttpCode,
Source: model.RespSourceConn,
Msg: closeMsg.Msg,
}
log.Infof("is conn nil: %v, msg: %v\n", client.WsConn == nil, closeResp)
err = client.WriteMsg(closeResp)
if err != nil {
log.Errorf("error - on write close msg: %v", closeResp)
}
ctl.cwm.CloseAndCleanup(client.Id) // 关闭并清理
}()
WsClient
结构体的定义,以及它的WriteMsg()
方法(ws_model.go
的一部分):
// WebSocket 客户端
type WsClient struct {
WsUser
ContentUuid string `json:"contentUuid"` // 内容 UUID
WsConn *websocket.Conn `json:"-"` // WebSocket 连接
CloseChan chan *CloseMsg `json:"-"` // 关闭通道
WriteLock *sync.Mutex // 写锁,避免并发写入 WebSocket 连接,写入或关闭时获取锁
}
// 带锁的写入 WebSocket 消息
func (wsc *WsClient) WriteMsg(msg *WsResp) error {
if wsc.WsConn == nil {
return errutil.ErrNoConn
}
wsc.WriteLock.Lock()
err := wsc.WsConn.WriteJSON(msg)
wsc.WriteLock.Unlock()
return err
}
CloseMsg
结构体(ws_model.go
的一部分):
// 发送到关闭通道的关闭消息
type CloseMsg struct {
HttpCode int `json:"httpCode"`
Msg string `json:"msg"`
}
问题状态:
- 实际上,在提问之前,我通过检查
CloseMsg
中的httpCode
是否为418
来解决了问题,如果是418
,则不会发送关闭消息。 - 尽管问题已经解决,但我仍然想知道问题的实际原因,所以我提出了这个问题。
一些背景信息:
- 对于
CloseMsg
的HttpCode
,我有几个可能的值。418
是默认值,只有在连接发生意外的 I/O 问题时才会发生(例如,Heroku 因空闲而终止连接)。- 对于其他情况,例如如果我想踢出一个连接,我会发送其他的 HTTP 状态码。
- 从日志中,我发现问题只发生在关闭消息的 HTTP 状态码为
418
时。 - 顺便说一下,虽然
418
可能不是最好的 HTTP 状态码(它是一个笑话),但我认为这不会影响问题。 - 在
WsClient
的WriteMsg()
方法中,我一直在发送消息之前检查连接是否为 nil。
更新 - 来自 Heroku 的更多日志
2022-10-16T05:19:03.088622+00:00 heroku[router]: at=error code=H15 desc="Idle connection" method=GET path="/api/v1/content/ws/RgPx8znal7AJb
2022-10-16T05:19:03.065493+00:00 app[web.1]: time="2022-10-16T05:19:03Z" level=info msg="error - on read: websocket: close 1006 (abnormal closure): unexpected EOF"
2022-10-16T05:20:57.758379+00:00 app[web.1]: time="2022-10-16T05:20:57Z" level=info msg="cleanup client, id = 'wscid-ee670cc5-4100-49d6-9857-8284d93a6d33'"
2022-10-16T05:20:57.758505+00:00 app[web.1]: time="2022-10-16T05:20:57Z" level=info msg="close msg received: &{418 close}"
看起来 Heroku 会在一段时间内没有消息时关闭 WebSocket 连接。这就是 418
出现的原因。
另外,我在 SO 上提了一个关于 Heroku 终止 WebSocket 连接的问题:https://stackoverflow.com/questions/74084808/heroku-kill-idle-connection-websocket
英文:
Using Fiber v2.38.1, which in turn use fasthttp/websocket v1.5.0.
App crashed with following error:
2022-10-15T04:05:42.983563+00:00 app[web.1]: time="2022-10-15T04:05:42Z" level=info msg="close msg received: &{418 close}"
2022-10-15T04:05:42.983564+00:00 app[web.1]: conn is nil: false, msg: &{418 close <nil> conn 0}
2022-10-15T04:05:42.986035+00:00 app[web.1]: panic: runtime error: invalid memory address or nil pointer dereference
2022-10-15T04:05:42.986035+00:00 app[web.1]: [signal SIGSEGV: segmentation violation code=0x1 addr=0x78 pc=0x8db967]
2022-10-15T04:05:42.986036+00:00 app[web.1]:
2022-10-15T04:05:42.986037+00:00 app[web.1]: goroutine 86 [running]:
2022-10-15T04:05:42.986037+00:00 app[web.1]: github.com/fasthttp/websocket.(*Conn).beginMessage(0x0, 0xc000348f60, 0x1)
2022-10-15T04:05:42.986037+00:00 app[web.1]: /tmp/codon/tmp/cache/go-path/pkg/mod/github.com/fasthttp/websocket@v1.5.0/conn.go:479 +0x27
2022-10-15T04:05:42.986038+00:00 app[web.1]: github.com/fasthttp/websocket.(*Conn).NextWriter(0x0, 0x1)
2022-10-15T04:05:42.986038+00:00 app[web.1]: /tmp/codon/tmp/cache/go-path/pkg/mod/github.com/fasthttp/websocket@v1.5.0/conn.go:520 +0x45
2022-10-15T04:05:42.986039+00:00 app[web.1]: github.com/fasthttp/websocket.(*Conn).WriteJSON(0xe79a80?, {0xc143a0, 0xc00041e600})
2022-10-15T04:05:42.986039+00:00 app[web.1]: /tmp/codon/tmp/cache/go-path/pkg/mod/github.com/fasthttp/websocket@v1.5.0/json.go:24 +0x45
2022-10-15T04:05:42.986039+00:00 app[web.1]: github.com/kuchaguangjie/go-fit/model.(*WsClient).WriteMsg(0xc00056a280, 0xc00041e600)
2022-10-15T04:05:42.986040+00:00 app[web.1]: /tmp/build_0b2fe533/model/ws_model.go:97 +0xf9
2022-10-15T04:05:42.986040+00:00 app[web.1]: github.com/kuchaguangjie/go-fit/controller.(*ContentController).Ws.func1.2()
2022-10-15T04:05:42.986041+00:00 app[web.1]: /tmp/build_0b2fe533/controller/content_ctl.go:623 +0x171
2022-10-15T04:05:42.986041+00:00 app[web.1]: created by github.com/kuchaguangjie/go-fit/controller.(*ContentController).Ws.func1
2022-10-15T04:05:42.986042+00:00 app[web.1]: /tmp/build_0b2fe533/controller/content_ctl.go:608 +0x10cc
2022-10-15T04:05:43.113930+00:00 heroku[web.1]: Process exited with status 2
It crashed at conn.go:479
, in function beginMessage()
:
> if c.writer != nil {
I've checked that c
is not nil (as shown in log), so how it crashed ...
BTW, it's deployed on heroku, I saw following log on heroku:
> 2022-10-15T04:59:30.344791+00:00 heroku[router]: at=error code=H15 desc="Idle connection"
Not sure is it relevent.
Update: relevant code
- A separate goroutine that handle close msg (part of
content_ctl.go
):
// handle close - from a channel,
// TODO: use goroutine pool ?
go func() {
if r := recover(); r != nil { // TODO: is this proper ?,
fmt.Printf("recover from panic: %v\n", r)
}
closeMsg := <-(*client).CloseChan // handle close,
log.Infof("close msg received: %v", closeMsg)
// send close msg,
closeResp := &model.WsResp{
Status: closeMsg.HttpCode,
Source: model.RespSourceConn,
Msg: closeMsg.Msg,
}
log.Infof("is conn nil: %v, msg: %v\n", client.WsConn == nil, closeResp)
err = client.WriteMsg(closeResp)
if err != nil {
log.Errorf("error - on write close msg: %v", closeResp)
}
ctl.cwm.CloseAndCleanup(client.Id) // close & clean up,
}()
- Definition of
WsClient
struct, and its methodWriteMsg()
(part ofws_model.go
):
// websocket client,
type WsClient struct {
WsUser
ContentUuid string `json:"contentUuid"` // content uuid id,
WsConn *websocket.Conn `json:"-"` // websocket connection,
CloseChan chan *CloseMsg `json:"-"` // close channel,
WriteLock *sync.Mutex // write lock, to avoid concurrent write to ws connection, which will cause panic, acquire lock when write or close,
}
// write ws msg, with lock,
func (wsc *WsClient) WriteMsg(msg *WsResp) error {
if wsc.WsConn == nil {
return errutil.ErrNoConn
}
wsc.WriteLock.Lock()
err := wsc.WsConn.WriteJSON(msg)
wsc.WriteLock.Unlock()
return err
}
CloseMsg
struct (part ofws_model.go
):
// close msg sent to close chan,
type CloseMsg struct {
HttpCode int `json:"httpCode"`
Msg string `json:"msg"`
}
Problem status:
- Actually, before asking the question I've solved the problem by checking the
httpCode
in theCloseMsg
, if it's418
, then I won't send the close msg. - Though the issue is gone for now, but still I want to know the actual reason of the problem, so I asked the question.
Some background:
- I've several possible value for
CloseMsg
'sHttpCode
.418
is the default one, it happens only when unexpected io issue happen on the connection (e.g heroku terminate the connection due to idle ?).- For other cases, e.g if I want to kick out a connection, then I will send other http code.
- From the log, I found the problem happens only when the http code from close msg is
418
. - BTW, though
418
might not be the best http code to use, (it's from a joke), but that doesn't effect the problem I guess. - And, In
WsClient
'sWriteMsg()
method, I've always checked whether the conn nil, before send msg.
Update - more log from heroku
2022-10-16T05:19:03.088622+00:00 heroku[router]: at=error code=H15 desc="Idle connection" method=GET path="/api/v1/content/ws/RgPx8znal7AJb
2022-10-16T05:19:03.065493+00:00 app[web.1]: time="2022-10-16T05:19:03Z" level=info msg="error - on read: websocket: close 1006 (abnormal closure): unexpected EOF"
2022-10-16T05:20:57.758379+00:00 app[web.1]: time="2022-10-16T05:20:57Z" level=info msg="cleanup client, id = 'wscid-ee670cc5-4100-49d6-9857-8284d93a6d33'"
2022-10-16T05:20:57.758505+00:00 app[web.1]: time="2022-10-16T05:20:57Z" level=info msg="close msg received: &{418 close}"
Seems heroku will kill the websocket connection if there is no msg for a while. That's when 418
appear.
And, I've asked a question about heroku killing the websocket connection on SO: https://stackoverflow.com/questions/74084808/heroku-kill-idle-connection-websocket
答案1
得分: 1
这是一个fasthttp的bug,在v1.5.1-rc.8
版本中已修复。
参考链接:https://github.com/fasthttp/websocket/issues/30#issuecomment-1326860975
英文:
This is a fasthttp bug, and fixed in v1.5.1-rc.8
refer: https://github.com/fasthttp/websocket/issues/30#issuecomment-1326860975
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论