GKE + WebSocket + NodePort 30s dropped connections
Question
I have a golang service that implements a WebSocket client using gorilla that is exposed to a Google Container Engine (GKE)/k8s cluster via a NodePort (30002 in this case).
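
For context, the Go side looks roughly like this. It is a minimal sketch of a gorilla/websocket endpoint sitting behind the NodePort; the handler path, listen port, and echo behavior here are illustrative rather than the real service code.

    package main

    import (
        "log"
        "net/http"

        "github.com/gorilla/websocket"
    )

    // upgrader promotes the incoming HTTP request to a WebSocket connection.
    var upgrader = websocket.Upgrader{
        // The browser connects through the external load balancer, so the
        // Origin header won't match the pod; this permissive check is for
        // the sketch only.
        CheckOrigin: func(r *http.Request) bool { return true },
    }

    func wsHandler(w http.ResponseWriter, r *http.Request) {
        conn, err := upgrader.Upgrade(w, r, nil)
        if err != nil {
            log.Printf("upgrade failed: %v", err)
            return
        }
        defer conn.Close()

        // Echo messages back until the connection is closed.
        for {
            mt, msg, err := conn.ReadMessage()
            if err != nil {
                log.Printf("read error: %v", err)
                return
            }
            if err := conn.WriteMessage(mt, msg); err != nil {
                log.Printf("write error: %v", err)
                return
            }
        }
    }

    func main() {
        http.HandleFunc("/ws", wsHandler)
        // The container listens on 8080; the k8s Service of type NodePort
        // maps this to 30002 on each node.
        log.Fatal(http.ListenAndServe(":8080", nil))
    }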
I've got a manually created load balancer (i.e. NOT a k8s ingress/load balancer) with HTTP/HTTPS frontends (i.e. 80/443) that forward traffic to nodes in my GKE/k8s cluster on port 30002.
I can get my JavaScript WebSocket implementation in the browser (Chrome 58.0.3029.110 on OSX) to connect, upgrade and send / receive messages.
I log ping/pongs in the golang WebSocket client and all looks good until 30s in. 30s after connection my golang WebSocket client gets an EOF / close 1006 (abnormal closure) and my JavaScript code gets a close event. As far as I can tell, neither my Golang nor my JavaScript code is initiating the WebSocket closure.
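
The ping/pong logging and the point where the 1006 shows up look roughly like this on the Go side (again simplified; the interval, deadline, and function names are illustrative):

    package wsdebug

    import (
        "log"
        "time"

        "github.com/gorilla/websocket"
    )

    // keepAlive logs each pong and sends a ping on a fixed interval.
    func keepAlive(conn *websocket.Conn, interval time.Duration) {
        conn.SetPongHandler(func(appData string) error {
            log.Println("pong received")
            return nil
        })
        ticker := time.NewTicker(interval)
        defer ticker.Stop()
        for range ticker.C {
            deadline := time.Now().Add(5 * time.Second)
            if err := conn.WriteControl(websocket.PingMessage, nil, deadline); err != nil {
                log.Printf("ping failed: %v", err)
                return
            }
            log.Println("ping sent")
        }
    }

    // readLoop is where the abnormal closure surfaces ~30s after connect.
    func readLoop(conn *websocket.Conn) {
        for {
            if _, _, err := conn.ReadMessage(); err != nil {
                if websocket.IsUnexpectedCloseError(err, websocket.CloseNormalClosure, websocket.CloseGoingAway) {
                    // This is the EOF / close 1006 path described above.
                    log.Printf("unexpected close: %v", err)
                } else {
                    log.Printf("read error: %v", err)
                }
                return
            }
        }
    }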
I don't particularly care about session affinity in this case AFAIK, but I have tried both IP and cookie based affinity in the load balancer with long lived cookies.
Additionally, this exact same set of k8s deployment/pod/service specs and golang service code works great on my KOPS based k8s cluster on AWS through AWS' ELBs.
Any ideas where the 30s forced closures might be coming from? Could that be a k8s default cluster setting specific to GKE or something on the GCE load balancer?
Thanks for reading!
-- UPDATE --
There is a backend configuration timeout setting on the load balancer which is for "How long to wait for the backend service to respond before considering it a failed request".
The WebSocket is not unresponsive. It is sending ping/pong and other messages right up until getting killed which I can verify by console.log's in the browser and logs in the golang service.
That said, if I bump the load balancer backend timeout setting to 30000 seconds, things "work".
That doesn't feel like a real fix, though: with such a long timeout the load balancer will keep feeding traffic to services that really are unresponsive, never mind whether the WebSocket itself ever becomes unresponsive.
I've isolated the high timeout setting to a specific backend setting using a path map, but hoping to come up with a real fix to the problem.
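
For reference, the backend-service timeout can also be raised outside the console (e.g. via gcloud compute backend-services update --timeout, or programmatically). Below is a rough sketch using the Compute Engine API Go client (google.golang.org/api/compute/v1); the project and backend-service names are placeholders and Application Default Credentials are assumed.

    package main

    import (
        "context"
        "log"

        compute "google.golang.org/api/compute/v1"
    )

    func main() {
        ctx := context.Background()

        // Uses Application Default Credentials.
        svc, err := compute.NewService(ctx)
        if err != nil {
            log.Fatalf("compute client: %v", err)
        }

        // For WebSocket traffic the backend service timeout acts as the
        // maximum time a connection may stay open, so it is raised far
        // beyond the 30s default here. Names are placeholders.
        patch := &compute.BackendService{
            TimeoutSec:      86400,
            ForceSendFields: []string{"TimeoutSec"},
        }

        op, err := svc.BackendServices.Patch("my-project", "my-backend-service", patch).Context(ctx).Do()
        if err != nil {
            log.Fatalf("patch backend service: %v", err)
        }
        log.Printf("patch submitted: operation %s", op.Name)
    }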
Answer 1
Score: 3
I think this may be Working as Intended. Google just updated the documentation today (about an hour ago).
Backend Service Components docs
Cheers,
Matt
Answer 2
Score: -2
Check out the following example: https://github.com/kubernetes/ingress-gce/tree/master/examples/websocket