AWS/ALB, http/2, and GOAWAY
Question
We recently switched from ELB to ELB2/ALBs, and occasionally our Go HTTP/2 clients see GOAWAY messages from our Application Load Balancers that I'm unable to explain. The target group servers only support HTTP/1.1, and our load balancers should always have at least one healthy server in rotation.
I can reliably reproduce the GOAWAY when registering a new instance in the ALB: the ALB returns GOAWAY while the target is in the "Initial" state. Further, even though the ALB responds with GOAWAY, the request successfully reaches the other instance registered in the target group. So, given instances web0 and web1, if I deregister web0 and re-register that target, I can reliably reproduce the GOAWAY by making a request while web0 is in "Initial". However, our logs show that web1 successfully handled the request.
Our client is a Go program using http.DefaultClient. I can reproduce this behavior using both Go 1.7 and 1.8beta2.
When this occurs our client logs more details about the HTTP/2 response:
http2: server sent GOAWAY and closed the connection; LastStreamID=1, ErrCode=NO_ERROR, debug=""
I'd like to better understand what's going on here. Should either the Go http2 package or our code automatically handle the GOAWAY by retrying the request? I'm not familiar enough with HTTP/2 to know whether GOAWAY is expected (which would imply that our Go client shouldn't treat it as an error condition) or whether it indicates that something is going wrong at the ALB.
Answer 1
Score: 5
About GOAWAY
The GOAWAY frame carries three pieces of information that can help you troubleshoot the issue:

    +-+-------------------------------------------------------------+
    |R|                  Last-Stream-ID (31)                        |
    +-+-------------------------------------------------------------+
    |                      Error Code (32)                          |
    +---------------------------------------------------------------+
    |                  Additional Debug Data (*)                    |
    +---------------------------------------------------------------+
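To make the layout concrete, here is a minimal Go sketch that decodes a GOAWAY payload, assuming the 9-byte frame header has already been stripped. The `GoAway` type and the `parseGoAway` function are hypothetical names for illustration, not the API of Go's http2 package; the error-code names are the ones registered in RFC 7540, Section 7.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// Error codes registered in RFC 7540, Section 7.
var errCodeNames = map[uint32]string{
	0x0: "NO_ERROR", 0x1: "PROTOCOL_ERROR", 0x2: "INTERNAL_ERROR",
	0x3: "FLOW_CONTROL_ERROR", 0x4: "SETTINGS_TIMEOUT", 0x5: "STREAM_CLOSED",
	0x6: "FRAME_SIZE_ERROR", 0x7: "REFUSED_STREAM", 0x8: "CANCEL",
	0x9: "COMPRESSION_ERROR", 0xa: "CONNECT_ERROR", 0xb: "ENHANCE_YOUR_CALM",
	0xc: "INADEQUATE_SECURITY", 0xd: "HTTP_1_1_REQUIRED",
}

// GoAway holds the three fields of a GOAWAY payload (hypothetical type).
type GoAway struct {
	LastStreamID uint32
	ErrCode      uint32
	DebugData    []byte
}

// parseGoAway decodes a GOAWAY payload whose frame header was already removed.
func parseGoAway(p []byte) (GoAway, error) {
	if len(p) < 8 {
		return GoAway{}, errors.New("GOAWAY payload shorter than 8 bytes")
	}
	return GoAway{
		// Mask off the reserved R bit: the stream ID is only 31 bits wide.
		LastStreamID: binary.BigEndian.Uint32(p[0:4]) & 0x7fffffff,
		ErrCode:      binary.BigEndian.Uint32(p[4:8]),
		DebugData:    p[8:], // everything after the first 8 bytes is debug data
	}, nil
}

// String formats the fields the same way the Go client's log line does.
func (g GoAway) String() string {
	name, ok := errCodeNames[g.ErrCode]
	if !ok {
		name = fmt.Sprintf("unknown(0x%x)", g.ErrCode)
	}
	return fmt.Sprintf("LastStreamID=%d, ErrCode=%s, debug=%q", g.LastStreamID, name, g.DebugData)
}

func main() {
	// Payload matching the frame in the question: stream 1, NO_ERROR, no debug data.
	g, err := parseGoAway([]byte{0, 0, 0, 1, 0, 0, 0, 0})
	if err != nil {
		panic(err)
	}
	fmt.Println(g)
}
```

Running it prints `LastStreamID=1, ErrCode=NO_ERROR, debug=""`, the same fields the Go client logged in the question.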
- Last-Stream-ID is the highest-numbered stream that the sender might have processed (or might still process); streams with higher IDs were not processed. This might help you understand what's going on. The RFC has some advice on how to achieve a graceful shutdown: first send a GOAWAY frame with Last-Stream-ID set to 2^31-1 and the error code NO_ERROR, to let the client know that a shutdown is coming; then, after some time, send another GOAWAY frame with Last-Stream-ID set to the ID of the last stream actually processed, so that the client knows what has been handled. Here's the relevant extract from RFC 7540, 6.8 GOAWAY:
> A server that is attempting to gracefully shut down a connection
> SHOULD send an initial GOAWAY frame with the last stream identifier
> set to 2^31-1 and a NO_ERROR code. This signals to the client that
> a shutdown is imminent and that initiating further requests is
> prohibited. After allowing time for any in-flight stream creation
> (at least one round-trip time), the server can send another GOAWAY
> frame with an updated last stream identifier. This ensures that a
> connection can be cleanly shut down without losing requests.
- The error code and the additional debug data (a string) contain additional information that explains what's going on. RFC 7540, 7. Error Codes has the list of possible error codes. Then, depending on the server implementation, you might get a string narrowing down the error. For example, H2O sends "found an upper-case letter in header name" when an upper-case letter was found in a header name.
This particular GOAWAY
http2: server sent GOAWAY and closed the connection; LastStreamID=1, ErrCode=NO_ERROR, debug=""
Since the server is sending NO_ERROR, your client should simply try to reconnect, and not treat the message as an error.
As to why the ALB is sending GOAWAYs, I'm not sure. Can you give us more details about that?
Answer 2
Score: 0
@frederik-deweerdt's answer should be accepted. Regarding Application Load Balancers specifically, here is the answer from an AWS Forums post on a similar question, https://forums.aws.amazon.com/thread.jspa?messageID=771883&#771883:
> The HTTP/2 GOAWAY response your clients are receiving are connections that are being closed gracefully by the Application Load Balancer. Application Load Balancers will generally allow idle connections to last until the configured idle timeout, which has a default of 60 seconds. However, there are some conditions which can trigger closing of idle connections. On an HTTP/1.1 connection, an outstanding request is allowed to complete, and then the TCP connection is torn down normally. On an HTTP/2 connection, the load balancer initiates a close of these connections gracefully by sending an HTTP/2 GOAWAY. Per RFC 7540 "GOAWAY allows an endpoint to gracefully stop accepting new streams while still finishing processing of previously established streams". The client should respond by completing in progress requests, closing the connection and reconnecting if needed. Application Load Balancer will log the HTTP status for each request in the access log, and not the connection state closing signal.
>
> Clients experiencing errors due to having received an HTTP/2 GOAWAY should be checked to ensure they are fully compliant with HTTP/2 RFCs.
>
> You can read more about HTTP/2 GOAWAY method in section 6.8 of RFC 7540.
> https://www.rfc-editor.org/rfc/rfc7540#section-6.8
>
> Please let us know if you have any further questions about this or other behavior of Elastic Load Balancers.
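The client behaviour described in the AWS reply (finish in-progress requests, stop using the connection, reconnect if needed) hinges on Last-Stream-ID. As a minimal sketch, assuming the client tracks the IDs of its in-flight streams, the hypothetical helper `splitOnGoAway` separates the streams that may still complete from those that, per RFC 7540 6.8, were not processed and can be retried on a new connection. It only classifies stream IDs; it does not open connections or reissue requests, and no new streams should be started on the connection that received the GOAWAY.

```go
package main

import "fmt"

// splitOnGoAway partitions in-flight stream IDs using the Last-Stream-ID of a
// received GOAWAY: IDs <= lastStreamID may have been (or may still be)
// processed, so they are left to finish; higher IDs were not processed and
// can be retried on a fresh connection.
func splitOnGoAway(inFlight []uint32, lastStreamID uint32) (mayComplete, safeToRetry []uint32) {
	for _, id := range inFlight {
		if id <= lastStreamID {
			mayComplete = append(mayComplete, id)
		} else {
			safeToRetry = append(safeToRetry, id)
		}
	}
	return mayComplete, safeToRetry
}

func main() {
	// Client-initiated streams use odd IDs; assume 1, 3 and 5 are in flight
	// when a GOAWAY with LastStreamID=1 arrives.
	complete, retry := splitOnGoAway([]uint32{1, 3, 5}, 1)
	fmt.Println("may complete:", complete, "retry on new connection:", retry)
}
```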