Debugging hard-to-find RabbitMQ Frame Errors

Question

I'm running a RabbitMQ server and client applications in minikube for development. I'm receiving intermittent 501 frame errors. The error occurs fairly consistently under load (60 msg/sec, 2-5 kB/msg).

Error Message

From the RabbitMQ logs.

2023-02-26 16:43:12.635470+00:00 [error] <0.1056.0>  operation none caused a connection exception frame_error: "type 3, first 16 octets = <<\"{\\\"payload\\\":{\\\"res\">>: {invalid_frame_end_marker,\n                                                      
99}"
2023-02-26 16:43:15.638860+00:00 [error] <0.1056.0> closing AMQP connection <0.1056.0> (10.244.0.18:60608 -> 10.244.0.21:5672):
2023-02-26 16:43:15.638860+00:00 [error] <0.1056.0> fatal_frame_error
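
For context, an AMQP 0-9-1 frame is a 7-byte header (type, channel, payload size), the payload, and a single 0xCE end octet; type 3 is a content body frame, and the reported 99 is 0x63, the ASCII 'c', meaning the broker found message bytes where it expected 0xCE. A minimal parsing sketch of that layout (my own illustration, not deno-amqp code):

// AMQP 0-9-1 frame layout: type (1 byte) | channel (2 bytes BE) | size (4 bytes BE) | payload | 0xCE
function parseFrame(buf: Uint8Array) {
  const view = new DataView(buf.buffer, buf.byteOffset, buf.byteLength);
  const type = view.getUint8(0);      // 1 = method, 2 = content header, 3 = content body, 8 = heartbeat
  const channel = view.getUint16(1);  // big-endian channel number
  const size = view.getUint32(3);     // big-endian payload size
  const end = buf[7 + size];
  if (end !== 0xce) {
    // The condition RabbitMQ reports as {invalid_frame_end_marker, <byte>}.
    throw new Error(`invalid frame end marker: ${end}`);
  }
  return { type, channel, size, payload: buf.subarray(7, 7 + size) };
}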

Client

A Deno App using the deno-amqp library.

TCP Dump
Wireshark shows that the TCP segment (?) sent just before the server reports the error:

  • ends in 0x63 (which is 99 in decimal - see the error message),
  • contains, in order: the end of a content frame, a publish method, a content header (declaring a body size of 4428), and the start of a content frame (header says 4428 bytes, but only 4384 are present - 44 bytes missing),
  • however, the partially sent content frame does not match the error message: it starts with {"payload": { "id"... ("id", not "res").

The remaining content body is sent to the server just after the server reports the invalid frame end marker. It contains exactly the 44 bytes that were missing from the previously started content frame relative to its declared length.

Validating Frames before sending

I added a check so I would notice if my AMQP client ever encoded a frame incorrectly before sending - no issues there.

// 206 is 0xCE, the AMQP frame-end octet; it must sit right after the
// 7-byte frame header and the payload.
if (data[7 + payload.byteLength] !== 206) {
    console.log('sending invalid frame end');
    console.log({ frame, data });
}
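
Since the frames for one publish are batched into a single buffer before being written (see the next section), a check on a single frame's end byte would miss a mis-sized frame earlier in the batch. A hedged extension of the same idea, walking every frame in an outgoing buffer (hypothetical helper, not deno-amqp API):

// Returns the offset of the first malformed frame in a buffer of concatenated
// frames, or null if the buffer splits cleanly into whole frames ending in 0xCE.
function validateOutgoingBuffer(data: Uint8Array): number | null {
  let offset = 0;
  while (offset < data.length) {
    if (offset + 8 > data.length) return offset; // not even a header + end octet left
    const view = new DataView(data.buffer, data.byteOffset + offset);
    const size = view.getUint32(3);              // declared payload size of this frame
    const endIndex = offset + 7 + size;
    if (endIndex >= data.length || data[endIndex] !== 0xce) return offset;
    offset = endIndex + 1;                       // advance to the next frame
  }
  return null;
}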

No concurrent TCP connection writes

I have lots of async functions publishing messages. I made sure to always group all sequential frames of a publish (publish method, header, body frame(s)) into one buffer and write it in full, using writeAll.
If I understand correctly, Deno.Conn by default stops the event loop while writing.
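
If two async publishers ever reached the socket write at the same time, their frame bytes could interleave and produce exactly this kind of mid-payload frame boundary. One way to rule that out on the application side is to funnel every batched write through a single promise chain; a sketch with a hypothetical SerialWriter wrapper (deno-amqp may handle this differently internally):

import { writeAll } from "https://deno.land/std@0.177.0/streams/write_all.ts";

// Hypothetical wrapper that writes batched frame buffers to the connection one
// at a time, in call order, even when many async publishers run concurrently.
class SerialWriter {
  #queue: Promise<void> = Promise.resolve();

  constructor(private conn: Deno.Conn) {}

  write(frames: Uint8Array): Promise<void> {
    // Chain onto the previous write so byte ranges can never interleave.
    const next = this.#queue.then(() => writeAll(this.conn, frames));
    // Keep the chain alive even if a write rejects.
    this.#queue = next.catch(() => {});
    return next;
  }
}

Usage would be const writer = new SerialWriter(conn) followed by await writer.write(batchedFrames); each call resolves only once its bytes are on the wire, and no two calls ever overlap.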

Reproducing the error

I haven't been able to reproduce the issue with stress tests using the same library: I successfully sent messages many times larger and faster, without issue, to a RabbitMQ instance running in Docker.

Spreading load over channels

I've tried publishing messages over 10 channels with round-robin distribution. This helped - it ran much longer - but the error eventually showed up as well.
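
For reference, round-robin distribution over a fixed channel pool can be as simple as the sketch below (the Channel interface here is simplified and hypothetical, not the deno-amqp API); note that all channels still multiplex frames over the same TCP connection:

// Round-robin publishing over a fixed pool of channels. Only an async
// publish method is assumed on each channel (simplified interface).
interface Channel {
  publish(routingKey: string, body: Uint8Array): Promise<void>;
}

class RoundRobinPublisher {
  #next = 0;

  constructor(private channels: Channel[]) {}

  publish(routingKey: string, body: Uint8Array): Promise<void> {
    const channel = this.channels[this.#next];
    this.#next = (this.#next + 1) % this.channels.length;
    return channel.publish(routingKey, body);
  }
}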

Open Questions

  • Does using writeAll mean that I'm guaranteed to write all bytes in one go, regardless of the underlying buffer size? (A rough sketch of writeAll follows this list.)
  • Could this be caused by network congestion? I feel like ca. 300 kB/s should not be a problem.
  • Any ideas what's going on here?
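
Regarding the first question, std's writeAll is essentially a loop over partial writes, so the guarantee is "all bytes of this call are written, in order, possibly across several write() calls", not "one single write". A paraphrased sketch (not the exact std source):

// Rough paraphrase of Deno std's writeAll: keep calling write() with the
// remaining bytes until the whole buffer has been accepted by the connection.
async function writeAllSketch(writer: Deno.Writer, data: Uint8Array): Promise<void> {
  let nwritten = 0;
  while (nwritten < data.length) {
    nwritten += await writer.write(data.subarray(nwritten));
  }
}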

Answer 1

Score: 0

This was a bug in deno-amqp...

Edit: This has been fixed.
