Debugging hard to find RabbitMQ Frame Errors

Problem

I'm running a RabbitMQ server and client applications in minikube for development. I'm receiving intermittent 501 frame errors. The error occurs fairly consistently under load (60 msg/sec, 2-5 kB/msg).

Error Message

From the RabbitMQ logs.

2023-02-26 16:43:12.635470+00:00 [error] <0.1056.0>  operation none caused a connection exception frame_error: "type 3, first 16 octets = <<\"{\\\"payload\\\":{\\\"res\">>: {invalid_frame_end_marker,\n                                                      
99}"
2023-02-26 16:43:15.638860+00:00 [error] <0.1056.0> closing AMQP connection <0.1056.0> (10.244.0.18:60608 -> 10.244.0.21:5672):
2023-02-26 16:43:15.638860+00:00 [error] <0.1056.0> fatal_frame_error

Client

A Deno App using the deno-amqp library.

TCP Dump

Wireshark shows the TCP segment (?) sent just before the server reports the error:

  • ending in 0x63 (which is 99 in decimal; see the error message).
  • containing, in order: the end of a content frame, a publish method frame, a content header (declaring a body size of 4428), and the start of a content body frame (declared size 4428, but only 4384 bytes present, i.e. 44 bytes missing)
  • however, the started content frame does not match the error message: it begins with {"payload": { "id"... ("id", not "res")

The remaining content body is sent to the server just after the server reports the invalid frame end marker. It contains exactly the 44 bytes missing from the previously started content frame (according to its length header).
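
For reference, here is a minimal sketch (my own illustration, not taken from deno-amqp) of how an AMQP 0-9-1 body frame is laid out on the wire. The server skips exactly the declared payload size after the 7-byte header and expects the frame-end octet 0xCE (206) there; if the stream is misaligned or the body is short, it checks an arbitrary byte instead, here 0x63 ('c').

// Sketch: encode an AMQP 0-9-1 body (content) frame.
// Layout: 1-byte type (3 = body), 2-byte channel, 4-byte payload size,
// payload bytes, then the frame-end octet 0xCE (206).
function encodeBodyFrame(channel: number, payload: Uint8Array): Uint8Array {
  const frame = new Uint8Array(7 + payload.byteLength + 1);
  const view = new DataView(frame.buffer);
  view.setUint8(0, 3);                    // frame type: body
  view.setUint16(1, channel);             // channel number (big-endian)
  view.setUint32(3, payload.byteLength);  // payload size (big-endian)
  frame.set(payload, 7);                  // payload bytes
  frame[7 + payload.byteLength] = 0xce;   // frame-end marker (206)
  return frame;
}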

Validating Frames before sending

I added a check to catch my AMQP client encoding frames incorrectly; it never fired, so the frames appear to be encoded correctly.

// 206 (0xCE) is the AMQP frame-end octet; the 7-byte frame header
// (1-byte type, 2-byte channel, 4-byte size) precedes the payload.
if (data[7 + payload.byteLength] !== 206) {
    console.log('sending invalid frame end');
    console.log({ frame, data });
}
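
Because several frames are batched into a single buffer before writing, a stricter variant of this check (a sketch of my own, not part of deno-amqp) could walk the whole outgoing buffer and verify every frame boundary before it is sent:

// Sketch: validate every frame in a concatenated outgoing buffer.
// Each frame is a 7-byte header (type, channel, payload size),
// the payload, and the frame-end octet 0xCE (206).
function validateFrameBuffer(data: Uint8Array): boolean {
  let offset = 0;
  while (offset < data.byteLength) {
    if (offset + 8 > data.byteLength) return false;       // truncated header
    const view = new DataView(data.buffer, data.byteOffset + offset, 7);
    const size = view.getUint32(3);                        // declared payload size
    const end = offset + 7 + size;                         // index of the frame-end octet
    if (end >= data.byteLength || data[end] !== 206) {
      console.log('invalid frame boundary at offset', offset);
      return false;
    }
    offset = end + 1;                                      // next frame starts after 0xCE
  }
  return true;
}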

No concurrent TCP connection writes

I have many async functions publishing messages. I made sure to always group all sequential frames of a publish (publish method, header, body frame(s)) and write them fully in one go; I used writeAll.
If I understand correctly, Deno.Conn by default stops the event loop while writing.
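
For context, writeAll from the Deno standard library loops over conn.write until every byte has been flushed, roughly as below; a single writeAll call can therefore span several partial writes. The serializedWrite helper is a hypothetical guard of my own (not something deno-amqp provides) that chains writes so concurrent publishes cannot interleave their bytes on the socket:

// writeAll, roughly as implemented in the Deno standard library:
// keep calling write() until the whole buffer has been flushed.
async function writeAll(conn: Deno.Conn, data: Uint8Array): Promise<void> {
  let written = 0;
  while (written < data.byteLength) {
    written += await conn.write(data.subarray(written));
  }
}

// Hypothetical guard: funnel all socket writes through one promise chain
// so concurrent publishes can never interleave their frame bytes.
// (Error handling omitted for brevity.)
let writeChain: Promise<void> = Promise.resolve();
function serializedWrite(conn: Deno.Conn, data: Uint8Array): Promise<void> {
  writeChain = writeChain.then(() => writeAll(conn, data));
  return writeChain;
}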

Reproducing the error

I haven't been able to reproduce the issue with stress tests built on the same library. I successfully sent messages many times larger, and at much higher rates, to a Docker RabbitMQ instance without any errors.

Spreading load over channels

I've tried publishing messages using 10 channels with round-robin distribution (see the sketch below). This helped: the application ran much longer before the error eventually showed up as well.
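
The round-robin setup looks roughly like the sketch below. The connect, openChannel and publish calls follow the deno-amqp README as I understand it, but treat the exact API shape as an assumption and verify it against the library's documentation:

// Sketch: round-robin publishing over a pool of 10 channels.
import { connect } from "https://deno.land/x/amqp/mod.ts";

const connection = await connect();  // defaults to localhost:5672
const channels = await Promise.all(
  Array.from({ length: 10 }, () => connection.openChannel()),
);

let next = 0;
function publishRoundRobin(routingKey: string, body: Uint8Array) {
  const channel = channels[next];
  next = (next + 1) % channels.length;   // rotate to the next channel
  return channel.publish(
    { routingKey },                      // basic.publish arguments
    { contentType: "application/json" }, // message properties
    body,
  );
}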

Open Questions

  • Does using writeAll mean that I'm guaranteed to be able to write all bytes in one go, regardless of underlying buffer size?
  • Could network congestion be the cause of this? I feel like roughly 300 kB/s should not be a problem.
  • Any ideas what's going on here?

Answer 1

Score: 0

This was a bug in deno-amqp...

Edit: this has been fixed
