Debugging hard-to-find RabbitMQ Frame Errors
Question
I'm running a RabbitMQ server and client applications in minikube for development. I'm receiving intermittent 501 frame errors. The error occurs fairly consistently under load (60 msg/sec, 2-5 kB/msg).
Error Message
From the RabbitMQ logs.
2023-02-26 16:43:12.635470+00:00 [error] <0.1056.0> operation none caused a connection exception frame_error: "type 3, first 16 octets = <<\"{\\\"payload\\\":{\\\"res\">>: {invalid_frame_end_marker,\n
99}"
2023-02-26 16:43:15.638860+00:00 [error] <0.1056.0> closing AMQP connection <0.1056.0> (10.244.0.18:60608 -> 10.244.0.21:5672):
2023-02-26 16:43:15.638860+00:00 [error] <0.1056.0> fatal_frame_error
Client
A Deno app using the deno-amqp library.
TCP Dump
Wireshark shows the TCP segment sent just before the server reports the error:
- It ends in 0x63 (99 in decimal; see the error message).
- It contains, in order: the end of a content frame, a publish method, a content header (size 4428), and the start of a content frame (declared size 4428, actual 4384, so 44 bytes are missing).
- However, that started content frame does not match the error message: it begins with
{"payload": { "id"...
("id", not "res").
The remaining content body is sent to the server just after it reports the invalid frame end marker. It contains exactly the 44 bytes missing from the previously started content frame (and its length header).
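For reference, here is a minimal sketch (my own, not from the post or deno-amqp) of how a broker reads a frame: type (1 byte), channel (2 bytes), payload size (4 bytes, big-endian), payload, then a frame-end marker 0xCE (206). A body frame whose end-marker position holds 0x63 ('c', 99 decimal) reproduces exactly the invalid_frame_end_marker,99 from the logs above.

```typescript
// Sketch only: parse an AMQP 0-9-1 frame header and check its end marker.
interface FrameInfo {
  type: number;      // 1 = method, 2 = header, 3 = body
  channel: number;
  size: number;      // declared payload size
  endMarker: number; // byte found at offset 7 + size
  valid: boolean;    // true when the end marker is 0xCE (206)
}

function inspectFrame(data: Uint8Array): FrameInfo {
  const view = new DataView(data.buffer, data.byteOffset, data.byteLength);
  const type = view.getUint8(0);
  const channel = view.getUint16(1); // big-endian by default
  const size = view.getUint32(3);
  const endMarker = view.getUint8(7 + size);
  return { type, channel, size, endMarker, valid: endMarker === 0xce };
}

// A body frame truncated so that 0x63 ('c') lands where 0xCE should be:
const bad = new Uint8Array([3, 0, 1, 0, 0, 0, 2, 0x7b, 0x22, 0x63]);
console.log(inspectFrame(bad)); // valid: false, endMarker: 99
```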
Validating Frames before sending
I added a check to catch my AMQP client incorrectly encoding frames before they are sent; no issues there.
// 206 (0xCE) is the AMQP frame-end marker; offset 7 skips the frame header
if (data[7 + payload.byteLength] !== 206) {
  console.log('sending invalid frame end');
  console.log({ frame, data });
}
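The check above validates each frame as it is encoded. A stream-level check over the exact byte buffer handed to the socket (a hypothetical helper, not part of the post's code) would additionally catch interleaving or truncation between frames, which a per-frame check cannot see:

```typescript
// Hypothetical helper: walk a buffer of concatenated AMQP frames and return
// the offset of the first frame whose end marker (0xCE) is missing or that
// is truncated; -1 means every complete frame looks well-formed.
function findFirstBadFrame(buf: Uint8Array): number {
  let offset = 0;
  while (offset + 8 <= buf.length) { // 7-byte header + at least the end marker
    const view = new DataView(buf.buffer, buf.byteOffset + offset);
    const size = view.getUint32(3); // payload size, big-endian
    const end = offset + 7 + size;  // index where 0xCE must sit
    if (end >= buf.length || buf[end] !== 0xce) return offset;
    offset = end + 1;               // next frame starts after the marker
  }
  return -1;
}
```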
No concurrent TCP connection writes
I have many async functions publishing messages. I made sure to always group all sequential frames of a publish (publish method, header, body frames) and write them fully to the connection with writeAll.
If I understand correctly, awaiting each write on a Deno.Conn means the next write only starts once the previous one has completed.
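To rule out interleaved writes definitively, one common pattern is to serialize all writes through a promise chain, so a frame group is always flushed completely before the next begins. The sketch below is an assumption on my part, not deno-amqp's actual internals:

```typescript
// Write mutex sketch: chain every write onto the previous one so that byte
// sequences from concurrent publishers can never interleave on the wire.
type Writer = (chunk: Uint8Array) => Promise<void>;

function serializeWrites(write: Writer): Writer {
  let tail: Promise<void> = Promise.resolve();
  return (chunk) => {
    const result = tail.then(() => write(chunk));
    tail = result.catch(() => {}); // keep the chain alive after a failed write
    return result;
  };
}

// Usage with a fake writer that records completion order: even though the
// first chunk takes longer to "send", submission order is preserved.
async function demoSerialized(): Promise<number[]> {
  const order: number[] = [];
  const write = serializeWrites(async (chunk) => {
    await new Promise((r) => setTimeout(r, chunk[0])); // simulate slow I/O
    order.push(chunk[0]);
  });
  await Promise.all([write(new Uint8Array([30])), write(new Uint8Array([10]))]);
  return order; // submission order: 30 first, then 10
}
```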
Reproducing the error
I haven't been able to reproduce the issue in stress tests using the same library: I successfully sent much larger messages, much faster, to a Docker RabbitMQ instance without issue.
Spreading load over channels
I've tried publishing messages over 10 channels with round-robin distribution. This helped: the application ran much longer, but the error eventually showed up as well.
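For completeness, the distribution scheme I used is a plain round-robin over the open channels; a minimal sketch (the strings below stand in for actual deno-amqp channel objects):

```typescript
// Round-robin helper: cycles through the given items on each call.
function roundRobin<T>(items: T[]): () => T {
  let i = 0;
  return () => items[i++ % items.length];
}

// Usage: pick a channel per publish.
const nextChannel = roundRobin(["ch-1", "ch-2", "ch-3"]);
// nextChannel() yields "ch-1", then "ch-2", "ch-3", "ch-1", ...
```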
Open Questions
- Does using writeAll guarantee that all bytes are written in one go, regardless of the underlying buffer size?
- Could network congestion be the cause? ca. 300 kB/s should not be a problem, I'd think.
- Any ideas what's going on here?
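On the first question: a writeAll-style helper does not write everything in a single call; it loops, because one write() may flush only part of the buffer, and returns only once every byte has been accepted. The sketch below models that general pattern (simplified; not copied from Deno's std library):

```typescript
// writeAll sketch: repeatedly call write() on the unwritten remainder until
// the whole buffer has been accepted by the underlying connection.
interface PartialWriter {
  write(p: Uint8Array): Promise<number>; // resolves to bytes actually written
}

async function writeAll(w: PartialWriter, data: Uint8Array): Promise<void> {
  let written = 0;
  while (written < data.length) {
    written += await w.write(data.subarray(written));
  }
}

// Usage with a fake writer that accepts at most 4 bytes per call: 10 bytes
// still arrive in full, just spread over several write() calls.
async function demoWriteAll(): Promise<number> {
  const out: number[] = [];
  const w: PartialWriter = {
    write: async (p) => {
      const n = Math.min(4, p.length); // simulate a short write
      out.push(...p.subarray(0, n));
      return n;
    },
  };
  await writeAll(w, new Uint8Array(10).fill(7));
  return out.length;
}
```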
Answer 1
Score: 0
This was a bug in deno-amqp ...

Edit: this has been fixed.