Rules regarding auto encoding of messages into base64 while submitting to SQS

Question

I am developing an application in which clients (written in multiple languages - Go, C++, Python, C#, Java, Perl and possibly more in the future) submit protobuf (and in some cases, JSON) messages to SQS. At the other end, the messages are read and decoded by Python and Go clients - depending on the message type. Boto seems to automatically encode the messages into base64, but other language libraries don't seem to do so. Or maybe there are some other rules?

Boto does have an option to submit raw messages.

What is the expected behavior here? Am I supposed to encode messages into base64 on my own - which makes boto an odd case - or am I missing something?

This has caused some subtle bugs in my application because of an extra layer of base64 encoding or decoding. As far as I know, there is no idiomatic way to detect whether a message is base64 encoded or not. The best option is to try to decode it and see whether that throws an exception, which is something I don't really like.

I tried to look for some documentation, but couldn't find anything with clear guidelines. Maybe I was looking in the wrong places?

Thanks in advance for any pointers.

Answer 1

Score: 6

You probably want to encode your messages in some text-safe format, because the SQS API does not accept every possible byte sequence as a message payload. Only valid UTF-8 text is supported, and within it only tab, newline, carriage return, and the Unicode ranges listed in the documentation:

>Important

>The following list shows the characters (in Unicode) allowed in your message, according to the W3C XML specification. For more information, go to http://www.w3.org/TR/REC-xml/#charsets. If you send any characters not included in the list, your request will be rejected.

>#x9 | #xA | #xD | [#x20 to #xD7FF] | [#xE000 to #xFFFD] | [#x10000 to #x10FFFF]

>http://docs.aws.amazon.com/AWSSimpleQueueService/latest/APIReference/API_SendMessage.html
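
To make the quoted rule concrete, here is a minimal sketch of a client-side check you could run before sending. The ranges are copied from the documentation quoted above; the helper name `is_valid_sqs_body` is made up for illustration and is not part of boto or any AWS SDK. It only checks characters, not the message size limits.

```python
# Allowed code point ranges, copied from the SendMessage documentation quoted above.
ALLOWED_RANGES = (
    (0x9, 0x9),
    (0xA, 0xA),
    (0xD, 0xD),
    (0x20, 0xD7FF),
    (0xE000, 0xFFFD),
    (0x10000, 0x10FFFF),
)


def is_valid_sqs_body(text):
    """Hypothetical helper: True if every character of text is in an allowed range."""
    return all(
        any(low <= ord(ch) <= high for low, high in ALLOWED_RANGES)
        for ch in text
    )


print(is_valid_sqs_body('{"id": 1}\n'))        # plain JSON text with a newline
print(is_valid_sqs_body("\x00\x01 raw bytes"))  # control characters outside the list
```

Raw protobuf output decoded as if it were text will typically fail a check like this, which is why some text-safe encoding is needed on top of it.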

The base64 alphabet clearly falls in this range, making it impossible for a base64-encoded message to be rejected as invalid. Of course, it also bloats your payload, since base64 expands every 3 bytes of the original message into 4 bytes of output (a 64-symbol alphabet limits each output byte to carrying 6 bits of usable information, 3 x 8 → 4 x 6).
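
A quick way to see both points, using only the Python standard library. The 300 random bytes are just a stand-in for a serialized protobuf message.

```python
import base64
import os
import string

raw = os.urandom(300)  # stand-in for a serialized protobuf message
encoded = base64.b64encode(raw).decode("ascii")

# 300 input bytes become 400 output characters (4 for every 3).
print(len(raw), len(encoded))  # 300 400

# Every output character is a letter, a digit, '+', '/' or the '=' padding,
# all of which sit inside the [#x20 to #xD7FF] range.
base64_alphabet = set(string.ascii_letters + string.digits + "+/=")
print(set(encoded) <= base64_alphabet)
```

When the input length is not a multiple of 3, the output is padded up to the next multiple of 4 characters, so the overhead is slightly more than one third for short messages.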

Presumably boto automatically base64-encodes and decodes messages for you in order to be "helpful."

But there is no reason why base64 has to be used at all.

An example that comes to mind... valid JSON would also comply with the restricted character ranges supported by SQS payloads. (Theoretically, I guess, JSON could be argued not to be an "encoding," but that would be a bit pedantic).
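
As a small illustration with Python's `json` module: with its default settings, `json.dumps` escapes control characters and non-ASCII characters, so the serialized text ends up as printable ASCII. The message contents here are invented for the example.

```python
import json

# Invented sample message containing a non-ASCII letter and two control characters.
message = {"name": "caf\u00e9", "note": "bell\x07 and null\x00"}

body = json.dumps(message)  # ensure_ascii=True is the default
print(body)

# Every character of the serialized text is printable ASCII.
print(all(0x20 <= ord(ch) <= 0x7E for ch in body))

# The escapes are undone on the receiving side.
print(json.loads(body) == message)
```

Note that this only covers data that JSON can represent. `json.dumps` refuses `bytes` values, so a binary protobuf payload still needs base64 (or something similar) before it can be sent.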

There is no clean way to determine whether a message needs to be decoded more than once, other than the sketchy one you proposed. But the argument could be made that if you are in a situation where the need to decode is ambiguous, then the ambiguity itself is what should be eliminated.
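
To show why the try-and-see approach is sketchy, and one possible way to remove the ambiguity, here is a rough sketch. The envelope layout and its field names (`type`, `encoding`, `data`) and the helpers `wrap` and `unwrap` are assumptions made up for this example, not an SQS or boto convention. The idea is only that every producer, in every language, states the encoding explicitly, so consumers never have to guess.

```python
import base64
import binascii
import json


def looks_like_base64(text):
    """The 'try to decode and see' heuristic from the question."""
    try:
        base64.b64decode(text, validate=True)
        return True
    except (binascii.Error, ValueError):
        return False


# An ordinary four-letter word is also valid base64, so the heuristic misfires.
print(looks_like_base64("data"))


def wrap(payload, content_type):
    """Hypothetical envelope: always base64, and always says so."""
    return json.dumps({
        "type": content_type,          # e.g. "protobuf" or "json"
        "encoding": "base64",          # explicit, never inferred
        "data": base64.b64encode(payload).decode("ascii"),
    })


def unwrap(body):
    """Decode exactly once, based on the declared encoding."""
    envelope = json.loads(body)
    if envelope.get("encoding") != "base64":
        raise ValueError("unexpected encoding: %r" % envelope.get("encoding"))
    return envelope["type"], base64.b64decode(envelope["data"], validate=True)


body = wrap(b"\x08\x96\x01", "protobuf")
print(unwrap(body))
```

With an envelope like this the boto client would be switched to submit raw messages, so that it does not add a second base64 layer on top.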

If boto's behavior weren't documented and there were no way to make it behave otherwise, I'd say it is wrong behavior. But, as it is, I'll have to relent a bit and say it's just unusual.

huangapple
  • Posted on October 8, 2015, 23:00:06
  • When reposting, please keep the link to this article: https://go.coder-hub.com/33019426.html