2023-07-18 04:36:15 · go

Revisit of Flatbuffers vs. Protocol Buffers

Question

Since this article (written by Kenton Varda in 2014), has anything changed about the use cases of using FlatBuffers vs. Protobuf? Alternatively, has something else come along that is now the preferred format/library for data exchange?

| Feature | Protobuf | Cap'n Proto | SBE | FlatBuffers |
|---|---|---|---|---|
| Schema evolution | yes | yes | caveats | yes |
| Zero-copy | no | yes | yes | yes |
| Random-access reads | no | yes | no | yes |
| Safe against malicious input | yes | yes | yes | opt-in upfront |
| Reflection / generic algorithms | yes | yes | yes | yes |
| Initialization order | any | any | preorder | bottom-up |
| Unknown field retention | removed in proto3 | yes | no | no |
| Object-capability RPC system | no | yes | no | no |
| Schema language | custom | custom | XML | custom |
| Usable as mutable state | yes | no | no | no |
| Padding takes space on wire? | no | optional | yes | yes |
| Unset fields take space on wire? | no | yes | yes | no |
| Pointers take space on wire? | no | yes | no | yes |
| C++ | yes | yes (C++11)* | yes | yes |
| Java | yes | yes* | yes | yes |
| C# | yes | yes* | yes | yes* |
| Go | yes | yes | no | yes* |
| Other languages | lots! | 6+ others* | no | no |
| Authors' preferred use case | distributed computing | platforms / sandboxing | financial trading | games |

As best I can tell, there seem to have been few or no changes since the 2014 article, which can be summarized as follows:

Protobuf
- Preferred for smaller messages (1 MB or less)
- More programmer-friendly
FlatBuffers
- Used for larger messages
- Optimized for efficient parsing
- Better in-memory representation

Answer 1

Score: 2

There have been a lot of improvements to FlatBuffers over the past 9 years, but going by that table alone, these entries would change:

RPC system: FlatBuffers has out-of-the-box gRPC support (for multiple languages).
Usable as mutable state: FlatBuffers now has an "Object API", similar to Protobuf's, on top of the base API.
Other languages: lots!

FlatBuffers is perfectly fine for smaller messages, as long as you don't expect the same level of "compression" that Protobuf's varints give.
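To make the "compression" point concrete, here is a minimal sketch of base-128 varint encoding, the scheme Protobuf uses for integer fields, next to a fixed-width 32-bit little-endian integer, which is how a 32-bit scalar is laid out in a FlatBuffers buffer. The helper name `encode_varint` is my own for illustration; it is not an API of either library, and the sketch ignores field tags, vtables, and other per-message overhead.

```python
import struct


def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint (7 payload bits per byte)."""
    if value < 0:
        raise ValueError("this sketch only handles non-negative integers")
    out = bytearray()
    while True:
        low_bits = value & 0x7F
        value >>= 7
        if value:
            out.append(low_bits | 0x80)  # continuation bit set: more bytes follow
        else:
            out.append(low_bits)  # last byte: continuation bit clear
            return bytes(out)


for n in (1, 300, 1_000_000):
    fixed = struct.pack("<i", n)  # fixed-width 32-bit little-endian
    print(n, len(encode_varint(n)), len(fixed))
```

Small values shrink to one or two bytes as varints, while the fixed-width form always takes four.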

The Object API is more programmer-friendly, though it is also slower, much like Protobuf.

Answer 2

Score: 1

As ever, with such questions the answer is "it depends"!

Personally speaking, I'd argue that that feature list is incomplete. I'd add:

"Constraints" - the ability to define valid value ranges for fields, and valid lengths for lists
"Multiple wire formats" - the ability to encode data in different ways, e.g. highly packed bit-optimised encodings for transmission over radio links (which tend to be bandwidth constrained) as well as more programmer-friendly formats like XML
"Values Definition" - the ability not only to define data types, but to also define fixed instances of data types that show up in generated source code
"Constraints Definitions in Terms of Values" - the ability to use defined values in the specification of constraints. This should include syntax to indicate "1 less than X" where X is a defined value
"Strict Contract" - the wire format is rigid in its representation of the schema

Constraints

Serialisation schemas such as ASN.1, XSD, and JSON Schema all allow constraints on value and length. This is very useful, because it's fairly rare that unbounded values and lists are in fact valid within the application. Having constraints expressed in the schema and honoured in the generated code is comparatively rare - some paid-for XSD / XML tools do it, most ASN.1 tools do, and most JSON validators do.

The advantage is that one can have a more exact contract.

I'm not quite sure what is meant by "safe against malicious input" in that feature set. But so far as I know, in GPB (Google Protocol Buffers) the only way of conveying what is a valid value for a message field is to 1) put it in as a comment and hope the developer spots it, or 2) use the (beta) third-party extensions for GPB that mimic ASN.1's constraints.

Multiple Wire Formats

It's sometimes useful to be able to serialise data to, say, a packed binary format for wireless transmission, but also be able to serialise it to a readable format like XML / JSON. GPB does this kind of thing (JSON), but not to the extent ASN.1 does (numerous binary encodings, plus JSON and XML wire formats too).
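As a rough illustration of the idea (not of how ASN.1 or GPB tooling actually does it), the sketch below serialises one record two ways using only the Python standard library: a readable JSON form and a packed binary form. The record, its field names, and the `<Hf` layout are assumptions made up for this example.

```python
import json
import struct

# Hypothetical record, invented for this example.
record = {"id": 1005, "temperature": 21.5}

# Readable form: self-describing, easy to inspect and debug.
as_json = json.dumps(record).encode("utf-8")

# Packed form: unsigned 16-bit id followed by a 32-bit float, little-endian.
# The layout is only meaningful to a reader that knows the same "schema".
as_binary = struct.pack("<Hf", record["id"], record["temperature"])

decoded_id, decoded_temperature = struct.unpack("<Hf", as_binary)
print(len(as_json), len(as_binary))
```

The same logical data takes 6 bytes in the packed form and several times that as JSON, which is the trade-off between link efficiency and readability described above.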

ASN.1 uses the constraints to understand, for example, exactly how many bits an INTEGER field actually needs. If it's constrained to valid values between 1000 and 1015 inclusive, it'll use only 4 bits in its unaligned packed encoding rules.
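The arithmetic behind that 4-bit figure can be sketched as follows. This is a simplified illustration of how a range-constrained integer is encoded as an offset from its lower bound in the minimum number of bits; it is not a conforming PER encoder, and the function names are my own.

```python
def bits_needed(lower: int, upper: int) -> int:
    """Minimum number of bits needed to distinguish every value in lower..upper."""
    if upper < lower:
        raise ValueError("upper bound must not be below lower bound")
    # The largest offset to encode is (upper - lower).
    return (upper - lower).bit_length()


def encode_constrained(value: int, lower: int, upper: int) -> str:
    """Return the offset (value - lower) as a bit string of minimal fixed width."""
    if not lower <= value <= upper:
        raise ValueError("value violates the constraint")
    width = bits_needed(lower, upper)
    if width == 0:
        return ""  # only one legal value, so nothing needs to be sent
    return format(value - lower, f"0{width}b")


print(bits_needed(1000, 1015))
print(encode_constrained(1005, 1000, 1015))
```

The range 1000 to 1015 has 16 legal values, so the largest offset is 15 and 4 bits suffice.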

The advantage is that you can get good efficiency on links where that matters, but also more programmer-friendly storage where size is less relevant.

Values Definition

If one is using a schema to define an interface between systems, it's quite likely there are system constants that they need to share. Putting them in a schema is quite useful.

Of them all, AFAIK only ASN.1 has this in its schema language.

Constraints Definitions in Terms of Values

This is just an extension of constraints. Rather than express constraints with literals, express them with values. For instance, in ASN.1 you can have:

listLen INTEGER ::= 10

List ::= SET
{
   list [0] SEQUENCE (SIZE(listLen)) OF REAL,
   defaultEntry [1] INTEGER (0..<listLen)
}

The first line defines a constant int of value 10. The next chunk defines a class that contains a list of floating point values listLen entries long, and an index into that list that is constrained to the values 0 to 9. The application logic would use listLen to iterate over the list, and the defaultEntry is guaranteed to be a valid value. If you ever need the list to be 11 long, just change line 1 and rebuild.

For communications systems this can be gold-dust; the entirety of the protocol message set and the protocol constants can be defined this way, and any tuning of it takes place solely within the ASN.1 schema; no application source code need be changed (you just tweak and rebuild).
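To give a feel for what this buys the application, here is a hand-written Python sketch of the kind of code a schema compiler could generate from the ASN.1 above. It is not the output of any real ASN.1 tool; `LIST_LEN`, `ListMessage`, and the field names are hypothetical stand-ins for `listLen`, `List`, `list`, and `defaultEntry`.

```python
from dataclasses import dataclass
from typing import List

# Mirrors "listLen INTEGER ::= 10"; in a real toolchain this constant would be
# regenerated from the schema rather than edited by hand.
LIST_LEN = 10


@dataclass
class ListMessage:
    values: List[float]  # SEQUENCE (SIZE(listLen)) OF REAL
    default_entry: int   # INTEGER (0..<listLen)

    def __post_init__(self) -> None:
        # Enforce the schema's constraints at construction time.
        if len(self.values) != LIST_LEN:
            raise ValueError(f"values must have exactly {LIST_LEN} entries")
        if not 0 <= self.default_entry < LIST_LEN:
            raise ValueError(f"default_entry must be in 0..{LIST_LEN - 1}")


message = ListMessage(values=[0.0] * LIST_LEN, default_entry=9)
# Safe by construction: default_entry is always a valid index into values.
default_value = message.values[message.default_entry]
```

Because both checks are expressed in terms of `LIST_LEN`, changing the one constant changes the list length and the index range together.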

Strict Contract

This is where valid wire format strongly adheres to the schema. GPB is quite bad at this - for example it'll quite happily and silently parse multiple oneof fields in a message, keeping only the last one, which I find surprising; I'd expect the detection of multiple fields to throw some sort of error!
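The difference between the two behaviours can be sketched with a toy parser. This is not the Protobuf library; it is a hand-rolled reader for messages made only of varint fields, and the assumption that fields 1 and 2 belong to one oneof is invented for the example. With `strict=False` it mimics the last-one-wins behaviour described above; with `strict=True` it raises the kind of error I'd have expected.

```python
from typing import Optional, Tuple

# Hypothetical schema: fields 1 and 2 are members of the same oneof.
ONEOF_FIELDS = {1, 2}


def read_varint(buf: bytes, pos: int) -> Tuple[int, int]:
    """Decode one base-128 varint starting at pos; return (value, next_pos)."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def parse_oneof(buf: bytes, strict: bool = False) -> Optional[Tuple[int, int]]:
    """Return (field_number, value) of the oneof member that is set, or None."""
    chosen = None
    pos = 0
    while pos < len(buf):
        tag, pos = read_varint(buf, pos)
        field_number, wire_type = tag >> 3, tag & 0x07
        if wire_type != 0:
            raise ValueError("this sketch only handles varint fields")
        value, pos = read_varint(buf, pos)
        if field_number in ONEOF_FIELDS:
            if strict and chosen is not None:
                raise ValueError("more than one oneof member on the wire")
            chosen = (field_number, value)  # lenient mode: last one wins
    return chosen


# Field 1 = 5 followed by field 2 = 7: two members of the same oneof.
wire = bytes([0x08, 0x05, 0x10, 0x07])
lenient_result = parse_oneof(wire)
```

In lenient mode the first member is silently discarded; a strict contract would reject the message instead.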

Overall

Essentially, it boils down to "I quite like ASN.1", born out of experience building communications systems. It's not surprising that the telephony industry is built on ASN.1.

It's quite interesting seeing teams that don't consider aspects such as the above launch into project development and just pick the first thing they've heard of, and then get into a dreadful mess when it turns out that no one has really read the ICDs, code bases become too difficult to change, etc.

ASN.1 might have old origins, but it's been constantly updated over the decades and solves an awful lot of problems when it comes to getting systems communicating easily and reliably. Good tools for it cost money, but I'm quite happy to spend a bit of money to save a ton of time and risk.
