Revisit of Flatbuffers vs. Protocol Buffers

Question

Since this article (written by Kenton Varda in 2014), has anything changed about the use cases of using FlatBuffers vs. Protobuf? Alternatively, has something else come along that is now the preferred format/library for data exchange?

| Feature | Protobuf | Cap'n Proto | SBE | FlatBuffers |
|---|---|---|---|---|
| Schema evolution | yes | yes | caveats | yes |
| Zero-copy | no | yes | yes | yes |
| Random-access reads | no | yes | no | yes |
| Safe against malicious input | yes | yes | yes | opt-in upfront |
| Reflection / generic algorithms | yes | yes | yes | yes |
| Initialization order | any | any | preorder | bottom-up |
| Unknown field retention | removed in proto3 | yes | no | no |
| Object-capability RPC system | no | yes | no | no |
| Schema language | custom | custom | XML | custom |
| Usable as mutable state | yes | no | no | no |
| Padding takes space on wire? | no | optional | yes | yes |
| Unset fields take space on wire? | no | yes | yes | no |
| Pointers take space on wire? | no | yes | no | yes |
| C++ | yes | yes (C++11)* | yes | yes |
| Java | yes | yes* | yes | yes |
| C# | yes | yes* | yes | yes* |
| Go | yes | yes | no | yes* |
| Other languages | lots! | 6+ others* | no | no |
| Authors' preferred use case | distributed computing | platforms / sandboxing | financial trading | games |

As best I can tell, little or nothing seems to have changed since the 2014 article, which can be summarized as follows:

  • Protobuf

    • Preferred for smaller messages (1MB or less)
    • More programmer friendly
  • FlatBuffers

    • Used for larger messages
    • Optimized for efficient parsing
    • Better in-memory representation

Answer 1

Score: 2

There have been a lot of improvements to FlatBuffers over the past 9 years, but going by that table alone, these entries would change:

  • RPC system: FlatBuffers has out-of-the-box gRPC support (for multiple languages).
  • Usable as mutable state: FlatBuffers now has an "Object API", similar to Protobuf's, on top of the base API.
  • Other languages: lots!

FlatBuffers is perfectly fine for smaller messages, as long as you don't expect the same level of "compression" that Protobuf's varints give.
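To make the varint point concrete, here is a minimal sketch comparing a base-128 varint (the scheme Protobuf uses for integer fields) with a fixed-width 32-bit little-endian encoding of the kind FlatBuffers uses for scalars. The function names are made up for illustration, and this is not either library's actual code.

```python
import struct


def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint (7 payload bits per byte)."""
    if value < 0:
        raise ValueError("this sketch only handles non-negative integers")
    out = bytearray()
    while True:
        low7 = value & 0x7F
        value >>= 7
        if value:
            out.append(low7 | 0x80)  # continuation bit set: more bytes follow
        else:
            out.append(low7)  # final byte: continuation bit clear
            return bytes(out)


def encode_fixed32(value: int) -> bytes:
    """Encode as a fixed-width, little-endian, unsigned 32-bit integer."""
    return struct.pack("<I", value)


# Print each value with its varint size and its fixed-width size in bytes.
for n in (1, 300, 70_000):
    print(n, len(encode_varint(n)), len(encode_fixed32(n)))
```

Small values take one or two bytes as varints but always four bytes at fixed width. That is the size difference to expect for messages dominated by small integers.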

The Object API is more programmer-friendly, though it is also slower, much like Protobuf.

Answer 2

Score: 1

As ever, with such questions the answer is "it depends"!

Personally, I'd argue that the feature list is incomplete. I'd add:

  • "Constraints" - the ability to define valid value ranges for fields, and valid lengths for lists
  • "Multiple wire formats" - the ability to encode data in different ways, e.g. highly packed bit-optimised encodings for transmission over radio links (which tend to be bandwidth constrained) as well as more programmer-friendly formats like XML
  • "Values Definition" - the ability not only to define data types, but to also define fixed instances of data types that show up in generated source code
  • "Constraints Definitions in Terms of Values" - the ability to use defined values in the specification of constraints. This should include syntax to indicate "1 less than X" where X is a defined value
  • "Strict Contract" - the wire format is rigid in its representation of the schema

Constraints

Serialisation schemas such as ASN.1, XSD, and JSON Schema all allow constraints on value and length. This is very useful, because it's fairly rare that unbounded values and lists are in fact valid within the application. Having constraints expressed in the schema and honoured in the generated code is comparatively rare: some paid-for XSD / XML tools do it, most ASN.1 tools do, and most JSON validators do.

The advantage is that one can have a more exact contract.

I'm not quite sure what is meant by "safe against malicious input" in that feature set. But so far as I know, in GPB (Google Protocol Buffers) the only way of conveying what is a valid value for a message field is to 1) put it in as a comment and hope the developer spots it, or 2) use the (beta) third-party extensions for GPB that mimic ASN.1's constraints.

Multiple Wire Formats

It's sometimes useful to be able to serialise data to, say, a packed binary format for wireless transmission, but also be able to serialise it to a readable format like XML / JSON. GPB does this kind of thing (JSON), but not to the extent ASN.1 does (numerous binary encodings, plus JSON and XML wire formats too).

ASN.1 uses the constraints to understand, for example, exactly how many bits an INTEGER field actually needs. If it's constrained to valid values between 1000 and 1015 inclusive, it'll use only 4 bits in its unaligned packed encoding rules.
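The arithmetic behind that 4-bit figure can be sketched as follows. This models only the simple case of a small constrained integer range, encoded as an offset from the lower bound. Real unaligned PER has further rules (large ranges, extensible constraints) that are not covered here, and the function names are made up for illustration.

```python
def constrained_bits(lower: int, upper: int) -> int:
    """Bits needed for an integer constrained to lower..upper inclusive,
    when it is encoded as an offset from the lower bound."""
    if upper < lower:
        raise ValueError("empty range")
    # The largest offset is (upper - lower); its bit length is the field width.
    return (upper - lower).bit_length()


def encode_offset(value: int, lower: int, upper: int) -> str:
    """Return the field as a string of '0'/'1' characters, for readability."""
    if not lower <= value <= upper:
        raise ValueError(f"{value} violates the constraint {lower}..{upper}")
    width = constrained_bits(lower, upper)
    return format(value - lower, f"0{width}b") if width else ""


print(constrained_bits(1000, 1015))     # 16 possible values fit in 4 bits
print(encode_offset(1007, 1000, 1015))  # offset 7 as a 4-bit field
```

A range containing a single value needs zero bits, since the decoder already knows the value from the schema.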

The advantage is that you can get good efficiency on links where that matters, but also more programmer-friendly storage where size is less relevant.

Values Definition

If one is using a schema to define an interface between systems, it's quite likely there are system constants that they need to share. Putting them in a schema is quite useful.

Of all the formats discussed here, as far as I know only ASN.1 has this in its schema language.

Constraints Definitions in Terms of Values

This is just an extension of constraints. Rather than express constraints with literals, express them with values. For instance, in ASN.1 you can have:

listLen INTEGER ::= 10

List ::= SET
{
   list [0] SEQUENCE (SIZE(listLen)) OF REAL,
   defaultEntry [1] INTEGER (0..<listLen)
}

The first line defines a constant int of value 10. The next chunk defines a class that contains a list of floating point values listLen entries long, and an index into that list that is constrained to the values 0 to 9. The application logic would use listLen to iterate over the list, and the defaultEntry is guaranteed to be a valid value. If you ever need the list to be 11 long, just change line 1 and rebuild.
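As a rough illustration of what this buys the application, here is a hand-written sketch of the kind of code an ASN.1 tool might generate from that schema. The names `LIST_LEN` and `ListMessage` are hypothetical, and real generated code will differ from tool to tool.

```python
from dataclasses import dataclass
from typing import List

# Mirrors `listLen INTEGER ::= 10`; in practice this would be regenerated
# from the schema rather than edited by hand.
LIST_LEN = 10


@dataclass
class ListMessage:
    """Hypothetical stand-in for the generated `List` type."""

    values: List[float]  # SEQUENCE (SIZE(listLen)) OF REAL
    default_entry: int   # INTEGER (0..<listLen)

    def __post_init__(self) -> None:
        # Enforce the schema constraints at construction time.
        if len(self.values) != LIST_LEN:
            raise ValueError(f"list must have exactly {LIST_LEN} entries")
        if not 0 <= self.default_entry < LIST_LEN:
            raise ValueError(f"defaultEntry must be in 0..{LIST_LEN - 1}")


msg = ListMessage(values=[0.0] * LIST_LEN, default_entry=9)
# The index needs no further checking: the constraint already guarantees it.
print(msg.values[msg.default_entry])
```

Changing the schema constant to 11 and regenerating would update both checks together, with no edits to the application logic that uses `LIST_LEN`.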

For communications systems this can be gold-dust; the entirety of the protocol message set and the protocol constants can be defined this way, and any tuning of it takes place solely within the ASN.1 schema; no application source code need be changed (you just tweak and rebuild).

Strict Contract

This is where a valid wire format adheres strictly to the schema. GPB is quite bad at this. For example, it'll quite happily and silently parse multiple oneof fields in a message, keeping only the last one, which I find surprising; I'd expect the detection of multiple fields to throw some sort of error!
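The difference between the two behaviours can be sketched with a toy decoder. It reads only varint-typed fields, using the tag layout `(field_number << 3) | wire_type`, and is not the Protobuf library itself. All function names here are made up for illustration.

```python
from typing import Dict, Iterator, Set, Tuple


def read_varint(buf: bytes, pos: int) -> Tuple[int, int]:
    """Decode one base-128 varint starting at pos; return (value, next_pos)."""
    result = 0
    shift = 0
    while True:
        if pos >= len(buf):
            raise ValueError("truncated varint")
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def iter_varint_fields(buf: bytes) -> Iterator[Tuple[int, int]]:
    """Yield (field_number, value) pairs; only wire type 0 (varint) is supported."""
    pos = 0
    while pos < len(buf):
        tag, pos = read_varint(buf, pos)
        field_number, wire_type = tag >> 3, tag & 0x7
        if wire_type != 0:
            raise ValueError("this sketch only understands varint fields")
        value, pos = read_varint(buf, pos)
        yield field_number, value


def parse_last_wins(buf: bytes, oneof_fields: Set[int]) -> Dict[int, int]:
    """Lenient: a later oneof member silently replaces an earlier one."""
    message: Dict[int, int] = {}
    for number, value in iter_varint_fields(buf):
        if number in oneof_fields:
            for other in oneof_fields:
                message.pop(other, None)  # clear whichever member was set
        message[number] = value
    return message


def parse_strict(buf: bytes, oneof_fields: Set[int]) -> Dict[int, int]:
    """Strict: a second occurrence of any oneof member is an error."""
    message: Dict[int, int] = {}
    seen_oneof = False
    for number, value in iter_varint_fields(buf):
        if number in oneof_fields:
            if seen_oneof:
                raise ValueError("more than one oneof member on the wire")
            seen_oneof = True
        message[number] = value
    return message


# Field 1 = 5 followed by field 2 = 7; fields 1 and 2 share one oneof.
wire = bytes([0x08, 0x05, 0x10, 0x07])
print(parse_last_wins(wire, {1, 2}))  # keeps only field 2
```

Calling `parse_strict(wire, {1, 2})` on the same bytes raises `ValueError`, which is the behaviour I'd have expected by default.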

Overall

Essentially, it boils down to "I quite like ASN.1", born out of experience building communications systems. It's not surprising that the telephony industry is built on ASN.1.

It's quite interesting seeing teams that don't consider aspects such as the above launch off into project development and just pick the first thing they've heard of, and then get into a dreadful mess when it turns out that no one has really read the ICDs, code bases become too difficult to change, etc.

ASN.1 might have old origins, but it's been constantly updated over the decades and solves an awful lot of problems when it comes to getting systems communicating easily and reliably. Good tools for it cost money, but I'm quite happy to spend a bit of money to save a ton of time and risk.

huangapple
  • Posted on July 18, 2023 at 04:36:15
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76707912.html