protobuf unmarshal unknown message

Question

I have a listener which receives protobuf messages. However, it doesn't know in advance which type of message will come in. So I tried to unmarshal into an interface{} so that I can type-assert it later:

var data interface{}
err := proto.Unmarshal(message, data)
if err != nil {
  log.Fatal("unmarshaling error: ", err)
}
log.Printf("%v\n", data)

However this code doesn't compile:

cannot use data (type interface {}) as type proto.Message in argument to proto.Unmarshal:
  interface {} does not implement proto.Message (missing ProtoMessage method)

How can I unmarshal an "unknown" protobuf message in Go and type-assert it later?

Answer 1

Score: 7

First, a few words about the OP's question as presented:

proto.Unmarshal can't unmarshal into an interface{}. The function signature makes this clear: you must pass a proto.Message argument, which is an interface implemented by concrete protobuf types.

When handling a raw protobuf []byte payload that didn't come wrapped in an Any, ideally you have at least something (a string, a number, etc.) arriving together with the byte slice that you can use to map it to the concrete protobuf message.

You can then switch on that, instantiate the appropriate concrete protobuf type, and only then pass it to Unmarshal:

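var message proto.Message
switch atLeastSomething {
case "foo":
	message = &mypb.Foo{}
case "bar":
	message = &mypb.Bar{}
default:
	// Never pass a nil message to Unmarshal.
	return fmt.Errorf("unknown message kind %q", atLeastSomething)
}
if err := proto.Unmarshal(data, message); err != nil {
	return err
}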

<hr>

Now, what if the byte payload is truly unknown?

As a foreword, consider that this should seldom happen in practice. The schema used to generate the protobuf types in your language of choice represents a contract, and by accepting protobuf payloads you are, in some sense, fulfilling that contract.

Anyway, if for some reason you must deal with a completely unknown, mysterious protobuf payload in wire format, you can extract some information from it with the protowire package (google.golang.org/protobuf/encoding/protowire).

Be aware that the wire representation of a protobuf message is ambiguous. A big source of uncertainty is the "length-delimited" wire type (2), which is used for strings, bytes, packed repeated fields and... sub-messages (reference).

You can retrieve the payload content, but the semantics you recover are bound to be weak.
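To make the ambiguity concrete, here is a stdlib-only sketch of how a field key and a length-delimited value are read; readTag and readLengthDelimited are illustrative helpers, not protowire functions. A key is a varint equal to (field_number << 3) | wire_type. The Foo message from the sample further down encodes as 0a 01 41 12 01 42 1a 03 0a 01 43: its last key is 0x1a (field 3, wire type 2), and the payload 0a 01 43 is at the same time a valid 3-byte string and a valid message containing field 1 = "C".

```go
package main

import (
	"encoding/binary"
	"errors"
)

// readTag decodes a field key, which is a varint equal to
// (fieldNumber << 3) | wireType. It returns the field number, the wire type
// (0 varint, 1 fixed64, 2 length-delimited, 5 fixed32, ...) and the number
// of bytes read.
func readTag(b []byte) (uint64, uint8, int, error) {
	key, n := binary.Uvarint(b)
	if n <= 0 {
		return 0, 0, 0, errors.New("truncated or overlong field key")
	}
	return key >> 3, uint8(key & 7), n, nil
}

// readLengthDelimited decodes a wire-type-2 value: a varint length followed
// by that many bytes. It returns the payload and the total bytes consumed.
// Nothing in the payload says whether it is a string, bytes, a packed
// repeated field or an embedded message.
func readLengthDelimited(b []byte) ([]byte, int, error) {
	length, n := binary.Uvarint(b)
	if n <= 0 || length > uint64(len(b)-n) {
		return nil, 0, errors.New("truncated length-delimited value")
	}
	end := n + int(length)
	return b[n:end], end, nil
}
```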

The code

With that said, this is what a parser for unknown proto messages may look like. The idea is to leverage protowire.ConsumeField to read through the original byte slice.

The data model could look like this:

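import "google.golang.org/protobuf/encoding/protowire"

// Field is one decoded field: its key (number + wire type) and its value.
type Field struct {
	Tag Tag
	Val Val
}

// Tag holds the field number and wire type decoded from the field key.
type Tag struct {
	Num  int32
	Type protowire.Type
}

// Val holds the payload: uint64 (varint/fixed64), uint32 (fixed32), []byte,
// or []Field when a length-delimited value parsed as a sub-message.
type Val struct {
	Payload interface{}
	Length  int
}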

And the parser:

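// parseUnknown walks a wire-format message field by field. It returns nil
// if b is not a well-formed sequence of protobuf fields.
func parseUnknown(b []byte) []Field {
	fields := make([]Field, 0)
	for len(b) > 0 {
		// ConsumeField reports the field number, the wire type and the total
		// length of the field (key + value), or a negative length on error.
		n, t, fieldlen := protowire.ConsumeField(b)
		if fieldlen < 1 {
			return nil
		}
		field := Field{
			Tag: Tag{Num: int32(n), Type: t},
		}

		// Re-read the key only to learn how many bytes it occupies.
		_, _, taglen := protowire.ConsumeTag(b[:fieldlen])
		if taglen < 1 {
			return nil
		}

		var (
			v      interface{}
			vlen   int // bytes consumed by the value, key excluded
			length int // reported as Val.Length
		)
		val := b[taglen:fieldlen]
		switch t {
		case protowire.VarintType:
			v, vlen = protowire.ConsumeVarint(val)
			length = vlen
		case protowire.Fixed64Type:
			v, vlen = protowire.ConsumeFixed64(val)
			length = vlen
		case protowire.Fixed32Type:
			v, vlen = protowire.ConsumeFixed32(val)
			length = vlen
		case protowire.BytesType:
			var raw []byte
			raw, vlen = protowire.ConsumeBytes(val)
			length = len(raw) // payload length, without the length prefix
			v = raw
			// Try to parse the payload as a sub-message; keep the raw bytes
			// (string, bytes, packed field...) if that fails or it is empty.
			if len(raw) > 0 {
				if sub := parseUnknown(raw); sub != nil {
					v = sub
				}
			}
		case protowire.StartGroupType:
			var raw []byte
			raw, vlen = protowire.ConsumeGroup(n, val)
			length = len(raw)
			v = raw
			if sub := parseUnknown(raw); sub != nil {
				v = sub
			}
		}

		if vlen < 1 {
			return nil
		}

		field.Val = Val{Payload: v, Length: length}
		// fmt.Printf("%#v\n", field)

		fields = append(fields, field)
		b = b[fieldlen:]
	}
	return fields
}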

Sample input and output

Given a proto schema like:

syntax = "proto3";

message Foo {
  string a = 1;
  string b = 2;
  Bar bar = 3;
}

message Bar {
  string c = 1;
}

initialized in Go as:

&test.Foo{A: "A", B: "B", Bar: &test.Bar{C: "C"}}

By adding a fmt.Printf("%#v\n", field) statement at the end of the loop in the above code, the parser outputs the following:

main.Field{Tag:main.Tag{Num:1, Type:2}, Val:main.Val{Payload:[]uint8{0x41}, Length:1}}
main.Field{Tag:main.Tag{Num:2, Type:2}, Val:main.Val{Payload:[]uint8{0x42}, Length:1}}
main.Field{Tag:main.Tag{Num:1, Type:2}, Val:main.Val{Payload:[]uint8{0x43}, Length:1}}
main.Field{Tag:main.Tag{Num:3, Type:2}, Val:main.Val{Payload:[]main.Field{main.Field{Tag:main.Tag{Num:1, Type:2}, Val:main.Val{Payload:[]uint8{0x43}, Length:1}}}, Length:3}}

About sub-messages

As you can see from the above, the way to deal with a protowire.BytesType value that may or may not be a message field is to attempt to parse it, recursively. If that succeeds, we keep the resulting fields and store them in the field value; if it fails, we store the bytes as-is, since they are then presumably a proto string or bytes. Keep in mind that a successful parse is only a heuristic: a short string can happen to be valid wire format too. BTW, if I'm reading correctly, this seems to be what Marc Gravell does in the Protogen code.

About repeated fields

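The code above doesn't deal with repeated fields explicitly, but after the parsing is done, unpacked repeated fields (strings, bytes, messages) will appear as several fields with the same Field.Tag.Num value, so collecting them into a slice should be trivial. Note, however, that in proto3 repeated scalar numeric fields are packed by default: they arrive as a single length-delimited field whose payload is the concatenation of the encoded values, so the parser sees only one field and may even misread it as a sub-message.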
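A minimal, stdlib-only sketch of both situations: groupByNumber collects the values of unpacked repeated fields, and decodePackedVarints reads a length-delimited payload as a packed sequence of varints (the proto3 default encoding for repeated scalar numeric fields). The Field, Tag and Val types mirror the data model above, except that a local WireType stands in for protowire.Type so the snippet has no dependencies; all names are illustrative.

```go
package main

import (
	"encoding/binary"
	"errors"
)

// WireType is a stand-in for protowire.Type, to keep the sketch stdlib-only.
type WireType int8

type Tag struct {
	Num  int32
	Type WireType
}

type Val struct {
	Payload interface{}
	Length  int
}

type Field struct {
	Tag Tag
	Val Val
}

// groupByNumber collects the values of fields sharing a field number, in wire
// order, which is how unpacked repeated fields appear after parsing.
func groupByNumber(fields []Field) map[int32][]Val {
	out := make(map[int32][]Val)
	for _, f := range fields {
		out[f.Tag.Num] = append(out[f.Tag.Num], f.Val)
	}
	return out
}

// decodePackedVarints interprets a length-delimited payload as a packed
// repeated varint field. It returns raw uint64 values; reading them as
// int32, bool, enum or zigzag-encoded sint is left to the caller.
func decodePackedVarints(payload []byte) ([]uint64, error) {
	var vals []uint64
	for len(payload) > 0 {
		v, n := binary.Uvarint(payload)
		if n <= 0 {
			return nil, errors.New("not a packed varint sequence")
		}
		vals = append(vals, v)
		payload = payload[n:]
	}
	return vals, nil
}
```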

About maps

The code above also doesn't deal with proto maps explicitly. However, a map field is wire-equivalent to a repeated key/value entry message (the protobuf language guide documents this), e.g.:

message Pair {
  string key = 1; // or whatever key type
  string val = 2; // or whatever val type
}

So maps can be parsed with the given code as repeated sub-messages.
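For example, with the stdlib-only sketch below (appendLengthDelimited and appendStringMapEntry are illustrative names), one entry "k": "v" of a map<string, string> field number 4 encodes as 22 06 0a 01 6b 12 01 76, that is a field-4 sub-message containing key = 1 and value = 2, exactly what one element of a repeated Pair field would produce.

```go
package main

import "encoding/binary"

// appendLengthDelimited appends one wire-type-2 field to b: the key
// (fieldNum<<3 | 2), the payload length, then the payload itself.
// binary.AppendUvarint requires Go 1.19 or later.
func appendLengthDelimited(b []byte, fieldNum uint64, payload []byte) []byte {
	b = binary.AppendUvarint(b, fieldNum<<3|2)
	b = binary.AppendUvarint(b, uint64(len(payload)))
	return append(b, payload...)
}

// appendStringMapEntry appends one entry of a map<string, string> field. The
// entry is a sub-message with key = 1 and value = 2, byte-for-byte what one
// element of a repeated Pair field would look like.
func appendStringMapEntry(b []byte, fieldNum uint64, key, value string) []byte {
	var entry []byte
	entry = appendLengthDelimited(entry, 1, []byte(key))
	entry = appendLengthDelimited(entry, 2, []byte(value))
	return appendLengthDelimited(b, fieldNum, entry)
}
```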

About oneofs

Information about the union type is completely lost, because oneof members are encoded exactly like ordinary fields: the byte payload contains only the value that was actually set, with nothing marking it as part of a oneof.

But what about Any?

The Any proto type doesn't fit in the picture. Contrary to what it may look like, Any is not analogous to, say, map[string]interface{} for JSON objects. The reason is simple: Any is a proto message with a very well-defined structure, namely (in Go):

type Any struct {
	// unexported fields
	TypeUrl string // struct tags omitted
	Value   []byte // struct tags omitted
}

So it is more similar to the implementation of a Go interface{} in that it holds some actual data and that data's type information.

It can itself hold arbitrary proto payloads (with their type information!), but it cannot be used to decode unknown messages, because Any has exactly those two fields: a type URL and a byte payload.
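In Go you normally don't parse the type URL by hand, since (*anypb.Any).MessageName returns the name for you, but the rule is simple: the last path segment of the type URL is the fully qualified message name. A stdlib-only sketch (messageNameFromTypeURL is an illustrative name):

```go
package main

import "strings"

// messageNameFromTypeURL returns the fully qualified message name carried by
// an Any's type URL, e.g. "google.protobuf.Duration" for
// "type.googleapis.com/google.protobuf.Duration": everything after the last '/'.
func messageNameFromTypeURL(typeURL string) string {
	if i := strings.LastIndexByte(typeURL, '/'); i >= 0 {
		return typeURL[i+1:]
	}
	return typeURL
}
```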

<hr>

To wrap up, this answer doesn't provide a full-blown, production-grade solution, but it shows how to decode arbitrary payloads while preserving as much of the original semantics as possible. Hopefully it will point you in the right direction.

Answer 2

Score: -1

As you've seen, and as the commenters have pointed out, you can't pass an interface{} to proto.Unmarshal, because the function expects a proto.Message, an interface (MessageV1 in github.com/golang/protobuf) implemented by generated message types.

Protobuf messages are typed and correspond to method invocations (what "comes in"), so an implementation cannot accept a generic protobuf type, only a specific one:

func (s *server) M(ctx context.Context, _ *pb.Foo) (*pb.Bar, error)

The solution is to wrap your generic types in an Any inside a specific message type, e.g. an Envelope:

message Envelope {
  google.protobuf.Any content = 1;
  ...
}

The content is then transmitted as a []byte (see anypb.Any in Go), and the anypb package provides helpers to pack and unpack it, such as anypb.New, (*anypb.Any).UnmarshalTo and (*anypb.Any).UnmarshalNew.

The 'trick' with Any is that the message includes a type URL (the TypeUrl field) that uniquely identifies the wrapped message type, so the receiver knows how to, e.g., unmarshal it.

huangapple
  • This article was published on 2016-12-28 00:14:21
  • Please keep the link to this article when reposting: https://go.coder-hub.com/41348512.html