

protobuf unmarshal unknown message






I have a listener which receives protobuf messages. However it doesn't know which type of message comes in when. So I tried to unmarshal into an interface{} so I can later type cast:

    var data interface{}
    err := proto.Unmarshal(message, data)
    if err != nil {
        log.Fatal("unmarshaling error: ", err)
    }
    log.Printf("%v\n", data)

However this code doesn't compile:

    cannot use data (type interface {}) as type proto.Message in argument to proto.Unmarshal:
        interface {} does not implement proto.Message (missing ProtoMessage method)

How can I unmarshal an "unknown" protobuf message in Go and type cast it later?


Score: 7




First, two words about the OP's question, as presented by them:

proto.Unmarshal can't unmarshal into an interface{}. The method signature makes this clear: you must pass a proto.Message argument, which is an interface implemented by concrete protobuf types.

When handling a raw protobuffer []byte payload that didn't come in an Any, ideally you have at least something (a string, a number, etc...) coming together with the byte slice, that you can use to map to the concrete protobuf message.

You can then switch on that and instantiate the appropriate protobuf concrete type, and only then pass that argument to Unmarshal:

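    var message proto.Message
    switch atLeastSomething {
    case "foo":
        message = &mypb.Foo{}
    case "bar":
        message = &mypb.Bar{}
    default:
        // Without a concrete type there is nothing to unmarshal into.
        log.Fatalf("unknown message kind %q", atLeastSomething)
    }
    if err := proto.Unmarshal(data, message); err != nil {
        log.Fatal("unmarshaling error: ", err)
    }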
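
As a sketch of what "at least something" could look like, assume a hypothetical framing in which the sender prefixes every payload with a one-byte kind discriminator. Neither this framing nor the names splitFrame, kindFoo and kindBar come from the question; they only illustrate where the value you switch on might come from.

```go
package main

import (
	"errors"
	"fmt"
)

// Hypothetical discriminator values agreed on by sender and receiver.
const (
	kindFoo byte = 1
	kindBar byte = 2
)

// splitFrame separates the leading kind byte from the protobuf payload.
func splitFrame(frame []byte) (kind byte, payload []byte, err error) {
	if len(frame) == 0 {
		return 0, nil, errors.New("empty frame")
	}
	return frame[0], frame[1:], nil
}

func main() {
	frame := []byte{kindBar, 0x0a, 0x01, 0x43} // kind byte + encoded payload
	kind, payload, err := splitFrame(frame)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	// This is where the concrete type would be instantiated and passed
	// to proto.Unmarshal together with payload.
	switch kind {
	case kindFoo:
		fmt.Printf("decode %d bytes as Foo\n", len(payload))
	case kindBar:
		fmt.Printf("decode %d bytes as Bar\n", len(payload))
	default:
		fmt.Printf("unknown kind %d\n", kind)
	}
}
```

A string name or a numeric ID sent in a header would work the same way; the point is only that the discriminator travels outside the protobuf bytes.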


Now, what if the byte payload is truly unknown?

As a foreword, consider that this should seldom happen in practice. The schema used to generate the protobuffer types in your language of choice represents a contract, and by accepting protobuffer payloads you are, for some definitions of it, fulfilling that contract.

Anyway, if for some reason you must deal with a completely unknown, mysterious, protobuffer payload in wire format, you can extract some information from it with the protowire package.

Be aware that the wire representation of a protobuf message is ambiguous. A big source of uncertainty is the "length-delimited" wire type (2), which is used for strings, bytes, packed repeated fields and... sub-messages (see the protobuf encoding documentation).

You can retrieve the payload content, but you are bound to have weak semantics.
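
To make the ambiguity concrete, here is a minimal sketch that decodes field keys by hand using only the standard library (in real code protowire.ConsumeTag does this job). The helper decodeTag is an illustrative name, and the byte slice is one valid encoding of the Foo message used in the sample further down in this answer.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// decodeTag splits a field key into its field number and wire type.
// It returns n <= 0 if b does not start with a valid varint.
func decodeTag(b []byte) (num uint64, wireType uint8, n int) {
	key, n := binary.Uvarint(b)
	if n <= 0 {
		return 0, 0, n
	}
	// The low three bits are the wire type, the rest is the field number.
	return key >> 3, uint8(key & 7), n
}

func main() {
	// Foo{A: "A", B: "B", Bar: {C: "C"}} on the wire.
	msg := []byte{0x0a, 0x01, 0x41, 0x12, 0x01, 0x42, 0x1a, 0x03, 0x0a, 0x01, 0x43}
	for len(msg) > 0 {
		num, wt, n := decodeTag(msg)
		if n <= 0 || wt != 2 {
			fmt.Println("not a length-delimited field")
			return
		}
		msg = msg[n:]
		// A length-delimited value is a varint size followed by that many bytes.
		size, m := binary.Uvarint(msg)
		if m <= 0 || size > uint64(len(msg)-m) {
			fmt.Println("truncated payload")
			return
		}
		payload := msg[m : m+int(size)]
		fmt.Printf("field %d, wire type %d, payload % x\n", num, wt, payload)
		msg = msg[m+int(size):]
	}
}
```

All three fields carry wire type 2. Nothing in the bytes says that fields 1 and 2 are strings while field 3 is a message; that knowledge lives only in the schema.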

The code

With that said, this is what a parser for unknown proto messages may look like. The idea is to leverage protowire.ConsumeField to read through the original byte slice.

The data model could be like this:

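    import "google.golang.org/protobuf/encoding/protowire"

    // Field is a single decoded field: its tag plus its value.
    type Field struct {
        Tag Tag
        Val Val
    }

    // Tag holds the field number and wire type.
    type Tag struct {
        Num  int32
        Type protowire.Type
    }

    // Val holds the decoded payload: a uint64, a uint32, a []byte, or a
    // nested []Field when the bytes parse as a sub-message.
    type Val struct {
        Payload interface{}
        Length  int
    }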

And the parser:

    // parseUnknown walks a wire-format payload and returns the fields it
    // finds, or nil if b is not a valid sequence of protobuf fields.
    func parseUnknown(b []byte) []Field {
        fields := make([]Field, 0)
        for len(b) > 0 {
            // Field number, wire type and total encoded length (tag included).
            n, t, fieldlen := protowire.ConsumeField(b)
            if fieldlen < 1 {
                return nil
            }
            field := Field{
                Tag: Tag{Num: int32(n), Type: t},
            }

            // Measure the tag so the value can be sliced out of the field.
            _, _, taglen := protowire.ConsumeTag(b[:fieldlen])
            if taglen < 1 {
                return nil
            }

            var (
                v    interface{} // decoded value
                vlen int         // bytes consumed by the value, length prefix included
                plen int         // size of the value itself
            )
            switch t {
            case protowire.VarintType:
                v, vlen = protowire.ConsumeVarint(b[taglen:fieldlen])
                plen = vlen
            case protowire.Fixed64Type:
                v, vlen = protowire.ConsumeFixed64(b[taglen:fieldlen])
                plen = vlen
            case protowire.BytesType:
                raw, m := protowire.ConsumeBytes(b[taglen:fieldlen])
                v, vlen, plen = raw, m, len(raw)
                // May be a sub-message: try to parse it, keep the bytes otherwise.
                if sub := parseUnknown(raw); sub != nil {
                    v = sub
                }
            case protowire.StartGroupType:
                raw, m := protowire.ConsumeGroup(n, b[taglen:fieldlen])
                v, vlen, plen = raw, m, len(raw)
                if sub := parseUnknown(raw); sub != nil {
                    v = sub
                }
            case protowire.Fixed32Type:
                v, vlen = protowire.ConsumeFixed32(b[taglen:fieldlen])
                plen = vlen
            }

            if vlen < 1 {
                return nil
            }

            // Length is the payload size. The original vlen - taglen only
            // matched it when the tag and the length prefix had equal sizes.
            field.Val = Val{Payload: v, Length: plen}
            // fmt.Printf("%#v\n", field)

            fields = append(fields, field)
            b = b[fieldlen:]
        }
        return fields
    }

Sample input and output

Given a proto schema like:

    message Foo {
        string a = 1;
        string b = 2;
        Bar bar = 3;
    }
    message Bar {
        string c = 1;
    }

initialized in Go as:

    &test.Foo{A: "A", B: "B", Bar: &test.Bar{C: "C"}}

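Adding a fmt.Printf("%#v\n", field) statement at the end of the loop in the above code outputs the following. The third line is printed by the recursive call that parses the payload of field 3, which is why it appears before the line for field 3 itself: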

    main.Field{Tag:main.Tag{Num:1, Type:2}, Val:main.Val{Payload:[]uint8{0x41}, Length:1}}
    main.Field{Tag:main.Tag{Num:2, Type:2}, Val:main.Val{Payload:[]uint8{0x42}, Length:1}}
    main.Field{Tag:main.Tag{Num:1, Type:2}, Val:main.Val{Payload:[]uint8{0x43}, Length:1}}
    main.Field{Tag:main.Tag{Num:3, Type:2}, Val:main.Val{Payload:[]main.Field{main.Field{Tag:main.Tag{Num:1, Type:2}, Val:main.Val{Payload:[]uint8{0x43}, Length:1}}}, Length:3}}

About sub-messages

As you can see from the above, the idea for dealing with a protowire.BytesType that may or may not be a message field is to attempt to parse it recursively. If that succeeds, we keep the resulting message and store it in the field value; if it fails, we store the bytes as-is, which may then be a proto string or bytes. Keep in mind that this is a heuristic: a string or bytes value whose content happens to form valid fields will be misread as a message. BTW, if I'm reading correctly, this seems to be what Marc Gravell does in the Protogen code.

About repeated fields

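The code above doesn't deal with repeated fields explicitly, but after the parsing is done, the elements of a repeated field will all have the same value for Field.Tag.Num. From that, packing them into a slice/array should be trivial. The exception is packed repeated scalars (the default for numeric types in proto3), which arrive as a single length-delimited field whose payload has to be split separately.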
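
A minimal sketch of that grouping step follows. To stay self-contained it uses a simplified stand-in for the Field type (just a number and a payload), and the sample parse result is hypothetical: it assumes a schema with a `repeated string` field numbered 4.

```go
package main

import "fmt"

// Field is a simplified stand-in for the Field type above; only the
// field number and the payload matter for grouping.
type Field struct {
	Num     int32
	Payload interface{}
}

// groupByNumber collects the payloads of all fields sharing a field
// number, preserving their order of appearance.
func groupByNumber(fields []Field) map[int32][]interface{} {
	grouped := make(map[int32][]interface{})
	for _, f := range fields {
		grouped[f.Num] = append(grouped[f.Num], f.Payload)
	}
	return grouped
}

func main() {
	// Hypothetical parse result: field 1 set once, field 4 repeated twice.
	fields := []Field{
		{Num: 1, Payload: []byte("A")},
		{Num: 4, Payload: []byte("x")},
		{Num: 4, Payload: []byte("y")},
	}
	grouped := groupByNumber(fields)
	fmt.Println(len(grouped[1]), len(grouped[4]))
}
```

Note that a field number seen only once tells you nothing: it may be a singular field or a repeated field with one element.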

About maps

The code above also doesn't deal with proto maps. I suspect that maps are semantically equivalent to a repeated k/v pair, e.g.:

    message Pair {
        string key = 1; // or whatever key type
        string val = 2; // or whatever val type
    }

If my assumption is correct, then maps can be parsed with the given code as sub-messages.
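
Under that assumption, recognizing a candidate map entry in the parsed output could look like the sketch below. It again uses a simplified stand-in for the Field type, asMapEntry is an illustrative name, and the sample entry assumes a hypothetical `map<string, string>` field numbered 5.

```go
package main

import "fmt"

// Field is a simplified stand-in for the Field type above.
type Field struct {
	Num     int32
	Payload interface{}
}

// asMapEntry reports whether f looks like a map entry: a sub-message
// containing nothing but field 1 (key) and field 2 (value).
func asMapEntry(f Field) (key, val interface{}, ok bool) {
	sub, isMsg := f.Payload.([]Field)
	if !isMsg {
		return nil, nil, false
	}
	for _, e := range sub {
		switch e.Num {
		case 1:
			key = e.Payload
		case 2:
			val = e.Payload
		default:
			return nil, nil, false
		}
	}
	return key, val, true
}

func main() {
	// Hypothetical parse result for one entry of the map field.
	entry := Field{Num: 5, Payload: []Field{
		{Num: 1, Payload: []byte("k")},
		{Num: 2, Payload: []byte("v")},
	}}
	if k, v, ok := asMapEntry(entry); ok {
		fmt.Printf("%s => %s\n", k, v)
	}
}
```

This is only a heuristic: any ordinary message that uses just field numbers 1 and 2 passes the same check, so the schema is still needed to tell a map from a repeated message.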

About oneofs

I haven't tested this yet, but I expect that information about the union type is completely lost. The byte payload will contain only the value that was actually set.

But what about Any?

The Any proto type doesn't fit in the picture. Contrary to what it may look like, Any is not analogous to, say, map[string]interface{} for JSON objects. And the reason is simple: Any is a proto message with a very well defined structure, namely (in Go):

    type Any struct {
        // unexported fields
        TypeUrl string // struct tags omitted
        Value   []byte // struct tags omitted
    }

So it is more similar to the implementation of a Go interface{} in that it holds some actual data and that data's type information.

It can itself hold arbitrary proto payloads (with their type information!), but it cannot be used to decode unknown messages, because Any has exactly those two fields: a type URL and a byte payload.


To wrap up, this answer doesn't provide a full-blown production-grade solution, but it shows how to decode arbitrary payloads while preserving as much original semantics as possible. Hopefully it will point you in the right direction.


Score: -1







As you've seen, and as the commenters have pointed out, you can't use proto.Unmarshal with an interface{}, because the method expects a value of type Message, an interface (MessageV1) that interface{} does not satisfy.

Protobuf messages are typed. Each message that "comes in" corresponds to a method invocation, and the implementation accepts a specific protobuf type, not a generic one:

    func (s *server) M(ctx context.Context, _ *pb.Foo) (*pb.Bar, error)

The solution is to wrap your generic types as an Any inside a specific type, perhaps called Envelope:

    message Envelope {
        google.protobuf.Any content = 1;
        ...
    }

The content is then transmitted as a []byte (see Golang anypb.Any), and the implementation (anypb) includes methods to pack and unpack it.

The 'trick' with Any is that each message includes a TypeURL that uniquely identifies its type, so the receiver knows how to, e.g., Unmarshal it.
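
As a rough sketch of how a receiver might use that type URL, the example below extracts the fully-qualified message name (the last path segment of the URL) and dispatches on it. It uses only the standard library; the handlers map, the mypb.Foo and mypb.Bar names, and the sample URL are hypothetical, and real code would rely on the anypb helpers rather than parsing the URL itself.

```go
package main

import (
	"fmt"
	"strings"
)

// messageName returns the fully-qualified message name carried by an
// Any type URL, i.e. everything after the last '/'.
func messageName(typeURL string) string {
	if i := strings.LastIndex(typeURL, "/"); i >= 0 {
		return typeURL[i+1:]
	}
	return typeURL
}

func main() {
	// Hypothetical handlers keyed by message name; a real receiver would
	// unmarshal the Any's Value bytes into the matching generated type.
	handlers := map[string]func(value []byte){
		"mypb.Foo": func(v []byte) { fmt.Printf("Foo payload, %d bytes\n", len(v)) },
		"mypb.Bar": func(v []byte) { fmt.Printf("Bar payload, %d bytes\n", len(v)) },
	}

	// The two fields an Any carries: a type URL and the encoded message.
	typeURL, value := "type.googleapis.com/mypb.Bar", []byte{0x0a, 0x01, 0x43}
	if h, ok := handlers[messageName(typeURL)]; ok {
		h(value)
	} else {
		fmt.Println("no handler for", typeURL)
	}
}
```

The receiver still needs the generated types for every message it wants to handle; Any only tells it which one to pick.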

  • Posted on 2016-12-28 00:14:21
