Could I reuse an existing protobuf binary when marshaling a message that includes it? (protobuf3)


Question

The protobuf definitions look like this:

syntax = "proto3";

message HugeMessage {
    // omitted
}

message Request {
    string name = 1;
    HugeMessage payload = 2;
}

In one situation, I receive a HugeMessage from somebody and want to pack it together with additional fields, then transmit the resulting message to someone else. To do that, I have to Unmarshal the HugeMessage binary into a Go struct, pack it into a Request, and Marshal it again. Because HugeMessage is huge, the cost of the Unmarshal and Marshal is unaffordable. Can I reuse the HugeMessage binary without changing the protobuf definitions?

func main() {
	// Receive it from a file or the network; the source is not important.
	bins, _ := os.ReadFile("hugeMessage.dump")
	var message HugeMessage
	_ = proto.Unmarshal(bins, &message) // slow
	// Generated Go structs export their fields and hold sub-messages by pointer.
	request := Request{
		Name:    "xxxx",
		Payload: &message,
	}
	requestBinary, _ := proto.Marshal(&request) // slow
	// Send it.
	_ = os.WriteFile("request.dump", requestBinary, 0644)
}

Answer 1

Score: 1

The short answer is: no, there is no simple or standard way to achieve this.

The most obvious strategy is to do as you currently have - unmarshal the HugeMessage, set it into Request, then marshal again. The golang protobuf API surface doesn't really provide a means to do much beyond that - with good reason.

That said, there are ways to achieve what you're looking to do. But these aren't necessarily safe or reliable, so you have to weigh that cost vs the cost of what you have now.

One way you can avoid the unmarshal is to take advantage of the way a message is normally serialized:

message Request {
    string name = 1;
    HugeMessage payload = 2;
}

... is equivalent on the wire to

message Request {
    string name = 1;
    bytes payload = 2;
}

... where payload contains the result of calling Marshal(...) on some HugeMessage.

So, if we have the following definitions:

syntax = "proto3";

message HugeMessage {
  bytes field1 = 1;
  string field2 = 2;
  int64 field3 = 3;
}

message Request {
  string name = 1;
  HugeMessage payload = 2;
}

message RawRequest {
  string name = 1;
  bytes payload = 2;
}

The following code:

req1, err := proto.Marshal(&pb.Request{
	Name: "name",
	Payload: &pb.HugeMessage{
		Field1: []byte{1, 2, 3},
		Field2: "test",
		Field3: 948414,
	},
})
if err != nil {
	panic(err)
}

huge, err := proto.Marshal(&pb.HugeMessage{
	Field1: []byte{1, 2, 3},
	Field2: "test",
	Field3: 948414,
})
if err != nil {
	panic(err)
}

req2, err := proto.Marshal(&pb.RawRequest{
	Name:    "name",
	Payload: huge,
})
if err != nil {
	panic(err)
}

fmt.Printf("equal? %t\n", bytes.Equal(req1, req2))

prints: equal? true

Whether this "quirk" is entirely reliable isn't clear, and there are no guarantees it will continue to work indefinitely. Obviously, the RawRequest type also has to fully mirror the Request type, which isn't ideal.

Another alternative is to construct the message more manually, i.e. using the protowire package. Again, this is haphazard; caution is advised.

Answer 2

Score: 1

In short: it can be done with protowire, and it isn't hard as long as the structure being reused isn't complex.

I asked this question not long ago and finally worked it out, inspired by @nj_'s post. According to the encoding chapter of the protobuf documentation, a protocol buffer message is a series of field-value pairs, and the order of those pairs doesn't matter. That suggests an obvious idea: do by hand what the protobuf encoder would do, i.e. build the embedded field manually and append it to the end of the request.

In this situation we want to reuse the HugeMessage inside Request, so the key-value pair for the field is 2:{${HugeMessageBinary}}. The code (slightly different from the question's) could be:

func binaryEmbeddingImplementation(messageBytes []byte, name string) (requestBytes []byte, err error) {
    // 1. Create a request with everything set except the payload, and marshal it.
    request := protodef.Request{
        Name: name,
    }
    requestBytes, err = proto.Marshal(&request)
    if err != nil {
        return nil, err
    }
    // 2. Manually append the payload to the request using protowire.
    // On the wire, an embedded message is encoded the same way as a bytes field.
    requestBytes = protowire.AppendTag(requestBytes, 2, protowire.BytesType)
    requestBytes = protowire.AppendBytes(requestBytes, messageBytes)
    return requestBytes, nil
}

Supply the field number, the wire type, and the bytes; that's all. The conventional approach, for comparison, looks like this:

func commonImplementation(messageBytes []byte, name string) (requestBytes []byte, err error) {
    // messageBytes was received from a file or the network; the source is not important.
    var message protodef.HugeMessage
    if err = proto.Unmarshal(messageBytes, &message); err != nil { // slow
        return nil, err
    }
    request := protodef.Request{
        Name:    name,
        Payload: &message,
    }
    return proto.Marshal(&request) // slow
}

Some benchmarks:

$ go test -bench=a -benchtime 10s ./pkg/
goos: darwin
goarch: arm64
pkg: pbembedding/pkg
BenchmarkCommon-8             49         288026442 ns/op
BenchmarkEmbedding-8         201         176032133 ns/op
PASS
ok      pbembedding/pkg 80.196s

The test and benchmark file (package pkg):

package pkg

import (
    "testing"

    "github.com/stretchr/testify/assert"
    "golang.org/x/exp/rand"
    "google.golang.org/protobuf/proto"
    "pbembedding/pkg/protodef"
)

var hugeMessageSample = receiveHugeMessageFromSomewhere()

func TestEquivalent(t *testing.T) {
    requestBytes1, _ := commonImplementation(hugeMessageSample, "xxxx")
    requestBytes2, _ := binaryEmbeddingImplementation(hugeMessageSample, "xxxx")
    // The two results are not guaranteed to be byte-for-byte equal. A robust test
    // would compare the decoded messages rather than the binary form, see
    // https://developers.google.com/protocol-buffers/docs/encoding#implications
    // I'm lazy, so this compares bytes, which happens to work here.
    assert.NotEmpty(t, requestBytes1)
    assert.Equal(t, requestBytes1, requestBytes2)
    var request protodef.Request
    err := proto.Unmarshal(requestBytes1, &request)
    assert.NoError(t, err)
    assert.Equal(t, "xxxx", request.Name)
}

// Not really received from anywhere: this builds a mock 1 GiB message.
func receiveHugeMessageFromSomewhere() []byte {
    buffer := make([]byte, 1024*1024*1024)
    _, _ = rand.Read(buffer)
    message := protodef.HugeMessage{
        Data: buffer,
    }
    res, _ := proto.Marshal(&message)
    return res
}

func BenchmarkCommon(b *testing.B) {
    b.ResetTimer()
    for i := 0; i < b.N; i++ {
        _, err := commonImplementation(hugeMessageSample, "xxxx")
        if err != nil {
            b.Fatal(err)
        }
    }
}

func BenchmarkEmbedding(b *testing.B) {
    b.ResetTimer()
    for i := 0; i < b.N; i++ {
        _, err := binaryEmbeddingImplementation(hugeMessageSample, "xxxx")
        if err != nil {
            b.Fatal(err)
        }
    }
}
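
The comment in TestEquivalent says the two encodings should be compared as messages rather than as bytes. With generated types, the usual route is to unmarshal both and call proto.Equal. To show why byte comparison can fail, here is a rough standard-library-only sketch that splits a serialized message into its top-level fields and compares them regardless of the order in which different field numbers appear. topLevelFields and sameTopLevelFields are hypothetical helpers; they do not look inside nested messages, do not handle groups, and do not normalize alternative encodings, so treat them as an illustration only.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
	"reflect"
)

// topLevelFields splits a serialized message into raw field values keyed by
// field number. Values that share a field number keep their relative order.
func topLevelFields(b []byte) (map[uint64][][]byte, error) {
	fields := make(map[uint64][][]byte)
	for len(b) > 0 {
		tag, n := binary.Uvarint(b)
		if n <= 0 {
			return nil, errors.New("bad tag")
		}
		b = b[n:]
		var size int
		switch tag & 7 { // the low three bits of the tag are the wire type
		case 0: // varint: keep its raw bytes
			if _, size = binary.Uvarint(b); size <= 0 {
				return nil, errors.New("bad varint")
			}
		case 1: // fixed 64-bit
			size = 8
		case 5: // fixed 32-bit
			size = 4
		case 2: // length-delimited: skip the length prefix, keep the payload
			length, ln := binary.Uvarint(b)
			if ln <= 0 || length > uint64(len(b)-ln) {
				return nil, errors.New("bad length")
			}
			b = b[ln:]
			size = int(length)
		default:
			return nil, fmt.Errorf("unsupported wire type %d", tag&7)
		}
		if size > len(b) {
			return nil, errors.New("truncated field")
		}
		fields[tag>>3] = append(fields[tag>>3], b[:size])
		b = b[size:]
	}
	return fields, nil
}

// sameTopLevelFields reports whether a and b carry the same top-level fields,
// ignoring the order in which different field numbers were written.
func sameTopLevelFields(a, b []byte) (bool, error) {
	fa, err := topLevelFields(a)
	if err != nil {
		return false, err
	}
	fb, err := topLevelFields(b)
	if err != nil {
		return false, err
	}
	return reflect.DeepEqual(fa, fb), nil
}

func main() {
	name := []byte{0x0a, 0x04, 'n', 'a', 'm', 'e'}               // field 1
	payload := []byte{0x12, 0x05, 0x0a, 0x03, 0x01, 0x02, 0x03} // field 2

	nameFirst := append(append([]byte{}, name...), payload...)
	payloadFirst := append(append([]byte{}, payload...), name...)

	same, err := sameTopLevelFields(nameFirst, payloadFirst)
	if err != nil {
		panic(err)
	}
	fmt.Println("bytes equal:", string(nameFirst) == string(payloadFirst)) // bytes equal: false
	fmt.Println("fields equal:", same)                                     // fields equal: true
}
```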

huangapple
  • Posted on 2022-11-18 15:51:01
  • When reposting, please keep the link to this article: https://go.coder-hub.com/74486451.html