Protobuf serialization length differs when using different values for a field

Question

My Protobuf message consists of 3 doubles:

    syntax = "proto3";

    message TestMessage {
      double input = 1;
      double output = 2;
      double info = 3;
    }

When I set these values to

    test.set_input(2.3456);
    test.set_output(5.4321);
    test.set_info(5.0);

and serialize with test.SerializeToArray, the serialized message looks like this:

    00000000 09 16 fb cb ee c9 c3 02 40 11 0a 68 22 6c 78 ba |........@..h"lx.|
    00000010 15 40 19 |.@.|
    00000013

This output could not be deserialized by a Go program using the same Protobuf message. When I read it from a C++ program, I got 0 for info, so the message seems to be corrupted.

When using test.SerializeToOstream, I got the following message, which both the Go and the C++ program could deserialize successfully:

    00000000 09 16 fb cb ee c9 c3 02 40 11 0a 68 22 6c 78 ba |........@..h"lx.|
    00000010 15 40 19 00 00 00 00 00 00 14 40 |.@........@|
    0000001b

When setting the values to

    test.set_input(2.3456);
    test.set_output(5.4321);
    test.set_info(5.5678);

the serialized messages produced by test.SerializeToArray and by test.SerializeToOstream are identical:

    00000000 09 16 fb cb ee c9 c3 02 40 11 0a 68 22 6c 78 ba |........@..h"lx.|
    00000010 15 40 19 da ac fa 5c 6d 45 16 40 |.@....\mE.@|
    0000001b

Both can be read successfully by my Go and C++ programs.

What am I missing here? Why does SerializeToArray not work in the first case?

Edit:
As it turns out, SerializeToString works fine, too.

Here is the code I used for the comparison:

    file_a.open(FILEPATH_A);
    file_b.open(FILEPATH_B);

    test.set_input(2.3456);
    test.set_output(5.4321);
    test.set_info(5.0);

    // SerializeToArray
    int size = test.ByteSize();
    char *buffer = (char*) malloc(size);
    test.SerializeToArray(buffer, size);
    file_a << buffer;

    // SerializeToString
    std::string buf;
    test.SerializeToString(&buf);
    file_b << buf;

    file_a.close();
    file_b.close();

Why does SerializeToArray not work as expected?

Edit 2:
When using file_b << buf.data() instead of file_b << buf, the data gets corrupted as well, but why?

Answer 1

Score: 2

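I think the mistake is that you are treating binary data as character data and using character-data APIs. Many of those APIs stop at the first null byte (0x00), but that is a perfectly valid value in Protobuf binary output.

You need to avoid any such API and stick purely to binary-safe ones.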

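Since you indicate that size is 27, this all fits: the full message is 27 bytes long, but only the 19 bytes before the first zero byte were written.

Basically, the binary representation of 5.0 contains zero bytes, but you could easily have hit the same problem with other values sooner or later.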

huangapple
  • Posted on June 28, 2017, 23:08:31
  • When reposting, please keep the link to this article: https://go.coder-hub.com/44806362.html