Error while using proto.Unmarshal for Protocol Buffers data: "string field contains invalid UTF-8"

Question

I'm working on a Go application that uses Protocol Buffers, and I'm encountering an issue while deserializing the data using proto.Unmarshal. The specific error message I'm getting is "string field contains invalid UTF-8".

Here is my deserialization code:

    import (
        "github.com/example/pb" // Import the package containing the Protocol Buffers definition.
        "google.golang.org/protobuf/proto"
    )

    func deserializeData(res_data *pb.ResponseData) (*pb.NotfnMessg, error) {
        notfnMessg := &pb.NotfnMessg{}
        // proto.Unmarshal takes a []byte as its first argument.
        err := proto.Unmarshal(res_data.Payload.GetSerializedMessage(), notfnMessg)
        if err != nil {
            appLogger.Logging().Error("Error Unmarshalling the message: %v", err)
            return nil, err
        }
        appLogger.Logging().Info("%v, %v", notfnMessg, res_data.Payload.GetSerializedMessage())
        return notfnMessg, nil
    }

The res_data.Payload.GetSerializedMessage() function returns the serialized message data in a string format. The content of res_data is provided below as a demo.

Here's an example of the serialized data stored in res_data:

    msg_hdr:{s_val:"CONFIG_GET_REP"} payload:{payload_header:{s_val:"SUCCESS"} serialized_message:"\n\xc3\x0b...<omitted for brevity>.../communication\n*\n\x05title\x12!\n\x17Pi Module ConfigurationZ\x06/title"}

As you can see, the serialized message contains binary data, including strings. It appears that there might be invalid UTF-8 characters in one of the string fields, causing the deserialization to fail.

I would like to know how to properly handle this issue and successfully deserialize the data without encountering the "string field contains invalid UTF-8" error.

Any guidance or suggestions on how to handle invalid UTF-8 characters during the deserialization process would be greatly appreciated. Thank you!

Answer 1

Score: 1

When deserializing a string field, the Go protobuf code (like other implementations) validates that the string is valid UTF-8. When it is not, you get the error you are seeing.
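To illustrate the kind of check involved, here is a minimal sketch using only the standard library. The helper name validProtoString is hypothetical and is not part of the protobuf API; it only mirrors the idea of a UTF-8 validity test. The first input is the leading byte sequence shown in the question's serialized_message sample.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// validProtoString is a hypothetical helper: it reports whether b is
// well-formed UTF-8, which is the property a protobuf string field must have.
func validProtoString(b []byte) bool {
	return utf8.Valid(b)
}

func main() {
	// Leading bytes of the serialized_message value from the question.
	// 0xc3 starts a two-byte sequence, but 0x0b is not a continuation byte.
	fmt.Println(validProtoString([]byte("\n\xc3\x0b"))) // false

	// Plain ASCII text is always valid UTF-8.
	fmt.Println(validProtoString([]byte("Pi Module Configuration"))) // true
}
```

This suggests, though the question does not confirm it, that the field carrying the nested serialized message is itself the one failing validation, since an embedded binary payload will generally not be valid UTF-8.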

As far as I am aware, the simplest way around this is to edit the .proto and change the affected string fields to bytes, then rerun protoc (or buf, etc.) to regenerate your github.com/example/pb package. string and bytes fields are wire compatible; the difference is in the generated Go types (string versus []byte).
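The wire compatibility can be seen by encoding a field by hand. The sketch below assumes field number 1 and uses a hypothetical helper, appendLenDelimited, written with the standard library only (it is not the protobuf library's encoder). Both string and bytes use the length-delimited wire type 2, so the bytes on the wire do not record which of the two the schema declared.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// appendLenDelimited appends one length-delimited field (wire type 2) to dst:
// a varint tag (fieldNum<<3 | 2), a varint length, then the raw payload.
// This layout is shared by string and bytes fields.
func appendLenDelimited(dst []byte, fieldNum uint64, payload []byte) []byte {
	dst = binary.AppendUvarint(dst, fieldNum<<3|2)
	dst = binary.AppendUvarint(dst, uint64(len(payload)))
	return append(dst, payload...)
}

func main() {
	// Field number 1 is an assumption made for illustration.
	encoded := appendLenDelimited(nil, 1, []byte("title"))
	fmt.Printf("%q\n", encoded) // "\n\x05title"
}
```

The result matches the \n\x05title fragment visible in the question's sample data. A decoder that treats the field as bytes simply skips the UTF-8 check on the payload.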

This will give you the data as a []byte in Go, which you can then process yourself (e.g. strings.ToValidUTF8("a\xc5z", ""), as per this answer; bytes.ToValidUTF8 is the equivalent for a []byte).
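As a rough sketch of that post-processing step, the helper below (sanitize is a hypothetical name) uses bytes.ToValidUTF8 from the standard library to replace each run of invalid bytes before converting to a string. Whether dropping or replacing bytes is acceptable depends on what the field is meant to hold, which the question does not say.

```go
package main

import (
	"bytes"
	"fmt"
)

// sanitize returns raw as a string, with every run of invalid UTF-8 bytes
// replaced by replacement (use "" to drop them).
func sanitize(raw []byte, replacement string) string {
	return string(bytes.ToValidUTF8(raw, []byte(replacement)))
}

func main() {
	raw := []byte("a\xc5z") // 0xc5 is a lead byte with no continuation byte

	fmt.Println(sanitize(raw, ""))  // az
	fmt.Println(sanitize(raw, "?")) // a?z
}
```

If the field actually carries a nested serialized message rather than text, it should be passed to proto.Unmarshal unchanged instead of being sanitized, since altering the bytes would corrupt the payload.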

huangapple
  • Posted on July 26, 2023, 16:21:35
  • Link to this article: https://go.coder-hub.com/76769291.html