Error while using proto.Unmarshal for Protocol Buffers data: "string field contains invalid UTF-8"




I'm working on a Go application that uses Protocol Buffers, and I'm encountering an issue while deserializing the data using proto.Unmarshal. The specific error message I'm getting is "string field contains invalid UTF-8".

Here is my deserialization code:

  import (
      "" // Import the package containing the Protocol Buffers definition.
      ""
  )

  func deserializeData(res_data *pb.ResponseData) (*pb.NotfnMessg, error) {
      notfnMessg := &pb.NotfnMessg{}
      err := proto.Unmarshal(res_data.Payload.GetSerializedMessage(), notfnMessg)
      if err != nil {
          appLogger.Logging().Error("Error Unmarshalling the message: %v", err)
          return nil, err
      }
      appLogger.Logging().Info("%v, %v", notfnMessg, res_data.Payload.GetSerializedMessage())
      return notfnMessg, nil
  }

The res_data.Payload.GetSerializedMessage() function returns the serialized message data in a string format. The content of res_data is provided below as a demo.

Here's an example of the serialized data stored in res_data:

  msg_hdr:{s_val:"CONFIG_GET_REP"} payload:{payload_header:{s_val:"SUCCESS"} serialized_message:"\n\xc3\x0b...<omitted for brevity>.../communication\n*\n\x05title\x12!\n\x17Pi Module ConfigurationZ\x06/title"}

As you can see, the serialized message contains binary data, including strings. It appears that there might be invalid UTF-8 characters in one of the string fields, causing the deserialization to fail.

I would like to know how to properly handle this issue and successfully deserialize the data without encountering the "string field contains invalid UTF-8" error.

Any guidance or suggestions on how to handle invalid UTF-8 characters during the deserialization process would be greatly appreciated. Thank you!


Score: 1


When deserializing a string field, the Go protobuf code (like other implementations) validates that the string is valid UTF-8. When it is not, you get the error you are seeing.
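
To see why the sample data trips this check, here is a minimal sketch using only the standard library. It assumes the validation behaves like utf8.Valid from unicode/utf8; firstInvalidByte is a hypothetical helper, not part of any protobuf package. The sample is the leading bytes of the serialized_message shown in the question.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// firstInvalidByte returns the offset of the first byte that does not start
// a valid UTF-8 sequence, or -1 if b is entirely valid UTF-8.
// Hypothetical helper for illustration only.
func firstInvalidByte(b []byte) int {
	for i := 0; i < len(b); {
		r, size := utf8.DecodeRune(b[i:])
		// DecodeRune reports an invalid sequence as (RuneError, 1).
		if r == utf8.RuneError && size == 1 {
			return i
		}
		i += size
	}
	return -1
}

func main() {
	// Leading bytes of the serialized_message from the question.
	sample := []byte("\n\xc3\x0b")

	fmt.Println(utf8.Valid(sample))                // false
	fmt.Println(firstInvalidByte(sample))          // 1
	fmt.Println(firstInvalidByte([]byte("title"))) // -1
}
```

The byte 0xc3 starts a two-byte UTF-8 sequence, but 0x0b is not a continuation byte, so the data cannot be carried in a string field.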

As far as I am aware, the simplest way around this is to edit the .proto and change the affected string fields to bytes, then rerun protoc (or buf etc.) to regenerate your code. string and bytes fields are wire compatible; the difference is in the output types.
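
The wire compatibility comes from both types using the length-delimited wire type (2): a tag, a varint length, then the raw payload. The sketch below hand-encodes such a field with the standard library only, as an illustration rather than a replacement for the protobuf runtime; encodeLenField and the field number 1 are assumptions made for the example.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// wireTypeLen is the protobuf wire type for length-delimited fields,
// which both string and bytes use.
const wireTypeLen = 2

// encodeLenField hand-encodes one length-delimited field as
// tag, varint length, payload. Nothing in the output records whether
// the field was declared as string or bytes.
// Hypothetical helper for illustration only.
func encodeLenField(fieldNum int, payload []byte) []byte {
	var buf [binary.MaxVarintLen64]byte
	out := []byte{}

	// Tag = (field number << 3) | wire type.
	n := binary.PutUvarint(buf[:], uint64(fieldNum)<<3|wireTypeLen)
	out = append(out, buf[:n]...)

	// Payload length as a varint, followed by the payload itself.
	n = binary.PutUvarint(buf[:], uint64(len(payload)))
	out = append(out, buf[:n]...)
	return append(out, payload...)
}

func main() {
	// Matches the "\n\x05title" fragment visible in the question's dump.
	fmt.Printf("%q\n", encodeLenField(1, []byte("title"))) // "\n\x05title"

	// Invalid UTF-8 encodes the same way; only a string-typed reader rejects it.
	fmt.Printf("%q\n", encodeLenField(1, []byte("a\xc5z"))) // "\n\x03a\xc5z"
}
```

Because the bytes on the wire are identical, switching a field from string to bytes only changes the generated Go type and skips the UTF-8 check.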

This will give you the data as a []byte in Go, which you can then process yourself (e.g. strings.ToValidUTF8("a\xc5z", "") as per this answer).
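
A minimal sketch of that processing step, assuming the field has already been changed to bytes so the generated getter returns []byte. sanitize is a hypothetical helper; it uses bytes.ToValidUTF8, the []byte counterpart of strings.ToValidUTF8.

```go
package main

import (
	"bytes"
	"fmt"
	"unicode/utf8"
)

// sanitize converts raw into a string that is guaranteed to be valid UTF-8.
// Each run of invalid bytes is replaced with replacement; pass "" to drop them.
// Hypothetical helper for illustration only.
func sanitize(raw []byte, replacement string) string {
	if utf8.Valid(raw) {
		return string(raw) // already valid, nothing to repair
	}
	return string(bytes.ToValidUTF8(raw, []byte(replacement)))
}

func main() {
	raw := []byte("a\xc5z") // same sample as in the answer

	fmt.Println(sanitize(raw, ""))  // az
	fmt.Println(sanitize(raw, "?")) // a?z
}
```

Dropping or replacing bytes is lossy, so this only makes sense for fields that are meant to hold text; a nested serialized message should stay as []byte and be passed to proto.Unmarshal unchanged.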

  • Posted on 2023-07-26 16:21:35
