Error while using proto.Unmarshal for Protocol Buffers data: "string field contains invalid UTF-8"

Question

I'm working on a Go application that uses Protocol Buffers, and I'm encountering an issue while deserializing the data using proto.Unmarshal. The specific error message I'm getting is "string field contains invalid UTF-8".

Here is my deserialization code:

import (
    "github.com/example/pb" // Import the package containing the Protocol Buffers definition.
    "google.golang.org/protobuf/proto"
)

func deserializeData(res_data *pb.ResponseData) (*pb.NotfnMessg, error) {
    notfnMessg := &pb.NotfnMessg{}

    err := proto.Unmarshal(res_data.Payload.GetSerializedMessage(), notfnMessg)
    if err != nil {
        appLogger.Logging().Error("Error Unmarshalling the message: %v", err)
        return nil, err
    }
    appLogger.Logging().Info("%v, %v", notfnMessg, res_data.Payload.GetSerializedMessage())
    return notfnMessg, nil
}

The res_data.Payload.GetSerializedMessage() function returns the serialized message data in a string format. The content of res_data is provided below as a demo.

Here's an example of the serialized data stored in res_data:

msg_hdr:{s_val:"CONFIG_GET_REP"}  payload:{payload_header:{s_val:"SUCCESS"}  serialized_message:"\n\xc3\x0b...<omitted for brevity>.../communication\n*\n\x05title\x12!\n\x17Pi Module ConfigurationZ\x06/title"}

As you can see, the serialized message contains binary data, including strings. It appears that there might be invalid UTF-8 characters in one of the string fields, causing the deserialization to fail.
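The dump above already hints at where the invalid bytes come from. Assuming its escape sequences reflect the raw bytes, the value starts with 0x0A 0xC3 0x0B, which reads more naturally as a protobuf tag followed by a varint length than as text. The sketch below uses only the standard library to decode that prefix and check it for UTF-8 validity; the names prefixInfo and describePrefix are hypothetical helpers, not part of any protobuf API.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"unicode/utf8"
)

// prefixInfo describes the first tag/length pair of a protobuf-encoded buffer.
// (Hypothetical helper type, for illustration only.)
type prefixInfo struct {
	FieldNumber uint64
	WireType    uint64
	Length      uint64 // only set for wire type 2 (length-delimited)
	ValidUTF8   bool   // whether the inspected bytes are valid UTF-8
}

// describePrefix decodes the leading tag varint and, for wire type 2,
// the length varint that follows it.
func describePrefix(b []byte) (prefixInfo, error) {
	tag, n := binary.Uvarint(b)
	if n <= 0 {
		return prefixInfo{}, fmt.Errorf("cannot read tag varint")
	}
	info := prefixInfo{
		FieldNumber: tag >> 3,
		WireType:    tag & 7,
		ValidUTF8:   utf8.Valid(b),
	}
	if info.WireType == 2 {
		length, m := binary.Uvarint(b[n:])
		if m <= 0 {
			return prefixInfo{}, fmt.Errorf("cannot read length varint")
		}
		info.Length = length
	}
	return info, nil
}

func main() {
	// First three bytes of serialized_message as shown in the dump.
	info, err := describePrefix([]byte("\n\xc3\x0b"))
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%+v\n", info)
}
```

Under that assumption the prefix decodes as field 1, wire type 2, length 1475, and it is not valid UTF-8 because 0xC3 is not followed by a continuation byte. This suggests that serialized_message carries a nested binary message, so the bytes being rejected may be the encoding itself rather than a damaged text value.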

I would like to know how to properly handle this issue and successfully deserialize the data without encountering the "string field contains invalid UTF-8" error.

Any guidance or suggestions on how to handle invalid UTF-8 characters during the deserialization process would be greatly appreciated. Thank you!

Answer 1

Score: 1

When deserializing a string field, the Go protobuf code (like other implementations) validates that the value is valid UTF-8. When it is not, you get the error you are seeing.

As far as I am aware, the simplest way around this is to edit the .proto and change the affected string fields to bytes, then rerun protoc (or buf, etc.) to regenerate your github.com/example/pb package. string and bytes fields are wire-compatible; the difference is in the generated types.
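To illustrate the wire-compatibility point, here is a minimal sketch that hand-encodes a length-delimited field using only the standard library. It is not how the protobuf library is implemented; encodeLenField is a hypothetical helper, and field number 2 is chosen arbitrarily. The point is that the encoding records only a field number, wire type 2, a length, and the raw bytes, so nothing on the wire says whether the schema declared the field as string or bytes.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// encodeLenField hand-encodes one length-delimited field (wire type 2):
// a tag varint, a length varint, then the payload bytes unchanged.
// (Hypothetical helper, for illustration only.)
func encodeLenField(fieldNumber uint64, payload []byte) []byte {
	var tmp [binary.MaxVarintLen64]byte
	buf := make([]byte, 0, 2*binary.MaxVarintLen64+len(payload))

	// Tag = (field number << 3) | wire type.
	n := binary.PutUvarint(tmp[:], fieldNumber<<3|2)
	buf = append(buf, tmp[:n]...)

	// Length of the payload in bytes.
	n = binary.PutUvarint(tmp[:], uint64(len(payload)))
	buf = append(buf, tmp[:n]...)

	return append(buf, payload...)
}

func main() {
	// The payload contains 0xC5, which is not valid UTF-8 in this position.
	// The encoder does not care; only a decoder that treats the field as
	// `string` would validate it.
	encoded := encodeLenField(2, []byte("a\xc5z"))
	fmt.Printf("% x\n", encoded)
}
```

Because the bytes on the wire are the same either way, switching the declaration to bytes changes only what the generated code does when reading the field: it hands back the raw []byte without the UTF-8 check.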

This will give you the data as a []byte in Go, which you can then process yourself (e.g. strings.ToValidUTF8("a\xc5z", ""), as per this answer).
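Once the field arrives as a []byte, cleaning it up could look like the sketch below. It assumes the field really is meant to hold text; sanitizeUTF8 is a hypothetical helper name. It uses bytes.ToValidUTF8, the []byte counterpart of strings.ToValidUTF8.

```go
package main

import (
	"bytes"
	"fmt"
	"unicode/utf8"
)

// sanitizeUTF8 returns raw as a string, replacing each run of invalid
// UTF-8 bytes with replacement. Valid input is returned unchanged.
// (Hypothetical helper, for illustration only.)
func sanitizeUTF8(raw []byte, replacement string) string {
	if utf8.Valid(raw) {
		return string(raw)
	}
	return string(bytes.ToValidUTF8(raw, []byte(replacement)))
}

func main() {
	raw := []byte("a\xc5z") // 0xC5 starts a two-byte sequence that never completes

	fmt.Printf("%q\n", sanitizeUTF8(raw, "")) // "az"

	// Substituting U+FFFD instead of dropping bytes keeps a visible marker
	// where data was lost.
	fmt.Println(sanitizeUTF8(raw, "\uFFFD") == "a\uFFFDz")
}
```

If the field instead carries a nested serialized message, as the dump in the question suggests, the []byte should probably be passed to proto.Unmarshal unchanged, since replacing bytes would corrupt the encoding.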

huangapple
  • Posted on July 26, 2023, 16:21:35
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76769291.html