2023-07-26 16:21:35 · go


Error while using proto.Unmarshal for Protocol Buffers data: "string field contains invalid UTF-8"

Question


I'm working on a Go application that uses Protocol Buffers, and I'm encountering an issue while deserializing the data using proto.Unmarshal. The specific error message I'm getting is "string field contains invalid UTF-8".

Here is my deserialization code:

import (
    "github.com/example/pb" // Import the package containing the Protocol Buffers definition.
    "google.golang.org/protobuf/proto"
)

// deserializeData decodes the serialized payload carried in res_data into a NotfnMessg.
func deserializeData(res_data *pb.ResponseData) (*pb.NotfnMessg, error) {
    notfnMessg := &pb.NotfnMessg{}
    // proto.Unmarshal expects a []byte; the getter is described below as returning a string.
    err := proto.Unmarshal([]byte(res_data.Payload.GetSerializedMessage()), notfnMessg)
    if err != nil {
        appLogger.Logging().Error("Error Unmarshalling the message: %v", err)
        return nil, err
    }
    appLogger.Logging().Info("%v, %v", notfnMessg, res_data.Payload.GetSerializedMessage())
    return notfnMessg, nil
}

The res_data.Payload.GetSerializedMessage() function returns the serialized message data in a string format. The content of res_data is provided below as a demo.

Here's an example of the serialized data stored in res_data:

msg_hdr:{s_val:"CONFIG_GET_REP"}  payload:{payload_header:{s_val:"SUCCESS"}  serialized_message:"\n\xc3\x0b...<omitted for brevity>.../communication\n*\n\x05title\x12!\n\x17Pi Module ConfigurationZ\x06/title"}

As you can see, the serialized message contains binary data, including strings. It appears that there might be invalid UTF-8 characters in one of the string fields, causing the deserialization to fail.

I would like to know how to properly handle this issue and successfully deserialize the data without encountering the "string field contains invalid UTF-8" error.

Any guidance or suggestions on how to handle invalid UTF-8 characters during the deserialization process would be greatly appreciated. Thank you!

Answer 1

Score: 1


When deserializing a string field, the Go protobuf code (like other implementations) validates that the string is valid UTF-8. When it is not, you get the error you are seeing.

As far as I am aware, the simplest way around this is to edit the .proto and change the affected string fields to bytes, then rerun protoc (or buf etc.) to regenerate your github.com/example/pb. string and bytes fields are wire compatible; the difference is in the generated Go types.

This will give you the data as a []byte in Go, which you can then process yourself (e.g. with strings.ToValidUTF8("a\xc5z", ""), as per this answer).
