Error while using proto.Unmarshal for Protocol Buffers data: "string field contains invalid UTF-8"
Question
I'm working on a Go application that uses Protocol Buffers, and I'm encountering an issue while deserializing the data using proto.Unmarshal. The specific error message I'm getting is "string field contains invalid UTF-8".
Here is my deserialization code:
import (
	"github.com/example/pb" // Import the package containing the Protocol Buffers definition.
	"google.golang.org/protobuf/proto"
)

// deserializeData decodes the serialized payload of res_data into a NotfnMessg.
func deserializeData(res_data *pb.ResponseData) (*pb.NotfnMessg, error) {
	notfnMessg := &pb.NotfnMessg{}
	err := proto.Unmarshal(res_data.Payload.GetSerializedMessage(), notfnMessg)
	if err != nil {
		appLogger.Logging().Error("Error Unmarshalling the message: %v", err)
		return nil, err // Propagate the failure to the caller.
	}
	appLogger.Logging().Info("%v, %v", notfnMessg, res_data.Payload.GetSerializedMessage())
	return notfnMessg, nil
}
The res_data.Payload.GetSerializedMessage() function returns the serialized message data in a string format. The content of res_data is provided below as a demo.
Here's an example of the serialized data stored in res_data:
msg_hdr:{s_val:"CONFIG_GET_REP"} payload:{payload_header:{s_val:"SUCCESS"} serialized_message:"\n\xc3\x0b...<omitted for brevity>.../communication\n*\n\x05title\x12!\n\x17Pi Module ConfigurationZ\x06/title"}
As you can see, the serialized message contains binary data, including strings. It appears that there might be invalid UTF-8 characters in one of the string fields, causing the deserialization to fail.
I would like to know how to properly handle this issue and successfully deserialize the data without encountering the "string field contains invalid UTF-8" error.
Any guidance or suggestions on how to handle invalid UTF-8 characters during the deserialization process would be greatly appreciated. Thank you!
Answer 1
Score: 1
When deserializing a string field, the Go protobuf code (like other implementations) validates that the string is valid UTF-8. When it is not, you get the error you are seeing.
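To see which bytes trip that kind of check, the standard library's unicode/utf8 package can be run over a suspect byte slice. The sketch below is only an illustration of the UTF-8 property being validated, not the protobuf library's own code; firstInvalidUTF8 is a hypothetical helper name, and the sample inputs are made up for the demonstration.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// firstInvalidUTF8 is a hypothetical helper: it returns the byte offset of
// the first invalid UTF-8 sequence in b, or -1 if b is entirely valid UTF-8.
func firstInvalidUTF8(b []byte) int {
	for i := 0; i < len(b); {
		r, size := utf8.DecodeRune(b[i:])
		// DecodeRune reports an invalid encoding as (RuneError, 1).
		// A correctly encoded U+FFFD has size 3, so it is not flagged.
		if r == utf8.RuneError && size == 1 {
			return i
		}
		i += size
	}
	return -1
}

func main() {
	// Illustrative samples only; the second one contains a stray 0xc5 byte.
	samples := [][]byte{
		[]byte("Pi Module Configuration"),
		[]byte("a\xc5z"),
	}
	for _, s := range samples {
		fmt.Printf("%q valid=%v firstInvalid=%d\n", s, utf8.Valid(s), firstInvalidUTF8(s))
	}
}
```

Running a check like this on the raw value of each string field (once it is available as bytes) can help identify which field carries the non-UTF-8 data.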
As far as I am aware, the simplest way around this is to edit the .proto and change the string fields to bytes, then rerun protoc (or buf etc.) to regenerate your github.com/example/pb. string and bytes fields are wire compatible; the difference is in the generated output types.
This will give you the data as a []byte in Go, which you can then process yourself (e.g. strings.ToValidUTF8("a\xc5z", ""), as per this answer).
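As a minimal sketch of that last step, assuming the field has already been regenerated as bytes and you hold its value as a []byte: bytes.ToValidUTF8 and strings.ToValidUTF8 from the standard library replace each run of invalid bytes with a replacement of your choice. sanitizeBytes is a hypothetical helper name, and the input is the same made-up sample used above.

```go
package main

import (
	"bytes"
	"fmt"
	"strings"
)

// sanitizeBytes is a hypothetical helper: it returns raw as a string with
// each run of invalid UTF-8 bytes replaced by replacement.
func sanitizeBytes(raw []byte, replacement string) string {
	return string(bytes.ToValidUTF8(raw, []byte(replacement)))
}

func main() {
	raw := []byte("a\xc5z") // 0xc5 is not followed by a continuation byte

	// Drop the invalid bytes entirely.
	fmt.Println(sanitizeBytes(raw, "")) // prints: az

	// Or mark where the data was damaged.
	fmt.Println(sanitizeBytes(raw, "?")) // prints: a?z

	// The strings variant does the same for a string value.
	fmt.Println(strings.ToValidUTF8("a\xc5z", "")) // prints: az
}
```

Whether to drop the invalid bytes or substitute a marker depends on what the field is used for; substituting a marker keeps a visible trace that the sender produced malformed text.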