Reading raw byte data from a file and decoding it into protobuf structs

Question

What I'm trying to do: I have a dump from a Kafka stream that contains an unknown number of protobuf records stored in binary format. I want to decode them and print them to the console one by one as JSON.
I have searched extensively, but I could not find a clear answer on how to read a raw binary file that contains an unknown number of protobuf records.
I found this question: https://stackoverflow.com/questions/35049657/how-to-decode-binary-raw-google-protobuf-data but it only covers decoding a single known record with protoc.

I've tried the following, but I don't seem to fully understand how to work with the proto.Buffer struct, since I only see the first value out of all 26 KB of data.

package main

import (
	"encoding/json"
	"fmt"
	"io/ioutil"
	"log"

	"github.com/golang/protobuf/proto"

	"parseRawDHCP/pb"
)

func main() {
	file, err := ioutil.ReadFile("file")
	if err != nil {
		// Stop here: there is nothing to decode without the file contents.
		log.Fatalf("unable to read file: %v", err)
	}
	msg := pb.Msg{}
	buffer := proto.NewBuffer(file)
	for {
		// DecodeMessage expects each record to be prefixed with its varint length.
		// Once the input is exhausted it returns an error, so this loop ends in a panic.
		err := buffer.DecodeMessage(&msg)
		if err != nil {
			panic("unable to decode message")
		}
		marshalledStruct, err := json.Marshal(&msg)
		if err != nil {
			panic("unable to marshal message to JSON")
		}
		// Print the JSON as text; %v on a []byte would print raw byte values.
		fmt.Printf("message is: %s\n", marshalledStruct)
	}
}

If someone can point me in the right direction on how to correctly decode raw binary data into protobuf messages, I would greatly appreciate it.

Answer 1

Score: 1

A proto message by itself comes with no length and no end-of-message indication.

If your file contains marshalled proto messages all jammed together, then there's no way to decode them individually. An attempt to decode multiple messages as a single one will decode everything into a single struct, overwriting each scalar field (and appending to repeated fields) as it proceeds.
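To make the overwriting behavior concrete, here is a minimal sketch that uses only the Go standard library. It assumes a hypothetical message with a single varint field (field number 1); `decodeVarintFields` is a hand-rolled helper written for this illustration, not part of any protobuf package, and it handles only varint fields.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// decodeVarintFields parses protobuf wire-format bytes that contain only
// varint fields (wire type 0) and returns the last value seen per field number.
func decodeVarintFields(data []byte) (map[uint64]uint64, error) {
	fields := make(map[uint64]uint64)
	for len(data) > 0 {
		// Each field starts with a tag: (field number << 3) | wire type.
		tag, n := binary.Uvarint(data)
		if n <= 0 {
			return nil, errors.New("malformed field tag")
		}
		data = data[n:]
		if tag&7 != 0 {
			return nil, errors.New("only varint fields are supported in this sketch")
		}
		value, n := binary.Uvarint(data)
		if n <= 0 {
			return nil, errors.New("malformed varint value")
		}
		data = data[n:]
		fields[tag>>3] = value // a later occurrence replaces an earlier one
	}
	return fields, nil
}

func main() {
	first := []byte{0x08, 0x01}  // one message: field 1 = 1
	second := []byte{0x08, 0x02} // another message: field 1 = 2
	joined := append(append([]byte{}, first...), second...)

	// Nothing in the bytes marks where the first message ends,
	// so the decoder sees one message with field 1 set twice.
	fields, err := decodeVarintFields(joined)
	if err != nil {
		fmt.Println("decode error:", err)
		return
	}
	fmt.Println(fields[1]) // prints 2
}
```

The two records are indistinguishable from one record in which the field was written twice, which is why only a single value survives.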

If your file contains length-prefixed messages (see proto.Buffer's EncodeMessage method), then your sample code should be able to decode them (and panic at EOF). But I doubt that they were serialized that way.

Posted by huangapple on September 28, 2021, 23:43:15
Link: https://go.coder-hub.com/69364731.html