Converting parquet file to Golang struct with nested elements

Question

I am trying to read a Parquet file with nested arrays/structs in Go using the xitongsys/parquet-go library. The list data is not being read: the nested values come back empty. Below are my Go structs:

type Play struct {
	SID            string   `parquet:"name=si, type=BYTE_ARRAY, convertedtype=UTF8, encoding=PLAIN_DICTIONARY, repetitiontype=OPTIONAL" json:"si,omitempty"`
	TimeStamp      int      `parquet:"name=ts, type=INT64, repetitiontype=OPTIONAL" json:"ts,omitempty"`
	SingleID       int      `parquet:"name=sg, type=INT64, repetitiontype=OPTIONAL" json:"sg,omitempty"`
	PID            int      `parquet:"name=playid, type=INT64, repetitiontype=OPTIONAL" json:"playid,omitempty"`
	StartTimeStamp string   `parquet:"name=startts, type=BYTE_ARRAY,repetitiontype=OPTIONAL"`
	Price          []Price1 `parquet:"name=price, type=LIST, repetitiontype=REQUIRED" json:"price,omitempty"`
}

type Price1 struct {
	CurrID int    `parquet:"name=currId, type=INT64, repetitiontype=REQUIRED" json:"currId,omitempty"`
	LPTag  string `parquet:"name=lptag, type=BYTE_ARRAY,convertedtype=UTF8, repetitiontype=REQUIRED" json:"lptag,omitempty"`
	LPrice Money  `parquet:"name=lpmoney, type=STRUCT" json:"lpmoney,omitempty"`
}

type Money struct {
	AdmCurrCode  string `parquet:"name=admCC, type=BYTE_ARRAY, repetitiontype=OPTIONAL" json:"admCC,omitempty"`
	AdmCurrValue string `parquet:"name=admCV, type=BYTE_ARRAY" json:"admCV,omitempty"`
}

CurrID and LPTag come back empty even though the Parquet file contains valid values.

Answer 1

Score: 1

I found that the github.com/segmentio/parquet-go package reads the file correctly. Do you need to stick with the github.com/xitongsys/parquet-go package?

package main

import (
	"fmt"

	"github.com/segmentio/parquet-go"
)

type Play struct {
	SID            string  `parquet:"si"`
	TimeStamp      int     `parquet:"ts"`
	SingleID       int     `parquet:"sg"`
	PID            int     `parquet:"playid"`
	StartTimeStamp string  `parquet:"startts"`
	Price          []Price `parquet:"price,list"`
}

type Price struct {
	CurrID int    `parquet:"currId"`
	LPTag  string `parquet:"lptag"`
	LPrice Money  `parquet:"lpmoney"`
}

type Money struct {
	AdmCurrCode  string `parquet:"admCC"`
	AdmCurrValue string `parquet:"admCV"`
}

func main() {
	rows, err := parquet.ReadFile[Play]("s3.parquet")
	if err != nil {
		panic(err)
	}

	for _, c := range rows {
		fmt.Printf("%+v\n", c)
	}
}

Posted by huangapple on June 10, 2023 at 16:19:43
Original link: https://go.coder-hub.com/76445364.html