How to convert parquet-go []interface{} to a slice of structs?
Question
I am reading a Parquet file as shown below. The code reads the file and converts its rows into `ParquetProduct` structs, which I use later to get data out of them.
```go
func (r *clientRepository) read(logg log.Prot, file string, bucket string) error {
	fr, err := pars3.NewS3FileReader(context.Background(), bucket, file, r.s3Client.GetSession().Config)
	if err != nil {
		return errs.Wrap(err)
	}
	defer xio.CloseIgnoringErrors(fr)

	pr, err := reader.NewParquetReader(fr, nil, int64(r.cfg.Workers))
	if err != nil {
		return errs.Wrap(err)
	}
	if pr.GetNumRows() == 0 {
		logg.Infof("Skipping %s due to 0 rows", file)
		return nil
	}
	for {
		rows, err := pr.ReadByNumber(r.cfg.RowsToRead)
		if err != nil {
			return errs.Wrap(err)
		}
		if len(rows) == 0 {
			break
		}
		// doing Marshal here first
		byteSlice, err := json.Marshal(rows)
		if err != nil {
			return errs.Wrap(err)
		}
		var productRows []ParquetProduct
		// and then Unmarshal here
		err = json.Unmarshal(byteSlice, &productRows)
		if err != nil {
			return errs.Wrap(err)
		}
		//.....
		// use productRows here
		//.....
	}
	return nil
}
```
Problem Statement
I am calling `json.Marshal` first and then `json.Unmarshal` to get the required objects. Is there any way to avoid all this? The `ReadByNumber` function of the `parquet-go` library returns `[]interface{}`, so is there any way to get my `[]ParquetProduct` back directly from the `[]interface{}`?
I am using Go 1.19. This is the library I use to read Parquet files: https://github.com/xitongsys/parquet-go
Is there a better, more efficient way to do this overall?
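A direct conversion without the JSON round trip is possible with a per-element type assertion, but only when each element's dynamic type is already `ParquetProduct`. The sketch below assumes (not confirmed in the question) that this holds when the reader is created with `reader.NewParquetReader(fr, new(ParquetProduct), ...)` instead of `nil`. With `nil`, parquet-go appears to build the row type from the file schema via reflection, so the assertion would fail and the helper reports an error instead of silently converting. `assertRows` and the two-field `ParquetProduct` are hypothetical stand-ins.

```go
package main

import "fmt"

// ParquetProduct is a hypothetical stand-in for the question's struct;
// the real one would carry `parquet:"..."` tags.
type ParquetProduct struct {
	ID   int64
	Name string
}

// assertRows converts the []interface{} returned by a ReadByNumber-style call
// into []T using one type assertion per element, with no JSON round trip.
// It only succeeds when every element's dynamic type is exactly T.
func assertRows[T any](rows []interface{}) ([]T, error) {
	out := make([]T, 0, len(rows))
	for i, row := range rows {
		v, ok := row.(T)
		if !ok {
			return nil, fmt.Errorf("row %d: got %T, want %T", i, row, *new(T))
		}
		out = append(out, v)
	}
	return out, nil
}

func main() {
	// Simulated batch, shaped like what ReadByNumber might return when the
	// reader was created with new(ParquetProduct) (assumption, see above).
	rows := []interface{}{
		ParquetProduct{ID: 1, Name: "apple"},
		ParquetProduct{ID: 2, Name: "pear"},
	}
	products, err := assertRows[ParquetProduct](rows)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(len(products), products[1].Name) // prints "2 pear"

	// A row of any other type is reported rather than converted.
	_, err = assertRows[ParquetProduct]([]interface{}{struct{ X int }{1}})
	fmt.Println(err != nil) // prints "true"
}
```

Type parameters require Go 1.18 or later, so this works with the Go 1.19 toolchain mentioned in the question.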
Answer 1
Score: 1
Instead of using `ReadByNumber`, make a `[]ParquetProduct` slice with the desired length and use `Read`.
```go
products := make([]ParquetProduct, r.cfg.RowsToRead)
// ^ slice with length and capacity equal to r.cfg.RowsToRead
err = pr.Read(&products)
if err != nil {
	// ...
}
```
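Applied to the whole file, the answer's approach becomes a loop that reads fixed-size batches. The following is a minimal sketch under several assumptions. First, `Read` appears to use the destination slice's length as the number of rows to read, so the final batch is sized to the rows that remain rather than to `RowsToRead`. Second, the parquet-go README example creates the reader with a struct pointer (e.g. `new(ParquetProduct)`) rather than `nil`, so the reader's schema matches the destination type. The `rowReader` interface, `readInBatches`, and the in-memory `fakeReader` used for the demo are hypothetical; `*reader.ParquetReader` is assumed to satisfy `rowReader` through its `GetNumRows` and `Read` methods.

```go
package main

import "fmt"

// rowReader is a hypothetical interface covering the two ParquetReader
// methods this sketch relies on (assumed signatures).
type rowReader interface {
	GetNumRows() int64
	Read(dst interface{}) error
}

// readInBatches reads every row into []T in batches of at most batchSize,
// sizing the last batch to the rows that remain, and passes each batch to fn.
func readInBatches[T any](pr rowReader, batchSize int, fn func([]T) error) error {
	if batchSize <= 0 {
		return fmt.Errorf("batchSize must be positive, got %d", batchSize)
	}
	remaining := pr.GetNumRows()
	for remaining > 0 {
		n := batchSize
		if int64(n) > remaining {
			n = int(remaining)
		}
		// The slice length tells Read how many rows to fill.
		batch := make([]T, n)
		if err := pr.Read(&batch); err != nil {
			return err
		}
		if err := fn(batch); err != nil {
			return err
		}
		remaining -= int64(n)
	}
	return nil
}

// ParquetProduct is a hypothetical stand-in for the question's struct.
type ParquetProduct struct {
	ID   int64
	Name string
}

// fakeReader is an in-memory stand-in for a ParquetReader, used only for this demo.
type fakeReader struct {
	rows []ParquetProduct
	pos  int
}

func (f *fakeReader) GetNumRows() int64 { return int64(len(f.rows)) }

// Read copies up to len(*dst) rows into the destination slice.
func (f *fakeReader) Read(dst interface{}) error {
	p, ok := dst.(*[]ParquetProduct)
	if !ok {
		return fmt.Errorf("unsupported destination %T", dst)
	}
	n := copy(*p, f.rows[f.pos:])
	*p = (*p)[:n]
	f.pos += n
	return nil
}

func main() {
	fr := &fakeReader{rows: []ParquetProduct{{1, "a"}, {2, "b"}, {3, "c"}, {4, "d"}, {5, "e"}}}
	var sizes []int
	err := readInBatches[ParquetProduct](fr, 2, func(batch []ParquetProduct) error {
		sizes = append(sizes, len(batch)) // use the batch here
		return nil
	})
	fmt.Println(sizes, err) // prints "[2 2 1] <nil>"
}
```

With the real library, `fn` is where the question's "use productRows here" logic would go. The parquet-go README example also calls `pr.ReadStop()` once reading is finished.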