How to convert parquet-go []interface{} to slice of structs?

Question

I am reading a Parquet file as shown below. The code reads the file and converts its rows into ParquetProduct structs, which I later use to get the data out.

func (r *clientRepository) read(logg log.Prot, file string, bucket string) error {
	var err error
	fr, err := pars3.NewS3FileReader(context.Background(), bucket, file, r.s3Client.GetSession().Config)
	if err != nil {
		return errs.Wrap(err)
	}
	defer xio.CloseIgnoringErrors(fr)

	pr, err := reader.NewParquetReader(fr, nil, int64(r.cfg.Workers))
	if err != nil {
		return errs.Wrap(err)
	}

	if pr.GetNumRows() == 0 {
		logg.Infof("Skipping %s due to 0 rows", file)
		return nil
	}

	for {
		rows, err := pr.ReadByNumber(r.cfg.RowsToRead)
		if err != nil {
			return errs.Wrap(err)
		}
		if len(rows) <= 0 {
			break
		}

		// doing Marshal here first
		byteSlice, err := json.Marshal(rows)
		if err != nil {
			return errs.Wrap(err)
		}

		var productRows []ParquetProduct
		// and then Unmarshal here
		err = json.Unmarshal(byteSlice, &productRows)
		if err != nil {
			return errs.Wrap(err)
		}

		//.....
		// use productRows here
		//.....
	}
	return nil
}

Problem Statement

I marshal the rows to JSON first and then unmarshal them to get the objects I need. Is there any way to avoid this round trip? The ReadByNumber function of the parquet-go library returns []interface{}, so is there any way to get my []ParquetProduct slice back directly from that []interface{}?

I am using Go 1.19. The library I use to read the Parquet file is https://github.com/xitongsys/parquet-go.

Is there a better, more efficient way to do this overall?
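
Whether the []interface{} from ReadByNumber can become a []ParquetProduct without JSON depends on the dynamic type of its elements. The sketch below simply type-asserts each element, so it only works if the reader really produced ParquetProduct values. That is an assumption the question does not confirm: with nil passed as the schema object to reader.NewParquetReader, the rows may well be some other struct type built from the file schema. assertRows and the ParquetProduct fields are hypothetical.

```go
package main

import "fmt"

// ParquetProduct is a hypothetical stand-in; the real struct and its
// parquet tags are not shown in the question.
type ParquetProduct struct {
	ID   int64
	Name string
}

// assertRows converts []interface{} to []T with one type assertion per
// element. It succeeds only when every element's dynamic type is exactly T;
// otherwise it returns an error naming the first mismatching row.
func assertRows[T any](rows []interface{}) ([]T, error) {
	out := make([]T, 0, len(rows))
	for i, r := range rows {
		v, ok := r.(T)
		if !ok {
			var zero T
			return nil, fmt.Errorf("row %d: got %T, want %T", i, r, zero)
		}
		out = append(out, v)
	}
	return out, nil
}

func main() {
	rows := []interface{}{ParquetProduct{ID: 1, Name: "a"}, ParquetProduct{ID: 2, Name: "b"}}
	products, err := assertRows[ParquetProduct](rows)
	fmt.Println(products, err)

	// A struct with identical fields but a different (unnamed) type is not
	// ParquetProduct, so the assertion fails for it.
	other := []interface{}{struct {
		ID   int64
		Name string
	}{ID: 3, Name: "c"}}
	_, err = assertRows[ParquetProduct](other)
	fmt.Println(err)
}
```

If the assertion fails, the elements are a different type even when their fields match, so a field-by-field copy is needed, which is what the JSON round trip in the question does.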

Answer 1

Score: 1

Instead of using ReadByNumber, make a []ParquetProduct slice with the desired length and pass a pointer to it to Read.

products := make([]ParquetProduct, r.cfg.RowsToRead)
// ^ slice with length and capacity equal to r.cfg.RowsToRead
err = pr.Read(&products)
if err != nil {
	// ...
}
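
The Read call above can be wrapped in a loop that processes the whole file. This is a sketch under several assumptions: rowReader, readInBatches and sliceReader are hypothetical names, the ParquetProduct fields are invented, and the interface assumes *reader.ParquetReader exposes Read(interface{}) error and GetNumRows() int64. Each slice is sized to the smaller of the batch size and the rows left, so the final Read is never asked for more rows than remain. The library's own examples appear to create the reader with a struct pointer (for example new(ParquetProduct)) rather than nil when reading into typed slices; that is worth checking against the version in use.

```go
package main

import "fmt"

// ParquetProduct is a hypothetical stand-in for the question's struct.
type ParquetProduct struct {
	ID   int64
	Name string
}

// rowReader is a hypothetical interface covering the two reader methods used
// here; it assumes *reader.ParquetReader provides both with these signatures.
type rowReader interface {
	Read(dst interface{}) error
	GetNumRows() int64
}

// readInBatches reads every row into []ParquetProduct batches of at most
// batchSize rows and hands each batch to handle.
func readInBatches(pr rowReader, batchSize int, handle func([]ParquetProduct) error) error {
	if batchSize <= 0 {
		return fmt.Errorf("batchSize must be positive, got %d", batchSize)
	}
	remaining := pr.GetNumRows()
	for remaining > 0 {
		// Size the slice to min(batchSize, remaining) before calling Read.
		n := int64(batchSize)
		if remaining < n {
			n = remaining
		}
		products := make([]ParquetProduct, n)
		if err := pr.Read(&products); err != nil {
			return err
		}
		// Guard against a reader that makes no progress, to avoid looping forever.
		if len(products) == 0 {
			return fmt.Errorf("read returned no rows with %d remaining", remaining)
		}
		if err := handle(products); err != nil {
			return err
		}
		remaining -= int64(len(products))
	}
	return nil
}

// sliceReader is an in-memory fake used only to exercise readInBatches.
type sliceReader struct {
	data []ParquetProduct
	pos  int
}

func (s *sliceReader) GetNumRows() int64 { return int64(len(s.data)) }

// Read fills *dst from the in-memory rows and shrinks it to the count copied.
func (s *sliceReader) Read(dst interface{}) error {
	p, ok := dst.(*[]ParquetProduct)
	if !ok {
		return fmt.Errorf("unsupported destination %T", dst)
	}
	n := copy(*p, s.data[s.pos:])
	s.pos += n
	*p = (*p)[:n]
	return nil
}

func main() {
	src := &sliceReader{data: []ParquetProduct{{1, "a"}, {2, "b"}, {3, "c"}, {4, "d"}, {5, "e"}}}
	err := readInBatches(src, 2, func(batch []ParquetProduct) error {
		fmt.Println(len(batch), batch)
		return nil
	})
	if err != nil {
		fmt.Println("error:", err)
	}
}
```

With a real reader this would be called as readInBatches(pr, r.cfg.RowsToRead, func(batch []ParquetProduct) error { ...; return nil }), assuming pr satisfies rowReader. sliceReader exists only so the loop can be exercised without a Parquet file.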

  • Posted by huangapple on October 21, 2022 03:43:42
  • When reposting, please keep the link to this article: https://go.coder-hub.com/74145534.html