How to convert parquet-go []interface{} to slice of structs?

Question

I am reading a parquet file as shown below. The code reads the file and converts its rows to ParquetProduct structs, which I use later to get data out of them.

func (r *clientRepository) read(logg log.Prot, file string, bucket string) error {
	var err error
	fr, err := pars3.NewS3FileReader(context.Background(), bucket, file, r.s3Client.GetSession().Config)
	if err != nil {
		return errs.Wrap(err)
	}
	defer xio.CloseIgnoringErrors(fr)

	pr, err := reader.NewParquetReader(fr, nil, int64(r.cfg.Workers))
	if err != nil {
		return errs.Wrap(err)
	}

	if pr.GetNumRows() == 0 {
		logg.Infof("Skipping %s due to 0 rows", file)
		return nil
	}

	for {
		rows, err := pr.ReadByNumber(r.cfg.RowsToRead)
		if err != nil {
			return errs.Wrap(err)
		}
		if len(rows) <= 0 {
			break
		}

		// doing Marshal here first
		byteSlice, err := json.Marshal(rows)
		if err != nil {
			return errs.Wrap(err)
		}

		var productRows []ParquetProduct
		// and then Unmarshal here
		err = json.Unmarshal(byteSlice, &productRows)
		if err != nil {
			return errs.Wrap(err)
		}

		//.....
		// use productRows here
		//.....

	}
	return nil
}

Problem Statement

I Marshal first and then Unmarshal to get the required objects. Is there any way to avoid this? The ReadByNumber function of the parquet-go library returns []interface{}, so is there a way to get my []ParquetProduct back directly from the []interface{}?

I am using Go 1.19. This is the library I use to read parquet files: https://github.com/xitongsys/parquet-go

Is there a better, more efficient way to do this overall?

Answer 1

Score: 1

Instead of using ReadByNumber, make a []ParquetProduct slice of the desired length and pass a pointer to it to Read.

products := make([]ParquetProduct, r.cfg.RowsToRead)
// ^ slice with length and capacity equal to r.cfg.RowsToRead
err = pr.Read(&products)
if err != nil {
	// ...
}
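
To show how the typed Read call might replace the whole Marshal/Unmarshal loop from the question, here is a rough sketch. Several names are assumptions made for illustration: the ParquetProduct fields are hypothetical (the question never shows them), rowReader is a made-up interface listing the two methods the loop needs (GetNumRows and Read, which *reader.ParquetReader from xitongsys/parquet-go provides), and fakeReader is only an in-memory stand-in so the loop can be tried without a parquet file. The loop caps each batch by the number of rows still unread, so it does not rely on how Read behaves at the end of the file.

```go
package main

import "errors"

// ParquetProduct stands in for the struct from the question.
// Its real fields are not shown there, so these two are hypothetical.
type ParquetProduct struct {
	ID   int64
	Name string
}

// rowReader is a hypothetical interface with the two methods this sketch
// needs; a *reader.ParquetReader is assumed to satisfy it.
type rowReader interface {
	GetNumRows() int64
	Read(dst interface{}) error
}

// readInBatches reads all rows from pr in chunks of at most batchSize and
// hands each typed chunk to handle, with no JSON round trip.
func readInBatches(pr rowReader, batchSize int, handle func([]ParquetProduct) error) error {
	if batchSize <= 0 {
		return errors.New("batchSize must be positive")
	}
	remaining := pr.GetNumRows()
	for remaining > 0 {
		// Never ask for more rows than are left in the file.
		n := int64(batchSize)
		if n > remaining {
			n = remaining
		}
		products := make([]ParquetProduct, n)
		if err := pr.Read(&products); err != nil {
			return err
		}
		if len(products) == 0 {
			break // reader returned nothing; stop instead of looping forever
		}
		if err := handle(products); err != nil {
			return err
		}
		remaining -= int64(len(products))
	}
	return nil
}

// fakeReader is an in-memory stand-in for a parquet reader (illustration only).
type fakeReader struct {
	rows []ParquetProduct
	pos  int
}

func (f *fakeReader) GetNumRows() int64 { return int64(len(f.rows)) }

// Read copies the next rows into the destination slice and trims it to the
// number of rows actually copied.
func (f *fakeReader) Read(dst interface{}) error {
	out, ok := dst.(*[]ParquetProduct)
	if !ok {
		return errors.New("dst must be *[]ParquetProduct")
	}
	n := copy(*out, f.rows[f.pos:])
	*out = (*out)[:n]
	f.pos += n
	return nil
}
```

In the question's read method this would be called as readInBatches(pr, r.cfg.RowsToRead, ...) after the GetNumRows check, with the callback holding the code that currently uses productRows. Whether NewParquetReader should also receive new(ParquetProduct) instead of nil as its second argument depends on how the struct tags line up with the file schema, so treat that part as something to verify against the library's examples.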

huangapple
  • Posted on October 21, 2022 at 03:43:42
  • When reposting, please keep the link to this article: https://go.coder-hub.com/74145534.html