将 Athena 输出转换为 Go 中的强类型模型

huangapple go评论96阅读模式
英文:

Converting Athena output to strongly typed model in Go

问题

有没有一种简单的方法可以将athena.GetQueryResultsOutput结构体转换为Go中的用户定义结构体?

这是我想要实现的一部分代码:

const testSql = `
select
	id,
	name,
	count(source_id) as aggregate_count,
	array_agg(source_id) as aggregate_source_ids,
	array_agg(source_name) as aggregate_source_names
from my_glue_catalog_table
group by id, name
`

type myModel struct {
	id                   int64
	name                 string
	aggregateCount       int
	aggregateSourceIDs   []int64
	aggregateSourceNames []string
}

queryResultOutput, err = awsAthenaClient.GetQueryResults(ctx, &queryResultInput)
// var mapped []myModel = mapper.FromGetQueryResultsOutput(queryResultOutput.ResultSet)

这是queryResultOutput.ResultSet的结构:

ResultSet: {
    ResultSetMetadata: {
      ColumnInfo: [
        {
          CaseSensitive: false,
          CatalogName: "hive",
          Label: "id",
          Name: "id",
          Nullable: "UNKNOWN",
          Precision: 10,
          Scale: 0,
          SchemaName: "",
          TableName: "",
          Type: "integer"
        },
        {
          CaseSensitive: true,
          CatalogName: "hive",
          Label: "name",
          Name: "name",
          Nullable: "UNKNOWN",
          Precision: 2147483647,
          Scale: 0,
          SchemaName: "",
          TableName: "",
          Type: "varchar"
        },
        {
          CaseSensitive: false,
          CatalogName: "hive",
          Label: "aggregate_count",
          Name: "aggregate_count",
          Nullable: "UNKNOWN",
          Precision: 19,
          Scale: 0,
          SchemaName: "",
          TableName: "",
          Type: "bigint"
        },
        {
          CaseSensitive: false,
          CatalogName: "hive",
          Label: "aggregate_source_ids",
          Name: "aggregate_source_ids",
          Nullable: "UNKNOWN",
          Precision: 0,
          Scale: 0,
          SchemaName: "",
          TableName: "",
          Type: "array"
        },
        {
          CaseSensitive: false,
          CatalogName: "hive",
          Label: "aggregate_source_names",
          Name: "aggregate_source_names",
          Nullable: "UNKNOWN",
          Precision: 0,
          Scale: 0,
          SchemaName: "",
          TableName: "",
          Type: "array"
        }
      ]
    },
    Rows: [{
        // 第一行数据(来自结果的第一页)是标题,我们可以忽略它
        Data: [
            VarCharValue: "id" },
            VarCharValue: "name" },
            VarCharValue: "aggregate_count" },
            VarCharValue: "aggregate_source_ids" },
            VarCharValue: "aggregate_source_names" }
        ]
      },
      // 所有后续行都是数据值:
      {
        Data: [
            VarCharValue: "920000" },
            VarCharValue: "mydata1" },
            VarCharValue: "2" },
            VarCharValue: "[52800, 113000]" },
            VarCharValue: "[sourcedata1, sourcedata2]" }
        ]
      }
    ]
}

我写了一个完整的代码示例:gist

如果没有更好的替代方案,我打算编写一个转换器包,类似下面的代码。主要的问题是可靠地转换数组,因为数组中的一个字符串值可能包含逗号。而且我相信我们只能返回[]interface{},因为Go不支持泛型(截至今天)。

// myModel定义了与上面的testSql对应的模式
type myModel struct {
	id                   int      `athenaconv:"id"`
	name                 string   `athenaconv:"name"`
	aggregateCount       int64    `athenaconv:"aggregate_count"`
	aggregateSourceIDs   []int64  `athenaconv:"aggregate_source_ids"`
	aggregateSourceNames []string `athenaconv:"aggregate_source_names"`
}
mapper := athenaconv.MapperFor(reflect.TypeOf(myModel))
var mapped []interface{} = mapper.FromAthenaResultSet(queryResultOutput.ResultSet)
英文:

Is there an easy way to convert from athena.GetQueryResultsOutput struct to a user-defined struct in Go?

Here's a snippet of what I want to achieve:

const testSql = `
select
id,
name,
count(source_id) as aggregate_count,
array_agg(source_id) as aggregate_source_ids,
array_agg(source_name) as aggregate_source_names
from my_glue_catalog_table
group by id, name
`
type myModel struct {
id                   int64
name                 string
aggregateCount       int
aggregateSourceIDs   []int64
aggregateSourceNames []string
}
queryResultOutput, err = awsAthenaClient.GetQueryResults(ctx, &queryResultInput)
// var mapped []myModel = mapper.FromGetQueryResultsOutput(queryResultOutput.ResultSet)

And here's the structure of queryResultOutput.ResultSet:

ResultSet: {
ResultSetMetadata: {
ColumnInfo: [
{
CaseSensitive: false,
CatalogName: "hive",
Label: "id",
Name: "id",
Nullable: "UNKNOWN",
Precision: 10,
Scale: 0,
SchemaName: "",
TableName: "",
Type: "integer"
},
{
CaseSensitive: true,
CatalogName: "hive",
Label: "name",
Name: "name",
Nullable: "UNKNOWN",
Precision: 2147483647,
Scale: 0,
SchemaName: "",
TableName: "",
Type: "varchar"
},
{
CaseSensitive: false,
CatalogName: "hive",
Label: "aggregate_count",
Name: "aggregate_count",
Nullable: "UNKNOWN",
Precision: 19,
Scale: 0,
SchemaName: "",
TableName: "",
Type: "bigint"
},
{
CaseSensitive: false,
CatalogName: "hive",
Label: "aggregate_source_ids",
Name: "aggregate_source_ids",
Nullable: "UNKNOWN",
Precision: 0,
Scale: 0,
SchemaName: "",
TableName: "",
Type: "array"
},
{
CaseSensitive: false,
CatalogName: "hive",
Label: "aggregate_source_names",
Name: "aggregate_source_names",
Nullable: "UNKNOWN",
Precision: 0,
Scale: 0,
SchemaName: "",
TableName: "",
Type: "array"
}
]
},
Rows: [{
// first row data (from page 1 of results) is header, we can ignore this
Data: [
VarCharValue: "id" },
VarCharValue: "name" },
VarCharValue: "aggregate_count" },
VarCharValue: "aggregate_source_ids" },
VarCharValue: "aggregate_source_names" }
]
},
// all subsequent rows are data values:
{
Data: [
VarCharValue: "920000" },
VarCharValue: "mydata1" },
VarCharValue: "2" },
VarCharValue: "[52800, 113000]" },
VarCharValue: "[sourcedata1, sourcedata2]" }
]
}
]
}

Full code example I've written: gist

If there's no better alternative, I'm thinking to write a converter package like the following. The main caveat will be converting arrays reliably though as one of the string values in the array might contain a comma. And I believe we can only return as []interface{} because Go doesn't support generics (as of today)?

// myModel defines a schema that corresponds with testSql above
type myModel struct {
id                   int      `athenaconv:"id"`
name                 string   `athenaconv:"name"`
aggregateCount       int64    `athenaconv:"aggregate_count"`
aggregateSourceIDs   []int64  `athenaconv:"aggregate_source_ids"`
aggregateSourceNames []string `athenaconv:"aggregate_source_names"`
}
mapper := athenaconv.MapperFor(reflect.TypeOf(myModel))
var mapped []interface{} = mapper.FromAthenaResultSet(queryResultOutput.ResultSet)

答案1

得分: 1

由于我无法找到更好的替代方案,并且我需要处理大量的数据集进行分析,所以我编写了自己的包库来解决列映射问题,并将Athena类型转换为Go数据类型。这将帮助我们将Athena的ResultSet绑定到强类型模型的切片中。

我将其开源在athenaconv下。留下这个答案,以防对其他人有所帮助。

获取

go get github.com/kent-id/athenaconv

用法

mapper, err := athenaconv.NewMapperFor(reflect.TypeOf(MyModel{}))
if err != nil {
handleError(err)
}
var mapped []interface{}
mapped, err = mapper.FromAthenaResultSetV2(ctx, queryResultOutput.ResultSet)
if err != nil {
handleError(err)
}
for _, mappedItem := range mapped {
mappedItemModel := mappedItem.(*MyModel)
fmt.Printf("%+v\n", *mappedItemModel)
}
英文:

As I couldn't find better alternative and I work with lots of data sets to analyze, I wrote my own package library to figure out the column mapping and convert from athena types to go data types. This will help us bind athena ResultSet to slice of strongly-typed model.

I made it open-source under athenaconv. Leaving as answer in case it helps someone else.

Go get it

go get github.com/kent-id/athenaconv

Usage

mapper, err := athenaconv.NewMapperFor(reflect.TypeOf(MyModel{}))
if err != nil {
handleError(err)
}
var mapped []interface{}
mapped, err = mapper.FromAthenaResultSetV2(ctx, queryResultOutput.ResultSet)
if err != nil {
handleError(err)
}
for _, mappedItem := range mapped {
mappedItemModel := mappedItem.(*MyModel)
fmt.Printf("%+v\n", *mappedItemModel)
}

huangapple
  • 本文由 发表于 2021年9月26日 12:04:03
  • 转载请务必保留本文链接:https://go.coder-hub.com/69331782.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定