英文:
Converting Athena output to strongly typed model in Go
问题
有没有一种简单的方法可以将athena.GetQueryResultsOutput
结构体转换为Go中的用户定义结构体?
这是我想要实现的一部分代码:
const testSql = `
select
id,
name,
count(source_id) as aggregate_count,
array_agg(source_id) as aggregate_source_ids,
array_agg(source_name) as aggregate_source_names
from my_glue_catalog_table
group by id, name
`
type myModel struct {
id int64
name string
aggregateCount int
aggregateSourceIDs []int64
aggregateSourceNames []string
}
queryResultOutput, err = awsAthenaClient.GetQueryResults(ctx, &queryResultInput)
// var mapped []myModel = mapper.FromGetQueryResultsOutput(queryResultOutput.ResultSet)
这是queryResultOutput.ResultSet
的结构:
ResultSet: {
ResultSetMetadata: {
ColumnInfo: [
{
CaseSensitive: false,
CatalogName: "hive",
Label: "id",
Name: "id",
Nullable: "UNKNOWN",
Precision: 10,
Scale: 0,
SchemaName: "",
TableName: "",
Type: "integer"
},
{
CaseSensitive: true,
CatalogName: "hive",
Label: "name",
Name: "name",
Nullable: "UNKNOWN",
Precision: 2147483647,
Scale: 0,
SchemaName: "",
TableName: "",
Type: "varchar"
},
{
CaseSensitive: false,
CatalogName: "hive",
Label: "aggregate_count",
Name: "aggregate_count",
Nullable: "UNKNOWN",
Precision: 19,
Scale: 0,
SchemaName: "",
TableName: "",
Type: "bigint"
},
{
CaseSensitive: false,
CatalogName: "hive",
Label: "aggregate_source_ids",
Name: "aggregate_source_ids",
Nullable: "UNKNOWN",
Precision: 0,
Scale: 0,
SchemaName: "",
TableName: "",
Type: "array"
},
{
CaseSensitive: false,
CatalogName: "hive",
Label: "aggregate_source_names",
Name: "aggregate_source_names",
Nullable: "UNKNOWN",
Precision: 0,
Scale: 0,
SchemaName: "",
TableName: "",
Type: "array"
}
]
},
Rows: [{
// 第一行数据(来自结果的第一页)是标题,我们可以忽略它
Data: [
VarCharValue: "id" },
VarCharValue: "name" },
VarCharValue: "aggregate_count" },
VarCharValue: "aggregate_source_ids" },
VarCharValue: "aggregate_source_names" }
]
},
// 所有后续行都是数据值:
{
Data: [
VarCharValue: "920000" },
VarCharValue: "mydata1" },
VarCharValue: "2" },
VarCharValue: "[52800, 113000]" },
VarCharValue: "[sourcedata1, sourcedata2]" }
]
}
]
}
我写了一个完整的代码示例:gist
如果没有更好的替代方案,我打算编写一个转换器包,类似下面的代码。主要的问题是可靠地转换数组,因为数组中的一个字符串值可能包含逗号。而且我相信我们只能返回[]interface{}
,因为Go不支持泛型(截至今天)。
// myModel定义了与上面的testSql对应的模式
type myModel struct {
id int `athenaconv:"id"`
name string `athenaconv:"name"`
aggregateCount int64 `athenaconv:"aggregate_count"`
aggregateSourceIDs []int64 `athenaconv:"aggregate_source_ids"`
aggregateSourceNames []string `athenaconv:"aggregate_source_names"`
}
mapper := athenaconv.MapperFor(reflect.TypeOf(myModel))
var mapped []interface{} = mapper.FromAthenaResultSet(queryResultOutput.ResultSet)
英文:
Is there an easy way to convert from athena.GetQueryResultsOutput
struct to a user-defined struct in Go?
Here's a snippet of what I want to achieve:
const testSql = `
select
id,
name,
count(source_id) as aggregate_count,
array_agg(source_id) as aggregate_source_ids,
array_agg(source_name) as aggregate_source_names
from my_glue_catalog_table
group by id, name
`
type myModel struct {
id int64
name string
aggregateCount int
aggregateSourceIDs []int64
aggregateSourceNames []string
}
queryResultOutput, err = awsAthenaClient.GetQueryResults(ctx, &queryResultInput)
// var mapped []myModel = mapper.FromGetQueryResultsOutput(queryResultOutput.ResultSet)
And here's the structure of queryResultOutput.ResultSet
:
ResultSet: {
ResultSetMetadata: {
ColumnInfo: [
{
CaseSensitive: false,
CatalogName: "hive",
Label: "id",
Name: "id",
Nullable: "UNKNOWN",
Precision: 10,
Scale: 0,
SchemaName: "",
TableName: "",
Type: "integer"
},
{
CaseSensitive: true,
CatalogName: "hive",
Label: "name",
Name: "name",
Nullable: "UNKNOWN",
Precision: 2147483647,
Scale: 0,
SchemaName: "",
TableName: "",
Type: "varchar"
},
{
CaseSensitive: false,
CatalogName: "hive",
Label: "aggregate_count",
Name: "aggregate_count",
Nullable: "UNKNOWN",
Precision: 19,
Scale: 0,
SchemaName: "",
TableName: "",
Type: "bigint"
},
{
CaseSensitive: false,
CatalogName: "hive",
Label: "aggregate_source_ids",
Name: "aggregate_source_ids",
Nullable: "UNKNOWN",
Precision: 0,
Scale: 0,
SchemaName: "",
TableName: "",
Type: "array"
},
{
CaseSensitive: false,
CatalogName: "hive",
Label: "aggregate_source_names",
Name: "aggregate_source_names",
Nullable: "UNKNOWN",
Precision: 0,
Scale: 0,
SchemaName: "",
TableName: "",
Type: "array"
}
]
},
Rows: [{
// first row data (from page 1 of results) is header, we can ignore this
Data: [
VarCharValue: "id" },
VarCharValue: "name" },
VarCharValue: "aggregate_count" },
VarCharValue: "aggregate_source_ids" },
VarCharValue: "aggregate_source_names" }
]
},
// all subsequent rows are data values:
{
Data: [
VarCharValue: "920000" },
VarCharValue: "mydata1" },
VarCharValue: "2" },
VarCharValue: "[52800, 113000]" },
VarCharValue: "[sourcedata1, sourcedata2]" }
]
}
]
}
Full code example I've written: gist
If there's no better alternative, I'm thinking to write a converter package like the following. The main caveat will be converting arrays reliably though as one of the string values in the array might contain a comma. And I believe we can only return as []interface{}
because Go doesn't support generics (as of today)?
// myModel defines a schema that corresponds with testSql above
type myModel struct {
id int `athenaconv:"id"`
name string `athenaconv:"name"`
aggregateCount int64 `athenaconv:"aggregate_count"`
aggregateSourceIDs []int64 `athenaconv:"aggregate_source_ids"`
aggregateSourceNames []string `athenaconv:"aggregate_source_names"`
}
mapper := athenaconv.MapperFor(reflect.TypeOf(myModel))
var mapped []interface{} = mapper.FromAthenaResultSet(queryResultOutput.ResultSet)
答案1
得分: 1
由于我无法找到更好的替代方案,并且我需要处理大量的数据集进行分析,所以我编写了自己的包库来解决列映射问题,并将Athena类型转换为Go数据类型。这将帮助我们将Athena的ResultSet
绑定到强类型模型的切片中。
我将其开源在athenaconv下。留下这个答案,以防对其他人有所帮助。
获取
go get github.com/kent-id/athenaconv
用法
mapper, err := athenaconv.NewMapperFor(reflect.TypeOf(MyModel{}))
if err != nil {
handleError(err)
}
var mapped []interface{}
mapped, err = mapper.FromAthenaResultSetV2(ctx, queryResultOutput.ResultSet)
if err != nil {
handleError(err)
}
for _, mappedItem := range mapped {
mappedItemModel := mappedItem.(*MyModel)
fmt.Printf("%+v\n", *mappedItemModel)
}
英文:
As I couldn't find better alternative and I work with lots of data sets to analyze, I wrote my own package library to figure out the column mapping and convert from athena types to go data types. This will help us bind athena ResultSet
to slice of strongly-typed model.
I made it open-source under athenaconv. Leaving as answer in case it helps someone else.
Go get it
go get github.com/kent-id/athenaconv
Usage
mapper, err := athenaconv.NewMapperFor(reflect.TypeOf(MyModel{}))
if err != nil {
handleError(err)
}
var mapped []interface{}
mapped, err = mapper.FromAthenaResultSetV2(ctx, queryResultOutput.ResultSet)
if err != nil {
handleError(err)
}
for _, mappedItem := range mapped {
mappedItemModel := mappedItem.(*MyModel)
fmt.Printf("%+v\n", *mappedItemModel)
}
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论