英文:
Looping through struct column to get conditional outcome as new columns
问题
以下是您要翻译的代码部分:
Say I have a table in BQ called `rating` with a struct column called `rating_record`. Schema as:
[
{
"name": "id",
"mode": "NULLABLE",
"type": "STRING",
"description": null,
"fields": []
},
{
"name": "rating_record",
"mode": "NULLABLE",
"type": "RECORD",
"description": null,
"fields": [
{
"name": "high_drop",
"type": "BOOLEAN",
"fields": []
},
{
"name": "medium_bump",
"type": "BOOLEAN",
"fields": []
}
]
}
]
rating_record
包含字段 high_drop
和 medium_bump
,可能会有许多带有后缀 _drop
和 _bump
的字段,其值为 true 或 false。我想要使用一个 dbt 宏来迭代此记录类型(struct)字段,以创建两个新列,分别对应 id
,称为 drop_reasons
和 bump_reasons
- 在这种情况下,如果值为 true,则 drop_reasons
将等于 'high_drop'
。
我尝试使用 sql_statement
来迭代记录,并使用 dbt_utils.get_query_results_as_dict
来获取结果,但无法相应地创建列。
{% set sql_statement %}
select id, rating_record from {{ ref('source_table' }}
{% endset %}
{%- set ids_and_ratings = dbt_utils.get_query_results_as_dict(sql_statement) -%}
select
{% for id in ids_and_ratings['id'] | unique -%}
{% set bump_reasons = [] %}
{% set drop_reasons = [] %}
{% for rating_record in ids_and_ratings['rating_record'] | unique -%}
{% for key, value in fromjson(rating_record).items() -%}
{% if key.endswith('bump') and value is sameas true %}
{{ bump_reasons.append(key) }}
{% elif key.endswith('drop') and value is sameas true %}
{{ drop_reasons.append(key) }}
{% endif %}
{% endfor %}
{% endfor %}
{{ print(drop_reasons) }}
{% endfor %}
from {{ ref('source_table' }}
请注意,这是您的代码的翻译部分,我已经省略了您的要求以及问题。如果您需要更多信息或有其他问题,请随时告诉我。
英文:
Say I have a table in BQ called rating
with a struct column called rating_record
. Schema as:
[
{
"name": "id",
"mode": "NULLABLE",
"type": "STRING",
"description": null,
"fields": []
},
{
"name": "rating_record",
"mode": "NULLABLE",
"type": "RECORD",
"description": null,
"fields": [
{
"name": "high_drop",
"type": "BOOLEAN",
"fields": []
},
{
"name": "medium_bump",
"type": "BOOLEAN",
"fields": []
}
]
rating_record
contains fields high_drop
and medium_bump
and there could be many fields with suffix _drop
and _bump
with true or false values. I want to iterate this record type (struct) field using a dbt macro to create two new columns against an id
called drop_reasons
& bump_reasons
- drop_reasons
in this case would be = 'high_drop'
if the value is true.
I tried to iterate the record with a sql_statement
and using dbt_utils.get_query_results_as_dict
to get the outcome but unable to create columns accordingly.
{% set sql_statement %}
select id, rating_record from {{ ref('source_table' }}
{% endset %}
{%- set ids_and_ratings = dbt_utils.get_query_results_as_dict(sql_statement) -%}
select
{% for id in ids_and_ratings['id'] | unique -%}
{% set bump_reasons = [] %}
{% set drop_reasons = [] %}
{% for rating_record in ids_and_ratings['rating_record'] | unique -%}
{% for key, value in fromjson(rating_record).items() -%}
{% if key.endswith('bump') and value is sameas true %}
{{ bump_reasons.append(key) }}
{% elif key.endswith('drop') and value is sameas true %}
{{ drop_reasons.append(key) }}
{% endif %}
{% endfor %}
{% endfor %}
{{ print(drop_reasons) }}
{% endfor %}
from {{ ref('source_table' }}
答案1
得分: 0
I suppose values in the fields of the table schema is a single value rather than an array.
表模式字段中的值应该是单个值而不是数组。
get_query_results_as_dict returns a key of the column and the value to be a list of the column value.
get_query_results_as_dict返回列的键以及值,值是列值的列表。
In the jinja code the first loop iterate all the ids, for each id the second loop iterate all rating_record struct for all ids but not of the id in the first loop.
在jinja代码中,第一个循环迭代所有的id,对于每个id,第二个循环迭代所有id的评分记录结构,但不包括第一个循环中的id。
From the logic of your jinja code, it can be done using BigQuery.
根据您的jinja代码逻辑,可以使用BigQuery来完成。
The dummy data is
虚拟数据如下:
Added a macro to get keys in the rating_record struct
添加了一个宏来获取评分记录结构中的键
The model is
模型如下:
where
在这里
will result in a table with columns id, tf which contains true/false, and bump_drop with values of something_bump/something_drop.
将生成一个包含列id、tf(包含true/false)和bump_drop(其值为something_bump/something_drop)的表。
英文:
I suppose values in the fields of the table schema is a single value rather than an array.
get_query_results_as_dict returns a key of the column and the value to be a list of the column value. In the jinja code the first loop iterate all the ids, for each id the second loop iterate all rating_record struct for all ids but not of the id in the first loop.
From the logic of your jinja code, it can be done using BigQuery.
The dummy data is
insert into database.table_name values
("01", STRUCT(True, False, True, True)),
("02", STRUCT(False, False, True, False)),
("03", STRUCT(False, True, True, False))
Added a macro to get keys in the rating_record struct
{% macro get_struct_fields() %}
{% set query %}
SELECT split(field_path, ".")[OFFSET(1)] as fields
FROM `project_id`.`region-us`.INFORMATION_SCHEMA.COLUMN_FIELD_PATHS
WHERE table_name = "struct_tbl"
and column_name = "rating_record"
and data_type = "BOOL"
{% endset %}
{% set results = run_query(query) %}
{% if execute %}
{% set fields = results.columns[0].values() %}
{% else %}
{% set fields = [] %}
{% endif %}
{%do log(fields, info=true) %}
{{ return(fields)}}
{% endmacro %}
The model is
{% set fields = get_struct_fields() %}
{% set fields = fields | join(", ") %}
select *
from
(select id, split(bump_drop, "_")[offset(1)] as bump_drop, bump_drop as
reason
from
(select id, rating_record.* from `project_id.database.table_name`)
unpivot (tf for bump_drop in ({{ fields }}))
where tf is true
)
pivot (array_agg(reason ignore nulls) as reasons for bump_drop in ("bump",
"drop"))
where
select * from (
select id, rating_record.* from `project_id.database.table_name`)
unpivot (tf for bump_drop in (high_bump, high_drop, medium_bump, medium_drop))
will result in a table with columns id, tf which contains true/false, and bump_drop with values of something_bump/something_drop.
Then select only true part and separate out the 'bump' and 'drop'.
The unpivot will put reasons for bump and drop separately into an array for each id.
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论