May 10, 2023, 21:16:42

Looping through struct column to get conditional outcome as new columns

Question

Say I have a table in BQ called rating with a struct column called rating_record. The schema is:

[
  {
    "name": "id",
    "mode": "NULLABLE",
    "type": "STRING",
    "description": null,
    "fields": []
  },
  {
    "name": "rating_record",
    "mode": "NULLABLE",
    "type": "RECORD",
    "description": null,
    "fields": [
      {
        "name": "high_drop",
        "type": "BOOLEAN",
        "fields": []
      },
      {
        "name": "medium_bump",
        "type": "BOOLEAN",
        "fields": []
      }
    ]
  }
]

rating_record contains the fields high_drop and medium_bump, and there could be many more fields with the suffixes _drop and _bump, each holding true or false. I want to iterate over this record type (struct) field using a dbt macro to create two new columns per id, called drop_reasons and bump_reasons. In this case, drop_reasons would be 'high_drop' if that field's value is true.

I tried to iterate over the record with a sql_statement and dbt_utils.get_query_results_as_dict to get the outcome, but I was unable to create the columns accordingly.

{% set sql_statement %}
    select id, rating_record from {{ ref('source_table') }}
{% endset %}

{%- set ids_and_ratings = dbt_utils.get_query_results_as_dict(sql_statement) -%}

select

    {% for id in ids_and_ratings['id'] | unique -%}
        {% set bump_reasons = [] %}
        {% set drop_reasons = [] %}
        {% for rating_record in ids_and_ratings['rating_record'] | unique -%}
            {% for key, value in fromjson(rating_record).items() -%}
                {% if key.endswith('bump') and value is sameas true %}
                    {{ bump_reasons.append(key) }}
                {% elif key.endswith('drop') and value is sameas true %}
                    {{ drop_reasons.append(key) }}
                {% endif %}
            {% endfor %}
        {% endfor %}
        {{ print(drop_reasons) }}
    {% endfor %}

from {{ ref('source_table') }}

Answer 1

Score: 0

I assume each field in the struct holds a single value rather than an array.
get_query_results_as_dict returns a dictionary keyed by column name, where each value is the list of that column's values. In your Jinja code, the first loop iterates over all the ids, and for each id the second loop iterates over the rating_record structs of all ids, not just the struct that belongs to the id from the first loop.
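
To make the mismatch concrete, here is a minimal Python sketch of the two looping strategies. The dictionary is a simplified, hypothetical stand-in for the column-oriented result described above (the records are shown as plain dicts, and the two sample rows are invented for illustration), so treat it as an approximation of the behavior rather than what dbt actually returns.

```python
# Hypothetical stand-in for a column-oriented query result:
# one key per column, each mapped to that column's values in row order.
ids_and_ratings = {
    "id": ["01", "02"],
    "rating_record": [
        {"high_drop": True, "medium_bump": False},
        {"high_drop": False, "medium_bump": True},
    ],
}


def reasons_nested(results):
    """Mimics the nested loops in the question: every id sees every record."""
    out = {}
    for row_id in results["id"]:
        drops = []
        for record in results["rating_record"]:
            drops += [k for k, v in record.items() if k.endswith("drop") and v is True]
        out[row_id] = drops
    return out


def reasons_paired(results):
    """Pairs each id with its own record by position in the column lists."""
    out = {}
    for row_id, record in zip(results["id"], results["rating_record"]):
        out[row_id] = [k for k, v in record.items() if k.endswith("drop") and v is True]
    return out


nested = reasons_nested(ids_and_ratings)
paired = reasons_paired(ids_and_ratings)
print(nested)  # id "02" wrongly picks up high_drop from id "01"
print(paired)  # id "02" has no drop reasons
```

In the nested version, id "02" inherits high_drop from the record of id "01", which is the behavior described above. Pairing by position avoids that, but it still pulls every row into the Jinja context at compile time, which is one reason to prefer doing the work in SQL.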
Judging from the logic of your Jinja code, this can be done in BigQuery SQL.
The dummy data is:

insert into database.table_name values
    ("01", STRUCT(True, False, True, True)),
    ("02", STRUCT(False, False, True, False)),
    ("03", STRUCT(False, True, True, False))

I added a macro to get the field names in the rating_record struct:

{% macro get_struct_fields() %}
{% set query %}
SELECT split(field_path, ".")[OFFSET(1)] as fields
FROM `project_id`.`region-us`.INFORMATION_SCHEMA.COLUMN_FIELD_PATHS
WHERE table_name = "struct_tbl"
and column_name = "rating_record"
and data_type = "BOOL"
{% endset %}
{% set results = run_query(query) %}
{% if execute %}
    {% set fields = results.columns[0].values() %}
{% else %}
    {% set fields = [] %}
{% endif %}
{% do log(fields, info=true) %}
{{ return(fields) }}
{% endmacro %}

The model is:

{% set fields = get_struct_fields() %}
{% set fields = fields | join(", ") %}
select *
from (
    select
        id,
        split(bump_drop, "_")[offset(1)] as bump_drop,
        bump_drop as reason
    from (select id, rating_record.* from `project_id.database.table_name`)
    unpivot (tf for bump_drop in ({{ fields }}))
    where tf is true
)
pivot (array_agg(reason ignore nulls) as reasons for bump_drop in ("bump", "drop"))

where the inner query

select * from (
select id, rating_record.* from `project_id.database.table_name`)
unpivot (tf for bump_drop in (high_bump, high_drop, medium_bump, medium_drop))

results in a table with the columns id, tf (true or false), and bump_drop (values such as something_bump or something_drop).

The outer query then keeps only the rows where tf is true and splits each name into its 'bump' or 'drop' suffix.
The pivot collects the bump reasons and the drop reasons into separate arrays for each id.
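
The same unpivot, filter, and pivot steps can be traced in plain Python on the dummy data. This is only an illustrative sketch: the field order in FIELDS is an assumption (the answer does not say which struct position maps to which field), and the helper names unpivot and pivot_reasons are hypothetical.

```python
from collections import defaultdict

# Assumed field order for the dummy rows; the answer does not state it.
FIELDS = ("high_bump", "high_drop", "medium_bump", "medium_drop")

rows = [
    ("01", (True, False, True, True)),
    ("02", (False, False, True, False)),
    ("03", (False, True, True, False)),
]


def unpivot(rows):
    """Turn each wide row into one (id, tf, bump_drop) row per struct field."""
    return [
        (row_id, tf, name)
        for row_id, values in rows
        for name, tf in zip(FIELDS, values)
    ]


def pivot_reasons(long_rows):
    """Keep the true rows, split off the suffix, and collect reasons per id."""
    result = defaultdict(lambda: {"bump": [], "drop": []})
    for row_id, tf, name in long_rows:
        if tf is True:
            suffix = name.split("_")[1]  # same idea as split(bump_drop, "_")[offset(1)]
            result[row_id][suffix].append(name)
    return dict(result)


long_rows = unpivot(rows)
reasons = pivot_reasons(long_rows)
for row_id, grouped in reasons.items():
    print(row_id, grouped)
```

Under the assumed field order, id "01" ends up with two bump reasons and one drop reason. The Python lists follow the order of FIELDS, whereas array_agg without an ORDER BY clause does not guarantee element order.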
