如何在 dbt 中使用多个 unique_keys 合并,并使用 Jinja 引用唯一键?

huangapple go评论109阅读模式
英文:

dbt merge with multiple unique_keys how to reference the unique key with jinja?

问题

根据dbt文档的说法:

> 在需要多个列组合以唯一标识每一行的情况下,我们建议将这些列作为列表传递(unique_key = ['user_id', 'session_number']),而不是作为字符串表达式传递(unique_key = 'concat(user_id, session_number)')。

因此,我创建了一个小型增量模型,在其中需要使用多个唯一键。

{{ config(materialized='table',
   incremental_strategy='merge', 
   unique_key=['col1', 'col2', 'col3']
) 
}}

WITH using_clause AS (
SELECT ...
FROM tbl
{% if is_incremental() %}
  WHERE ingested_dt > (SELECT MAX(ingested_dt ) FROM {{ this }})
{% endif %}
)

, updates AS (

SELECT ...
FROM using_clause
 {% if is_incremental() %}

        WHERE my_unique_keys IN (SELECT unique_keys FROM {{ this }})

  {% endif %}

) 

, inserts AS (
  SELECT ..
  FROM using_clause
  WHERE unique_keys NOT IN (select unique_keys from updates)
  ) 

SELECT * FROM updates UNION inserts

选取多个列的正确方式是什么?我假设是类似于{ref(unique_keys)}?但我在文档中找不到相关信息。

我目前的方法只是对列进行哈希处理(使用md5),但我想知道是否有一种原生方法可以做到这一点,因为unique_key参数接受多个列。

英文:

as per dbt docs it states:

> In cases where you need multiple columns in combination to uniquely
> identify each row, we recommend you pass these columns as a list
> (unique_key = ['user_id', 'session_number']), rather than a string
> expression (unique_key = 'concat(user_id, session_number)').

so I've created a small incremental model where I need to use multiple unique keys.

{{ config(materialized='table',
   incremental_strategy='merge', 
   unique_key=['col1', 'col2', 'col3']
) 
}}

WITH using_clause AS (
SELECT ...
FROM tbl
{% if is_incremental() %}
  WHERE ingested_dt > (SELECT MAX(ingested_dt ) FROM {{ this }})
{% endif %}
)

, updates AS (

SELECT ...
FROM using_clause
 {% if is_incremental() %}

        WHERE my_unique_keys IN (SELECT unique_keys FROM {{ this }})

  {% endif %}

) 

, inserts AS (
  SELECT ..
  FROM using_clause
  WHERE unique_keys NOT IN (select unique_keys from updates)
  ) 

SELECT * FROM updates UNION inserts 

whats the correct way to select the multiple columns? I assume something along the lines of {ref(unique_keys)} ? but I couldn't find anything in the docs.

my current method just hashes the columns (md5) but i wonder if there is a native method to do this given that the unique_key param accepts multiple columns.

答案1

得分: 2

config对象在DBT Jinja上下文中可用。请参阅https://docs.getdbt.com/reference/dbt-jinja-functions/config

如果您想在模型内部使用unique_key,可以像这样操作:

SELECT {{ config.get("unique_key")|join(',') }}, ...
FROM {{ this }}

编译后,它将如下所示:

SELECT col1,col2,col3, ...
FROM my_model
英文:

The config object is available to you in the DBT Jinja context. See https://docs.getdbt.com/reference/dbt-jinja-functions/config

If you want to use the unique_key inside your model, you can do something like

SELECT {{ config.get("unique_key")|join(',') }}, ...
FROM {{ this }}

After compilation, it'll look like this

SELECT col1,col2,col3, ...
FROM my_model

huangapple
  • 本文由 发表于 2023年1月9日 08:38:40
  • 转载请务必保留本文链接:https://go.coder-hub.com/75052295.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定