英文:
dbt merge with multiple unique_keys how to reference the unique key with jinja?
问题
根据dbt
文档的说法:
> 在需要多个列组合以唯一标识每一行的情况下,我们建议将这些列作为列表传递(unique_key = ['user_id', 'session_number']),而不是作为字符串表达式传递(unique_key = 'concat(user_id, session_number)')。
因此,我创建了一个小型增量模型,在其中需要使用多个唯一键。
{{ config(materialized='table',
incremental_strategy='merge',
unique_key=['col1', 'col2', 'col3']
)
}}
WITH using_clause AS (
SELECT ...
FROM tbl
{% if is_incremental() %}
WHERE ingested_dt > (SELECT MAX(ingested_dt ) FROM {{ this }})
{% endif %}
)
, updates AS (
SELECT ...
FROM using_clause
{% if is_incremental() %}
WHERE my_unique_keys IN (SELECT unique_keys FROM {{ this }})
{% endif %}
)
, inserts AS (
SELECT ..
FROM using_clause
WHERE unique_keys NOT IN (select unique_keys from updates)
)
SELECT * FROM updates UNION inserts
选取多个列的正确方式是什么?我假设是类似于{ref(unique_keys)}
?但我在文档中找不到相关信息。
我目前的方法只是对列进行哈希处理(使用md5),但我想知道是否有一种原生方法可以做到这一点,因为unique_key
参数接受多个列。
英文:
as per dbt
docs it states:
> In cases where you need multiple columns in combination to uniquely
> identify each row, we recommend you pass these columns as a list
> (unique_key = ['user_id', 'session_number']), rather than a string
> expression (unique_key = 'concat(user_id, session_number)').
so I've created a small incremental model where I need to use multiple unique keys.
{{ config(materialized='table',
incremental_strategy='merge',
unique_key=['col1', 'col2', 'col3']
)
}}
WITH using_clause AS (
SELECT ...
FROM tbl
{% if is_incremental() %}
WHERE ingested_dt > (SELECT MAX(ingested_dt ) FROM {{ this }})
{% endif %}
)
, updates AS (
SELECT ...
FROM using_clause
{% if is_incremental() %}
WHERE my_unique_keys IN (SELECT unique_keys FROM {{ this }})
{% endif %}
)
, inserts AS (
SELECT ..
FROM using_clause
WHERE unique_keys NOT IN (select unique_keys from updates)
)
SELECT * FROM updates UNION inserts
whats the correct way to select the multiple columns? I assume something along the lines of {ref(unique_keys)}
? but I couldn't find anything in the docs.
my current method just hashes the columns (md5) but i wonder if there is a native method to do this given that the unique_key
param accepts multiple columns.
答案1
得分: 2
config
对象在DBT Jinja上下文中可用。请参阅https://docs.getdbt.com/reference/dbt-jinja-functions/config
如果您想在模型内部使用unique_key
,可以像这样操作:
SELECT {{ config.get("unique_key")|join(',') }}, ...
FROM {{ this }}
编译后,它将如下所示:
SELECT col1,col2,col3, ...
FROM my_model
英文:
The config
object is available to you in the DBT Jinja context. See https://docs.getdbt.com/reference/dbt-jinja-functions/config
If you want to use the unique_key
inside your model, you can do something like
SELECT {{ config.get("unique_key")|join(',') }}, ...
FROM {{ this }}
After compilation, it'll look like this
SELECT col1,col2,col3, ...
FROM my_model
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论