英文:
Configure Dataform incremental table to do nothing when matched
问题
当创建一个增量表时,行为是当有匹配项时,由“uniqueKey”定义,其他字段将被更新。我想要的是,当有匹配项时不进行更新,只插入新的行而不重复。我该如何实现这个目标?我可以考虑的另一种方法是编写自己的操作,但那样我就失去了使用SELECT语句进行预览的好处。
config {
type: "incremental",
uniqueKey: ["alarm_number"],
bigquery: {
partitionBy: "DATE(alarm_time)",
clusterBy: ["alarm_number", "location", "element"],
updatePartitionFilter: "time >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 4 DAY)"
}
}
pre_operations {
DECLARE trigger_checkpoint DEFAULT (
${when(incremental(),
`SELECT MAX(time) FROM ${self()}`,
`SELECT TIMESTAMP("2020-01-01")`)}
)
}
SELECT
*,
FALSE AS notified -- 当有重复/合并时,不应更新此列
FROM ${ref("source")}
AND deviation >= 1.1
英文:
When creating an incremental table, the behavior is that when there is a matched, defined by "uniqueKey", the other fields will be updated. I would like to, instead, not update when there is a match, and only insert new rows without duplicates. How can I achieve this? An alternative I can think of is to write my own operation, but then I lose the benefit of previewing with the SELECT statement.
config {
type: "incremental",
uniqueKey: ["alarm_number"],
bigquery: {
partitionBy: "DATE(alarm_time)",
clusterBy: ["alarm_number", "location", "element"],
updatePartitionFilter: "time >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 4 DAY)"
}
}
pre_operations {
DECLARE trigger_checkpoint DEFAULT (
${when(incremental(),
`SELECT MAX(time) FROM ${self()}`,
`SELECT TIMESTAMP("2020-01-01")`)}
)
}
SELECT
*,
FALSE AS notified -- this column should not be updated when there is a duplicate/merge
FROM ${ref("source")}
AND deviation >= 1.1
答案1
得分: 1
我一直在处理这个案例的方式是通过使用self()
引用来验证行是否已经存在于目标表中。
在你的情况下,我会这样做:
SELECT
s.*,
FALSE AS notified -- 当有重复/合并时,不应更新此列
FROM ${ref("source")} s
LEFT JOIN ${self()} se on s.alarm_number = se.alarm_number
where
se.alarm_number is null
英文:
The way I have been approaching this case is by verifying myself if the row is already in the destination table or not using the self()
reference.
In your case, I would do something like this:
SELECT
s.*,
FALSE AS notified -- this column should not be updated when there is a duplicate/merge
FROM ${ref("source")} s
LEFT JOIN ${self()} se on s.alarm_number = se.alarm_number
where
se.alarm_number is null
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论