Configure Dataform incremental table to do nothing when matched
Question
When I create an incremental table, the default behavior is that when a row matches on the "uniqueKey", its other fields are updated. Instead, I would like matched rows to be left untouched, so that only new rows are inserted, without duplicates. How can I achieve this? The only alternative I can think of is to write my own operation, but then I lose the benefit of previewing with the SELECT statement.
config {
    type: "incremental",
    uniqueKey: ["alarm_number"],
    bigquery: {
        partitionBy: "DATE(alarm_time)",
        clusterBy: ["alarm_number", "location", "element"],
        updatePartitionFilter: "time >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 4 DAY)"
    }
}
pre_operations {
    DECLARE trigger_checkpoint DEFAULT (
        ${when(incremental(),
        `SELECT MAX(time) FROM ${self()}`,
        `SELECT TIMESTAMP("2020-01-01")`)}
    )
}
SELECT
    *,
    FALSE AS notified -- this column should not be updated when there is a duplicate/merge
FROM ${ref("source")}
WHERE deviation >= 1.1
Answer 1
Score: 1
The way I have been handling this case is to check for myself whether each row already exists in the destination table, using the self() reference, and to select only the rows that do not.
In your case, I would do something like this:
SELECT
    s.*,
    FALSE AS notified -- this column should not be updated when there is a duplicate/merge
FROM ${ref("source")} s
LEFT JOIN ${self()} se ON s.alarm_number = se.alarm_number
WHERE
    se.alarm_number IS NULL
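One caveat with the query above: on the very first run (or a full refresh) the destination table may not exist yet, so the join against self() could fail. A minimal sketch of a guarded variant, assuming the same table and column names as in the question, wraps the anti-join in when(incremental(), ...) so it is only emitted on incremental runs. The trimmed config block is for illustration only; keep your own bigquery options.

```sql
config {
    type: "incremental",
    uniqueKey: ["alarm_number"]
}

SELECT
    s.*,
    FALSE AS notified -- only set for rows that are not in the destination yet
FROM ${ref("source")} s
${when(incremental(), `
LEFT JOIN ${self()} se ON s.alarm_number = se.alarm_number
WHERE se.alarm_number IS NULL`)}
```

Because rows that already exist are filtered out before the merge, there should be nothing left to match on uniqueKey, so existing values of notified stay as they are. Note that this does not deduplicate rows that appear more than once within the source itself.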