将Dataform的增量表配置为在匹配时不执行任何操作。

huangapple go评论53阅读模式
英文:

Configure Dataform incremental table to do nothing when matched

问题

当创建一个增量表时,行为是当有匹配项时,由“uniqueKey”定义,其他字段将被更新。我想要的是,当有匹配项时不进行更新,只插入新的行而不重复。我该如何实现这个目标?我可以考虑的另一种方法是编写自己的操作,但那样我就失去了使用SELECT语句进行预览的好处。

config {
    type: "incremental",
    uniqueKey: ["alarm_number"],
    bigquery: {
        partitionBy: "DATE(alarm_time)",
        clusterBy: ["alarm_number", "location", "element"],
        updatePartitionFilter: "time >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 4 DAY)"
    }
}

pre_operations {
    DECLARE trigger_checkpoint DEFAULT (
        ${when(incremental(),
        `SELECT MAX(time) FROM ${self()}`,
        `SELECT TIMESTAMP("2020-01-01")`)}
    )
}

SELECT
    *,
    FALSE AS notified -- 当有重复/合并时,不应更新此列
FROM ${ref("source")}
AND deviation >= 1.1
英文:

When creating an incremental table, the behavior is that when there is a matched, defined by "uniqueKey", the other fields will be updated. I would like to, instead, not update when there is a match, and only insert new rows without duplicates. How can I achieve this? An alternative I can think of is to write my own operation, but then I lose the benefit of previewing with the SELECT statement.

config {
    type: "incremental",
    uniqueKey: ["alarm_number"],
    bigquery: {
        partitionBy: "DATE(alarm_time)",
        clusterBy: ["alarm_number", "location", "element"],
        updatePartitionFilter: "time >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 4 DAY)"
    }
}

pre_operations {
    DECLARE trigger_checkpoint DEFAULT (
        ${when(incremental(),
        `SELECT MAX(time) FROM ${self()}`,
        `SELECT TIMESTAMP("2020-01-01")`)}
    )
}

SELECT
    *,
    FALSE AS notified -- this column should not be updated when there is a duplicate/merge
FROM ${ref("source")}
AND deviation >= 1.1

答案1

得分: 1

我一直在处理这个案例的方式是通过使用self()引用来验证行是否已经存在于目标表中。

在你的情况下,我会这样做:

SELECT
    s.*,
    FALSE AS notified -- 当有重复/合并时,不应更新此列
FROM ${ref("source")} s
LEFT JOIN ${self()} se on s.alarm_number = se.alarm_number
where
    se.alarm_number is null
英文:

The way I have been approaching this case is by verifying myself if the row is already in the destination table or not using the self() reference.

In your case, I would do something like this:

SELECT
    s.*,
    FALSE AS notified -- this column should not be updated when there is a duplicate/merge
FROM ${ref("source")} s
LEFT JOIN ${self()} se on s.alarm_number = se.alarm_number
where
    se.alarm_number is null

huangapple
  • 本文由 发表于 2023年6月12日 20:47:39
  • 转载请务必保留本文链接:https://go.coder-hub.com/76456820.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定