Question

I have an ETL process that deletes a couple hundred thousand rows from a table with 18 billion rows, using a unique hashed surrogate key such as: 1801b08dd8731d35bb561943e708f7e3

delete from CUSTOMER_CONFORM_PROD.c360.engagement
where (
    engagement_surrogate_key) in (
    select (engagement_surrogate_key)
    from CUSTOMER_CONFORM_PROD.c360.engagement__dbt_tmp
);

This takes 4 to 6 minutes each time on a Small warehouse. I added a clustering key on engagement_surrogate_key, but since the column is unique with high cardinality it didn't help. I also enabled the search optimization service, but that didn't help either, and the query is still scanning all partitions. How can I speed up the deletion?

Answer 1

Score: 0

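The deletion can be sped up by limiting the scan on the destination table with a date range, for example by filtering for only the past month's worth of data: loaded_date>=dateadd(MM, -1, current_date). This helps when the table is well clustered on loaded_date (for instance because rows arrive in load order), so that Snowflake can skip the micro-partitions outside the range. If you are using dbt, this functionality is implemented in this macro: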

{% macro default__get_incremental_merge_sql(arg_dict) %}

  {% do return(get_merge_sql(arg_dict["target_relation"], arg_dict["temp_relation"], arg_dict["unique_key"], arg_dict["dest_columns"], arg_dict["predicates"])) %}

{% endmacro %}

So you can add the predicate to the dbt incremental model config like this:

{{ config(materialized='incremental', unique_key='engagement_surrogate_key', predicates=['loaded_date>=dateadd(MM, -1, current_date)']) }}

When you run your model, the code generated will be this:

delete from CUSTOMER_CONFORM_PROD.c360.engagement
where (
    engagement_surrogate_key) in (
    select (engagement_surrogate_key)
    from CUSTOMER_CONFORM_PROD.c360.engagement__dbt_tmp
)
and loaded_date>=dateadd(MM, -1, current_date);
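One thing worth checking before relying on the extra predicate: it changes which rows get deleted, not just how many partitions are scanned. A staged key whose target row has a loaded_date older than the window will be left in place. Below is a minimal sketch of that behaviour using Python's sqlite3 as a local stand-in, not Snowflake. The table and column names mirror the ones above, the keys key_recent, key_old and key_untouched are made up for illustration, and SQLite's date('now', '-1 months') is assumed here as a rough equivalent of dateadd(MM, -1, current_date). It says nothing about pruning or performance.

```python
import sqlite3


def build_demo():
    """Create an in-memory target table and staging table with three sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE engagement (engagement_surrogate_key TEXT, loaded_date TEXT)"
    )
    conn.execute("CREATE TABLE engagement__dbt_tmp (engagement_surrogate_key TEXT)")
    # Hypothetical rows: one recent and staged, one old and staged, one recent but not staged.
    conn.execute("INSERT INTO engagement VALUES ('key_recent', date('now', '-2 days'))")
    conn.execute("INSERT INTO engagement VALUES ('key_old', date('now', '-6 months'))")
    conn.execute("INSERT INTO engagement VALUES ('key_untouched', date('now', '-2 days'))")
    conn.executemany(
        "INSERT INTO engagement__dbt_tmp VALUES (?)",
        [("key_recent",), ("key_old",)],
    )
    conn.commit()
    return conn


def delete_recent_matches(conn, months_back=1):
    """Delete staged keys from the target, restricted to the recent date window.

    Returns the number of rows deleted.
    """
    cur = conn.execute(
        """
        DELETE FROM engagement
        WHERE engagement_surrogate_key IN (
            SELECT engagement_surrogate_key FROM engagement__dbt_tmp
        )
        AND loaded_date >= date('now', ?)
        """,
        (f"-{months_back} months",),
    )
    conn.commit()
    return cur.rowcount


if __name__ == "__main__":
    demo = build_demo()
    print("deleted:", delete_recent_matches(demo))
    print(
        "remaining:",
        [
            row[0]
            for row in demo.execute(
                "SELECT engagement_surrogate_key FROM engagement ORDER BY 1"
            )
        ],
    )
```

In this sketch the staged row that is six months old survives the delete, so the date window should only be used if updated rows are guaranteed to fall inside it; otherwise the following insert could leave duplicate keys in the target.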
