# Question: How to improve the performance of a Snowpark procedure?

I have the procedure below, which merges into a table and deletes rows from it based on certain keys. The procedure looks fine, but it seems to have a performance issue.

~~~
CREATE OR REPLACE PROCEDURE table_merge(keys ARRAY)
  RETURNS STRING
  LANGUAGE PYTHON
  RUNTIME_VERSION = '3.8'
  PACKAGES = ('snowflake-snowpark-python')
  HANDLER = 'table_merge'
AS
$$
from snowflake.snowpark.functions import when_matched

def table_merge(session, keys):
  # Table names as used in the SQL query further down.
  df_src = session.table("source_table")
  df_tgt = session.table("target_table")
  # AND together one equality predicate per key column.
  condition = None
  for key in keys:
    if condition is None:
      condition = df_src[key] == df_tgt[key]
    else:
      condition = condition & (df_src[key] == df_tgt[key])
  # Delete every target row that matches a source row on all keys.
  df_tgt.merge(df_src, condition, [when_matched().delete()])

  return 'Pass'
$$;
~~~

When I run the procedure above, it takes around 28 seconds, even for 3 records.

So, to check the execution with a plain Snowflake query, I tried the query below:

~~~
merge into target_table t using (select "ID", "NAME" from source_table) s on s.ID = t.ID and s.NAME = t.NAME when matched then delete
~~~

The query executed, but it didn't merge as expected.

**Expected:**

1. Merge data from the source into the target.
2. While merging, if keys from the source match rows in the target table, delete those rows from the target table, then insert the new data from the source into the target.
3. The keys used for deletion should be dynamic, so the statement should be able to delete rows based on the keys passed in to the procedure.

Is there any solution or recommendation to improve the query?
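
A note on the hand-written query: its only clause is `WHEN MATCHED THEN DELETE`, so it can remove matching rows from the target but has no clause that inserts source rows, which may be why it did not behave as expected. Below is a minimal sketch of building an upsert-style `MERGE` statement from a dynamic key list. The helper names `quote_ident` and `build_merge_sql`, the `columns` parameter, and the example column `CITY` are hypothetical and not from the original post; the sketch also assumes the table names are trusted, since they are interpolated as-is.

```python
def quote_ident(name: str) -> str:
    """Double-quote an identifier, escaping embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def build_merge_sql(target: str, source: str, keys: list, columns: list) -> str:
    """Build a MERGE that updates matched rows and inserts unmatched ones.

    `keys` are the join columns; `columns` is the full column list
    (an assumption of this sketch: the caller knows the table's columns).
    """
    if not keys:
        raise ValueError("at least one key column is required")
    # One equality predicate per key, joined with AND.
    on = " AND ".join(f"t.{quote_ident(k)} = s.{quote_ident(k)}" for k in keys)
    non_keys = [c for c in columns if c not in keys]
    col_list = ", ".join(quote_ident(c) for c in columns)
    val_list = ", ".join(f"s.{quote_ident(c)}" for c in columns)

    sql = f"MERGE INTO {target} t USING {source} s ON {on}"
    if non_keys:
        # Only non-key columns need updating; key columns already match.
        sets = ", ".join(
            f"t.{quote_ident(c)} = s.{quote_ident(c)}" for c in non_keys
        )
        sql += f" WHEN MATCHED THEN UPDATE SET {sets}"
    sql += f" WHEN NOT MATCHED THEN INSERT ({col_list}) VALUES ({val_list})"
    return sql


if __name__ == "__main__":
    print(build_merge_sql("target_table", "source_table",
                          ["ID", "NAME"], ["ID", "NAME", "CITY"]))
```

Inside a procedure, the resulting string could be passed to `session.sql(...).collect()`. Because the identifiers are double-quoted, they are matched case-sensitively, so the key names passed in need to match the column names exactly.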


# Answer 1
**Score**: 0

I think there are two ways to solve this. In the statements below, `keys` and `input_keys` are placeholders.

Option 1. Delete and insert. This is faster because it is a bulk operation. In your Python procedure, the input is the keys. So, first delete based on the keys:

    delete from target_table where keys in (select keys from source_table) and keys in (input_keys)

Second, insert the data from the source:

    insert into target_table select * from source_table where keys in (input_keys)

Option 2. Use merge. This will be slower, since merge is a slower operation.

    merge into target_table t using (select "ID" sid, "NAME" sname from source_table where keys in (input_keys)) s on s.sid = t.ID and s.sname = t.NAME when matched then UPDATE
    set t.ID = s.sid, t.NAME = s.sname;

You really don't have to delete the rows. You just have to update all your columns with the data coming from the source.
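
Option 1 could be sketched as a procedure handler along the following lines. This is a minimal sketch under stated assumptions, not a definitive implementation: the helper names `quote_ident`, `build_delete_sql`, and `build_insert_sql` are hypothetical, the table names are taken from the question's SQL, and `keys` is treated as a list of key column names on which source and target rows are matched.

```python
def quote_ident(name: str) -> str:
    """Double-quote an identifier, escaping embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def build_delete_sql(target: str, source: str, keys: list) -> str:
    """Delete target rows that match a source row on every key column."""
    if not keys:
        raise ValueError("at least one key column is required")
    cond = " AND ".join(
        f"{target}.{quote_ident(k)} = {source}.{quote_ident(k)}" for k in keys
    )
    return f"DELETE FROM {target} USING {source} WHERE {cond}"


def build_insert_sql(target: str, source: str) -> str:
    """Copy all source rows into the target (assumes matching column layout)."""
    return f"INSERT INTO {target} SELECT * FROM {source}"


def table_merge(session, keys):
    # Assumed table names, taken from the question's SQL query.
    target, source = "target_table", "source_table"
    # Step 1: remove target rows whose keys appear in the source.
    session.sql(build_delete_sql(target, source, keys)).collect()
    # Step 2: load the fresh rows from the source.
    session.sql(build_insert_sql(target, source)).collect()
    return "Pass"
```

With this handler, the procedure would be invoked as `CALL table_merge(ARRAY_CONSTRUCT('ID', 'NAME'))`. As written, the two statements are issued separately, so the sketch does not make the delete and the insert a single atomic step.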
