How to improve the performance of a Snowpark procedure?
Question
<details>
<summary>English:</summary>
I have the procedure below for merging and truncating a table based on certain keys. The procedure looks fine, but it seems to have a performance issue.
~~~
CREATE OR REPLACE PROCEDURE table_merge(keys ARRAY)
RETURNS STRING
LANGUAGE PYTHON
RUNTIME_VERSION = '3.8'
PACKAGES = ('snowflake-snowpark-python')
HANDLER = 'table_merge'
AS
$$
from snowflake.snowpark.functions import when_matched

def table_merge(session, keys):
    # Table names as used in the plain SQL query below.
    df_src = session.table("source_table")
    df_tgt = session.table("target_table")
    # Build the match condition by AND-ing one equality test per key column.
    condition = None
    for key in keys:
        key_match = df_src[key] == df_tgt[key]
        condition = key_match if condition is None else condition & key_match
    # Delete every target row that has a matching row in the source.
    df_tgt.merge(df_src, condition, [when_matched().delete()])
    return 'Pass'
$$;
~~~
When I ran the above procedure, it took around 28 seconds even for 3 records.
To compare against a plain Snowflake query, I tried the query below:
~~~
merge into target_table t using (select "ID", "NAME" from source_table) s on s.ID = t.ID and s.NAME = t.NAME when matched then delete
~~~
The query above executed, but it didn't merge as expected.
Expected:
1. Merge data from the source into the target.
2. While merging, if keys from the source match rows in the target table, delete those rows from the target table, then insert the new data from the source into the target.
3. The keys used for deletion should be dynamic, so the statement should delete rows based on the keys passed in to the procedure.
Is there any solution or recommendation to improve the query?
</details>
# Answer 1
**Score**: 0
<details>
<summary>English:</summary>
I think there are two ways to solve this.

Option 1. Delete and insert. This is faster because it is a bulk operation. In your Python procedure, your input is the keys (`keys` and `input_keys` below are placeholders for them). So, first delete based on the keys:

    delete from target_table where keys in (select keys from source_table) and keys in (input_keys)

Second, insert the data from the source:

    insert into target_table select * from source_table where keys in (input_keys)
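
The delete-then-insert idea above can be sketched in Python as a pair of generated statements. This is only a rough sketch under stated assumptions: it treats the procedure's `keys` argument as a list of key column names (as the question's loop does), it assumes both tables share the same column layout so that `SELECT *` lines up, and the helper names `quote_ident` and `build_delete_insert` are hypothetical, not part of Snowpark. Unlike the pseudo-SQL above, it inserts every source row rather than filtering by key values.

```python
from typing import List, Tuple


def quote_ident(name: str) -> str:
    """Wrap an identifier in double quotes, doubling any embedded quotes."""
    return '"' + name.replace('"', '""') + '"'


def build_delete_insert(target: str, source: str, keys: List[str]) -> Tuple[str, str]:
    """Return (delete_sql, insert_sql) for a delete-then-insert refresh.

    Assumption: `keys` holds column names that exist in both tables.
    """
    if not keys:
        raise ValueError("at least one key column is required")
    # One equality test per key column, AND-ed together.
    match = " AND ".join(
        f"{target}.{quote_ident(k)} = {source}.{quote_ident(k)}" for k in keys
    )
    delete_sql = f"DELETE FROM {target} USING {source} WHERE {match}"
    insert_sql = f"INSERT INTO {target} SELECT * FROM {source}"
    return delete_sql, insert_sql


delete_sql, insert_sql = build_delete_insert(
    "target_table", "source_table", ["ID", "NAME"]
)
print(delete_sql)
print(insert_sql)
```

Inside the stored procedure, the two strings could then be run in order with `session.sql(delete_sql).collect()` followed by `session.sql(insert_sql).collect()`.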

Option 2. Use merge. This will be slower, since merge is a slower operation.

    merge into target_table t using (select "ID" sid, "NAME" sname from source_table where keys in (input_keys)) s on s.sid = t.ID and s.sname = t.NAME when matched then update
    set t.id = s.sid, t.name = s.sname;

You really don't have to delete the rows. You just have to update all the columns with the data coming from the source.
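
The merge variant can also be generated from a dynamic key list. The following is a minimal sketch, not a definitive implementation: `build_merge` and `quote_ident` are hypothetical helper names, `keys` and `update_cols` are assumed to be column names present in both tables, and the `WHEN NOT MATCHED THEN INSERT` branch is an addition that is not in the answer above, included only because the question also asks for new source rows to be inserted.

```python
from typing import List


def quote_ident(name: str) -> str:
    """Wrap an identifier in double quotes, doubling any embedded quotes."""
    return '"' + name.replace('"', '""') + '"'


def build_merge(target: str, source: str, keys: List[str], update_cols: List[str]) -> str:
    """Build a MERGE that updates matched rows and inserts unmatched ones.

    Assumption: `keys` and `update_cols` are disjoint lists of column names.
    """
    if not keys or not update_cols:
        raise ValueError("keys and update_cols must both be non-empty")
    on_clause = " AND ".join(f"t.{quote_ident(k)} = s.{quote_ident(k)}" for k in keys)
    set_clause = ", ".join(f"t.{quote_ident(c)} = s.{quote_ident(c)}" for c in update_cols)
    all_cols = keys + update_cols
    insert_cols = ", ".join(quote_ident(c) for c in all_cols)
    insert_vals = ", ".join(f"s.{quote_ident(c)}" for c in all_cols)
    return (
        f"MERGE INTO {target} t USING {source} s ON {on_clause} "
        f"WHEN MATCHED THEN UPDATE SET {set_clause} "
        f"WHEN NOT MATCHED THEN INSERT ({insert_cols}) VALUES ({insert_vals})"
    )


merge_sql = build_merge("target_table", "source_table", ["ID"], ["NAME"])
print(merge_sql)
```

As with the first option, the generated string could be executed from the procedure with `session.sql(merge_sql).collect()`.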
</details>