How to implement explode functionality in a Snowpark (Python) DataFrame without using the explode() function?

Question

I am working in Snowpark (Python) with version 1.0.0 of the snowflake-snowpark-python library. I have a version constraint and therefore can't upgrade. This version does not support the explode() function.

How can I get the same result with a Snowpark DataFrame without calling explode()? Only one column needs to be exploded.
My df looks like this:

df = session.create_dataframe(["rest_of_the_row", ["A|B|C"]], schema=["record", "product_code"])

My expected output is:

+----------------+------------+ 
|    record      |product_code|
+----------------+------------+
| rest_of_the_row| A          |
| rest_of_the_row| B          |
| rest_of_the_row| C          |
+----------------+------------+

I then run further aggregations based on product_code.
How can I achieve this with my own implementation of explode()?

Answer 1

Score: 0

I couldn't get your df statement to work, so you may want to correct it. Assuming your data looks like this:

df = session.create_dataframe([{"record": "rest_of_the_row", "product_code": "A|B|C"}])

df.show()

------------------------------------
|"RECORD"         |"PRODUCT_CODE"  |
------------------------------------
|rest_of_the_row  |A|B|C           |
------------------------------------

Then you can use split_to_table for this:

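from snowflake.snowpark import functions as F

# split_to_table emits one row per token; its output columns are SEQ, INDEX and VALUE.
# Joining it to df repeats RECORD for every token, and VALUE is renamed back to PRODUCT_CODE.
df.join_table_function("split_to_table", df["product_code"], F.lit("|")) \
  .select(F.col("RECORD"), F.col("VALUE").alias("PRODUCT_CODE")) \
  .show()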

explode() takes an ARRAY or MAP (OBJECT) type as input, whereas your input is a delimited string, so I don't think explode() would work in your case anyway.
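
To make the row expansion concrete, here is a plain-Python model of what the split_to_table join does to the sample row, followed by a count per product_code of the kind the question mentions. It is only a sketch: it does not call Snowpark, and the helper name split_to_table_rows is made up for this illustration.

```python
from collections import Counter


def split_to_table_rows(rows, column, delimiter="|"):
    """Hypothetical helper: return one copy of each row per delimited token in `column`."""
    out = []
    for row in rows:
        for token in row[column].split(delimiter):
            exploded = dict(row)      # keep the rest of the row unchanged
            exploded[column] = token  # replace the delimited string with a single token
            out.append(exploded)
    return out


# Sample data shaped like the DataFrame in the answer.
rows = [{"RECORD": "rest_of_the_row", "PRODUCT_CODE": "A|B|C"}]

exploded = split_to_table_rows(rows, "PRODUCT_CODE")

# A follow-up aggregation keyed on the exploded column.
counts = Counter(r["PRODUCT_CODE"] for r in exploded)

for r in exploded:
    print(r["RECORD"], r["PRODUCT_CODE"])
```

In Snowpark itself the same aggregation would be expressed on the DataFrame returned by the join; the model above is only meant to show why each input row turns into several output rows.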

Posted by huangapple on July 13, 2023 at 20:03:23
Please keep the link to this article when reposting: https://go.coder-hub.com/76679169.html