July 13, 2023, 20:03:23

How to implement explode functionality in Snowpark(Python) dataframe without using the explode() function?

Question

I am working in Snowpark (Python) with version 1.0.0 of the snowflake-snowpark-python library. I have a version constraint, so I can't upgrade, and this version does not support the explode() function.

How can I implement the same functionality on a Snowpark DataFrame without using explode()? Only one column needs to be exploded.
My df looks like this:

df = session.create_dataframe(["rest_of_the_row", ["A|B|C"]], schema=["record", "product_code"])

My expected output is:

+----------------+------------+ 
|    record      |product_code|
+----------------+------------+
| rest_of_the_row| A          |
| rest_of_the_row| B          |
| rest_of_the_row| C          |
+----------------+------------+

I then run further aggregations based on product_code.
How can I achieve this with my own implementation of explode()?

Answer 1

Score: 0

I couldn't get your df statement to work, so you may want to correct it. Assuming this is what your data looks like:

df = session.create_dataframe([{"record": "rest_of_the_row", "product_code": "A|B|C"}])
df.show()
------------------------------------
|"RECORD"         |"PRODUCT_CODE"  |
------------------------------------
|rest_of_the_row  |A|B|C           |
------------------------------------

Then you can use split_to_table for this:

from snowflake.snowpark import functions as F
df.join_table_function("split_to_table", df['product_code'], F.lit("|")) \
  .select(F.col("RECORD"), F.col("VALUE").alias("PRODUCT_CODE")) \
  .show()

explode() takes an ARRAY or MAP (OBJECT) value as input, whereas your input is a delimited string, so I don't think explode() would work in your case anyway.
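
To check the expected row shape without a Snowflake session, the per-row behaviour of split_to_table can be mimicked in plain Python. This is only a rough sketch: split_to_rows is a hypothetical helper, not part of Snowpark, and it keeps just the equivalent of the VALUE column (the real table function also returns SEQ and INDEX). The Counter at the end stands in for the follow-up aggregation on product_code mentioned in the question.

```python
from collections import Counter


def split_to_rows(rows, column, delimiter="|"):
    """Hypothetical helper: yield one row per delimited token in `column`,
    copying every other column unchanged (a local stand-in for split_to_table)."""
    for row in rows:
        for token in row[column].split(delimiter):
            yield {**row, column: token}


# Same sample data as in the answer above.
rows = [{"record": "rest_of_the_row", "product_code": "A|B|C"}]

exploded = list(split_to_rows(rows, "product_code"))
for r in exploded:
    print(r["record"], r["product_code"])

# Stand-in for a later aggregation keyed on product_code.
counts = Counter(r["product_code"] for r in exploded)
print(dict(counts))
```

On the sample row this produces three rows, one per product code, each carrying the original record value.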
