Retrieve the non-null value from a PySpark DataFrame row and store it in a new column
Question
I have a PySpark DataFrame whose column names are unique IDs generated by the UUID library, so I cannot query by column name. Each row of this DataFrame contains exactly one non-null value. How do I create a new column that holds just this one non-null value? I have shared a sample below, where "new_column" is the column I would like to create. Any help is appreciated. Thanks in advance.
col1  col2  col3  col4  new_column
null  null  xyz   null  xyz
I tried looking at DataFrame operations but was unable to find a relevant solution.
Answer 1
Score: 1
Let's use coalesce across all the columns:

df = df.withColumn('new_column', F.coalesce(*df.columns))

+----+----+----+----+----------+
|col1|col2|col3|col4|new_column|
+----+----+----+----+----------+
|null|null| xyz|null|       xyz|
| pqr|null|null|null|       pqr|
+----+----+----+----+----------+
Comments