Retrieve the non-null value from a PySpark DataFrame row and store it in a new column
Question
I have a PySpark DataFrame whose column names are unique IDs generated by the UUID library, so I cannot query by column name. Each row of this DataFrame contains exactly one non-null value. How do I create a new column that holds just this one non-null value? I have shared a sample below, where "new_column" is the column I would like to create. Any help is appreciated. Thanks in advance.
col1  col2  col3  col4  new_column
null  null  xyz   null  xyz
I tried looking at DataFrame operations but was unable to find a relevant solution.
Answer 1
Score: 1
Let's use coalesce across all the columns:

df = df.withColumn('new_column', F.coalesce(*df.columns))

+----+----+----+----+----------+
|col1|col2|col3|col4|new_column|
+----+----+----+----+----------+
|null|null| xyz|null|       xyz|
| pqr|null|null|null|       pqr|
+----+----+----+----+----------+
Comments