How to create mapping of dataframe columns with new column names
Question
col_map = {"name": "new_name", "age": "new_age"}
英文:
I want to create a column mapping for dataframe columns; the mapping depends on the dataframe schema.
e.g.
df = spark.createDataFrame([("Alice", 2), ("Bob", 5)], ("name", "age"))
df.show()
+-----+---+
| name|age|
+-----+---+
|Alice| 2|
| Bob| 5|
+-----+---+
I want to create a map with the renamed columns and use this map to extract the new names later in the code. (I have to create the mapping because my schema is dynamic.)
col_map={"name":"new_name","age":"new_age"}
Whenever I use the dataframe later in my code, I should always be able to read the columns new_name, new_age, etc.
Answer 1
Score: 2
We can use .alias() to dynamically rename the dataframe columns based on the map.
Example:
from pyspark.sql.functions import *
df = spark.createDataFrame([("Alice", 2), ("Bob", 5)], ("name", "age"))
df.show(10, False)
col_map={"name":"new_name","age":"new_age"}
df.select([col(f).alias(col_map[f]) for f in df.columns]).show(10, False)
Output:
+-----+---+
|name |age|
+-----+---+
|Alice|2 |
|Bob |5 |
+-----+---+
+--------+-------+
|new_name|new_age|
+--------+-------+
|Alice |2 |
|Bob |5 |
+--------+-------+
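Not part of the original answer: if the dynamic schema can contain columns that col_map does not cover, a dict .get() lookup with the original name as the fallback keeps those columns unchanged instead of raising a KeyError. A minimal sketch, assuming the same spark session as above:

from pyspark.sql.functions import col

df = spark.createDataFrame([("Alice", 2), ("Bob", 5)], ("name", "age"))

# Mapping that intentionally omits "age" to simulate a partially known schema.
col_map = {"name": "new_name"}

# col_map.get(f, f) falls back to the original column name when it is missing
# from the map, so unmapped columns keep their old names.
df.select([col(f).alias(col_map.get(f, f)) for f in df.columns]).show(10, False)
# Resulting columns: new_name, age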