How to create mapping of dataframe columns with new column names
Question
I want to create a column mapping for the dataframe columns; the mapping depends on the dataframe schema.
e.g.
df = spark.createDataFrame([("Alice", 2), ("Bob", 5)], ("name", "age"))
df.show()
+-----+---+
| name|age|
+-----+---+
|Alice|  2|
|  Bob|  5|
+-----+---+
I want to create a map with the renamed columns and use this map to look up the renamed names later in my code. (I have to create the mapping because my schema is dynamic.)
col_map={"name":"new_name","age":"new_age"}
Whenever I use df later in my code, I should always be able to read the columns new_name, new_age, etc.
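
For illustration only (this snippet is not part of the original question), such a mapping could itself be derived from the dynamic schema, e.g. with a hypothetical rule that prefixes every column name with new_:

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([("Alice", 2), ("Bob", 5)], ("name", "age"))

# build the mapping from whatever columns the dataframe happens to have
col_map = {c: f"new_{c}" for c in df.columns}
print(col_map)  # {'name': 'new_name', 'age': 'new_age'}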
Answer 1
Score: 2
We can use .alias() to dynamically change the dataframe column names based on the map.
Example:
from pyspark.sql.functions import col

df = spark.createDataFrame([("Alice", 2), ("Bob", 5)], ("name", "age"))
df.show(10, False)

col_map = {"name": "new_name", "age": "new_age"}

# alias every column with its new name from the map
df.select([col(f).alias(col_map[f]) for f in df.columns]).show(10, False)
Output:
+-----+---+
|name |age|
+-----+---+
|Alice|2  |
|Bob  |5  |
+-----+---+
+--------+-------+
|new_name|new_age|
+--------+-------+
|Alice   |2      |
|Bob     |5      |
+--------+-------+
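
A side note that is not part of the original answer: with a dynamic schema, some dataframe columns may be missing from col_map, in which case col_map[f] raises a KeyError. A minimal sketch that falls back to the original column name via dict.get:

from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([("Alice", 2), ("Bob", 5)], ("name", "age"))
col_map = {"name": "new_name"}  # "age" deliberately left unmapped

# dict.get(f, f) keeps unmapped columns under their original names
df.select([col(f).alias(col_map.get(f, f)) for f in df.columns]).show(10, False)

#+--------+---+
#|new_name|age|
#+--------+---+
#|Alice   |2  |
#|Bob     |5  |
#+--------+---+

On Spark 3.4 and later, df.withColumnsRenamed(col_map) performs the same dict-driven rename and is a no-op for keys that are not existing columns.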


Comments