How to pivot 2 columns in PySpark

Question
I have a dataframe as below, and I need to pivot it so that two new columns are created: "var3" holds the values "var1" and "var2" taken from the column headers, and "var4" holds the amount associated with each, grouped by id. There are other columns in the dataframe I am working with, but they are all at the same level as id.
id | var1 | var2 |
---|---|---|
465 | 1000 | 200 |
455 | 2000 | 400 |
The resulting output would be:
id | var3 | var4 |
---|---|---|
465 | var1 | 1000 |
465 | var2 | 200 |
455 | var1 | 2000 |
455 | var2 | 400 |
Answer 1

Score: 1
Use unpivot:
df.unpivot(['id'], ['var1', 'var2'], 'var3', 'var4').show()
Or stack:
df.selectExpr("id", "stack(2, 'var1', var1, 'var2', var2) as (var3, var4)").show()
Or melt:
df.melt(ids=['id'], values=['var1', 'var2'], variableColumnName="var3", valueColumnName="var4").show()
Input:

id | var1 | var2 |
---|---|---|
465 | 1000 | 200 |
455 | 2000 | 400 |

Output:

id | var3 | var4 |
---|---|---|
465 | var1 | 1000 |
465 | var2 | 200 |
455 | var1 | 2000 |
455 | var2 | 400 |
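All three calls perform the same wide-to-long reshape: each input row produces one output row per value column. As a minimal sketch of that transformation in plain Python (no Spark required; the column names and data mirror the example above):

```python
# Sketch of the unpivot/melt transformation: one output row
# per (input row, value column) pair.
rows = [
    {"id": 465, "var1": 1000, "var2": 200},
    {"id": 455, "var1": 2000, "var2": 400},
]

value_cols = ["var1", "var2"]

melted = [
    {"id": row["id"], "var3": col, "var4": row[col]}
    for row in rows
    for col in value_cols
]

for r in melted:
    print(r)
```

Note that `DataFrame.unpivot` and `DataFrame.melt` were added in Spark 3.4; on older versions, the `stack` expression is the way to go.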
Comments