How to pivot 2 columns in PySpark

Question

I have the DataFrame below and need to reshape it so that two new columns are created: "var3", which holds the original column headers "var1" and "var2", and "var4", which holds the amount associated with each, grouped by id. The DataFrame I am working with has other columns, but they are all at the same level as id.

id var1 var2
465 1000 200
455 2000 400

The resulting output would be:

id var3 var4
465 var1 1000
465 var2 200
455 var1 2000
455 var2 400

Answer 1

Score: 1

Use unpivot:

df.unpivot(['id'], ['var1', 'var2'], 'var3', 'var4').show()

Or stack:

df.selectExpr("id", "stack(2, 'var1', var1, 'var2', var2) as (var3, var4)").show()
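The stack expression hard-codes two columns. When more columns need to be unpivoted, the expression string can be generated rather than typed by hand. The following is a minimal sketch: `build_stack_expr` is a hypothetical helper name, not part of PySpark, and it assumes plain column names whose values share a compatible type, since `stack` needs each output column to have a single type.

```python
def build_stack_expr(value_cols, var_name="var3", value_name="var4"):
    """Build a Spark SQL stack() expression that unpivots value_cols.

    Hypothetical helper (not a PySpark API). Assumes plain column names
    and compatible value types.
    """
    if not value_cols:
        raise ValueError("value_cols must contain at least one column")
    # Each column contributes a pair: its name as a string literal,
    # then a backtick-quoted reference to the column itself.
    pairs = ", ".join(f"'{c}', `{c}`" for c in value_cols)
    return f"stack({len(value_cols)}, {pairs}) as ({var_name}, {value_name})"


expr = build_stack_expr(["var1", "var2"])
print(expr)
```

With the DataFrame from the question, `df.selectExpr("id", build_stack_expr(["var1", "var2"])).show()` should behave like the hand-written call.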

Or melt:

df.melt(ids=['id'], values=['var1', 'var2'], variableColumnName="var3", valueColumnName="var4").show()
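Note that `unpivot` and `melt` were added to the DataFrame API in PySpark 3.4, so on older versions only the `stack` approach is available. Below is a minimal sketch of a wrapper that picks whichever is present. `to_long` is a hypothetical name, and the function relies only on the DataFrame methods already used in this answer.

```python
def to_long(df, id_cols, value_cols, var_name="var3", value_name="var4"):
    """Reshape df from wide to long format.

    Hypothetical helper: uses DataFrame.unpivot when the installed PySpark
    provides it, otherwise falls back to an equivalent stack() expression.
    """
    if hasattr(df, "unpivot"):
        return df.unpivot(id_cols, value_cols, var_name, value_name)
    # Older PySpark: generate "stack(n, 'c1', `c1`, ...) as (var, value)".
    pairs = ", ".join(f"'{c}', `{c}`" for c in value_cols)
    stack = f"stack({len(value_cols)}, {pairs}) as ({var_name}, {value_name})"
    return df.selectExpr(*id_cols, stack)
```

For the DataFrame in the question, the call would be `to_long(df, ["id"], ["var1", "var2"]).show()`.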

huangapple
  • Posted on February 8, 2023 at 23:44:08
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75388239.html