Unpivot an odd number of columns in a PySpark DataFrame in Databricks
Question
I have 69 columns that need to be unpivoted. I tried this kind of code:
from pyspark.sql.functions import expr
group = Inv_df.groupBy('Project', 'Project Description')
# Use pairs of column name and value in the stack function
unpivotExpr = "stack(69, '2015-04-01', 2015-04-01, '2015-05-01', 2015-05-01, '2015-06-01', 2015-06-01, '2015-07-01', 2015-07-01, '2015-08-01', 2015-08-01, '2015-09-01', 2015-09-01, '2015-10-01', 2015-10-01, '2015-11-01', 2015-11-01, '2015-12-01', 2015-12-01, '2016-01-01', 2016-01-01, '2016-02-01', 2016-02-01, '2016-03-01', 2016-03-01, '2016-04-01', 2016-04-01, '2016-05-01', 2016-05-01, '2016-06-01', 2016-06-01, '2016-07-01', 2016-07-01, '2016-08-01', 2016-08-01, '2016-09-01', 2016-09-01, '2016-10-01', 2016-10-01, '2016-11-01', 2016-11-01, '2016-12-01', 2016-12-01, '2017-01-01', 2017-01-01, '2017-02-01', 2017-02-01, '2017-03-01', 2017-03-01, '2017-04-01', 2017-04-01, '2017-05-01', 2017-05-01, '2017-06-01', 2017-06-01, '2017-07-01', 2017-07-01, '2017-08-01', 2017-08-01, '2017-09-01', 2017-09-01, '2017-10-01', 2017-10-01, '2017-11-01', 2017-11-01, '2017-12-01', 2017-12-01, '2018-01-01', 2018-01-01, '2018-02-01', 2018-02-01, '2018-03-01', 2018-03-01, '2018-04-01', 2018-04-01, '2018-05-01', 2018-05-01, '2018-06-01', 2018-06-01, '2018-07-01', 2018-07-01, '2018-08-01', 2018-08-01, '2018-09-01', 2018-09-01, '2018-10-01', 2018-10-01, '2018-11-01', 2018-11-01, '2018-12-01', 2018-12-01, '2019-01-01', 2019-01-01, '2019-02-01', 2019-02-01, '2019-03-01', 2019-03-01, '2019-04-01', 2019-04-01, '2019-05-01', 2019-05-01, '2019-06-01', 2019-06-01, '2019-07-01', 2019-07-01, '2019-08-01', 2019-08-01, '2019-09-01', 2019-09-01, '2019-10-01', 2019-10-01, '2019-11-01', 2019-11-01, '2019-12-01', 2019-12-01, '2020-01-01', 2020-01-01, '2020-02-01', 2020-02-01, '2020-03-01', 2020-03-01, '2020-04-01', 2020-04-01, '2020-05-01', 2020-05-01, '2020-06-01', 2020-06-01, '2020-07-01', 2020-07-01, '2020-08-01', 2020-08-01, '2020-09-01', 2020-09-01, '2020-10-01', 2020-10-01, '2020-11-01', 2020-11-01, '2020-12-01', 2020-12-01 ) as (Name, value)"
unPivotDF = group.agg(expr(unpivotExpr))
It gave me wrong results. Does this method only work for an even number of columns?
Please suggest the right way to unpivot.
Answer 1
Score: 0
You can use the unpivot function as below.
display(df.unpivot(["Project", "Project Description"], ["2015-04-01", "2015-05-01", "2015-06-01"], "Name", "value"))
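You don't have to type out all 69 columns for this either. A minimal sketch, assuming the only id columns are Project and Project Description and every other column is a month to unpivot (note that DataFrame.unpivot requires Spark 3.4+ / a recent Databricks runtime):

id_cols = ["Project", "Project Description"]
# treat every remaining column as a month column to unpivot
value_cols = [c for c in Inv_df.columns if c not in id_cols]
display(Inv_df.unpivot(id_cols, value_cols, "Name", "value"))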
Otherwise, you need to fix the column references, because the expression you used treats the unquoted dates as literal values, not as column names.
stack(69,'2015-04-01', 2015-04-01,....)
In this expression, the value is interpreted as the arithmetic ((2015 - 04) - 01), which gives you 2010.
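You can check this directly (a quick sketch, assuming the usual spark session that Databricks provides):

from pyspark.sql.functions import expr
# the unquoted date is parsed as integer subtraction: 2015 - 4 - 1
spark.range(1).select(expr("2015-04-01").alias("parsed")).show()
# +------+
# |parsed|
# +------+
# |  2010|
# +------+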
So, change your column names so that they aren't evaluated as arithmetic inside the expr function.
I changed the column names from 2015-04-01 to 2015_04_01 and got the correct output.
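Renaming all 69 columns doesn't have to be done by hand. A rough sketch of the same rename in a loop (the dash-to-underscore rule is just the pattern used above):

renamed_df = df
for c in df.columns:
    if c not in ("Project", "Project Description"):
        # e.g. 2015-04-01 -> 2015_04_01
        renamed_df = renamed_df.withColumnRenamed(c, c.replace("-", "_"))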
unpivotExpr = "stack(3, '2015_04_01',any_value(2015_04_01), '2015_05_01',any_value(2015_05_01),'2015_06_01',any_value(2015_06_01)) as (Name,value)"
display(df.groupBy('Project', 'Project Description').agg(expr(unpivotExpr)))
Or, without the group by:
unpivotExpr = "stack(3, '2015_04_01',2015_04_01, '2015_05_01',2015_05_01,'2015_06_01',2015_06_01) as (Name,value)"
display(df.select("Project", "Project Description", expr(unpivotExpr)))
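The stack expression itself can also be generated instead of written out by hand. A sketch assuming the renamed_df from the loop above:

value_cols = [c for c in renamed_df.columns if c not in ("Project", "Project Description")]
# build "'2015_04_01', 2015_04_01, '2015_05_01', 2015_05_01, ..." pairs
pairs = ", ".join(f"'{c}', {c}" for c in value_cols)
unpivotExpr = f"stack({len(value_cols)}, {pairs}) as (Name, value)"
display(renamed_df.select("Project", "Project Description", expr(unpivotExpr)))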
If you aren't able to rename all 69 columns, use the first approach.