Unpivot an odd number of columns in a PySpark DataFrame in Databricks


Question


I have 69 columns that need to be unpivoted, and I tried this kind of code:

from pyspark.sql.functions import expr
group = Inv_df.groupBy('Project', 'Project Description')

# Use pairs of column name and value in the stack function

unpivotExpr = "stack(69, '2015-04-01', 2015-04-01, '2015-05-01', 2015-05-01, '2015-06-01', 2015-06-01, '2015-07-01', 2015-07-01, '2015-08-01', 2015-08-01, '2015-09-01', 2015-09-01, '2015-10-01', 2015-10-01, '2015-11-01', 2015-11-01, '2015-12-01', 2015-12-01, '2016-01-01', 2016-01-01, '2016-02-01', 2016-02-01, '2016-03-01', 2016-03-01, '2016-04-01', 2016-04-01, '2016-05-01', 2016-05-01, '2016-06-01', 2016-06-01, '2016-07-01', 2016-07-01, '2016-08-01', 2016-08-01, '2016-09-01', 2016-09-01, '2016-10-01', 2016-10-01, '2016-11-01', 2016-11-01, '2016-12-01', 2016-12-01, '2017-01-01', 2017-01-01, '2017-02-01', 2017-02-01, '2017-03-01', 2017-03-01, '2017-04-01', 2017-04-01, '2017-05-01', 2017-05-01, '2017-06-01', 2017-06-01, '2017-07-01', 2017-07-01, '2017-08-01', 2017-08-01, '2017-09-01', 2017-09-01, '2017-10-01', 2017-10-01, '2017-11-01', 2017-11-01, '2017-12-01', 2017-12-01, '2018-01-01', 2018-01-01, '2018-02-01', 2018-02-01, '2018-03-01', 2018-03-01, '2018-04-01', 2018-04-01, '2018-05-01', 2018-05-01, '2018-06-01', 2018-06-01, '2018-07-01', 2018-07-01, '2018-08-01', 2018-08-01, '2018-09-01', 2018-09-01, '2018-10-01', 2018-10-01, '2018-11-01', 2018-11-01, '2018-12-01', 2018-12-01, '2019-01-01', 2019-01-01, '2019-02-01', 2019-02-01, '2019-03-01', 2019-03-01, '2019-04-01', 2019-04-01, '2019-05-01', 2019-05-01, '2019-06-01', 2019-06-01, '2019-07-01', 2019-07-01, '2019-08-01', 2019-08-01, '2019-09-01', 2019-09-01, '2019-10-01', 2019-10-01, '2019-11-01', 2019-11-01, '2019-12-01', 2019-12-01, '2020-01-01', 2020-01-01, '2020-02-01', 2020-02-01, '2020-03-01', 2020-03-01, '2020-04-01', 2020-04-01, '2020-05-01', 2020-05-01, '2020-06-01', 2020-06-01, '2020-07-01', 2020-07-01, '2020-08-01', 2020-08-01, '2020-09-01', 2020-09-01, '2020-10-01', 2020-10-01, '2020-11-01', 2020-11-01, '2020-12-01', 2020-12-01 ) as (Name, value)"

unPivotDF = group.agg(expr(unpivotExpr))

It gave me wrong results. Does this method only work for an even number of columns?

Please suggest the right way to unpivot.

Answer 1

Score: 0


You can use the unpivot function (available since Spark 3.4) as below:

display(df.unpivot(["Project", "Project Description"], ["2015-04-01", "2015-05-01", "2015-06-01"], "Name", "value"))
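
Since listing all 69 date columns by hand is tedious, the list can also be built from the schema. A minimal sketch, assuming the value columns are every column except the two key columns (key_cols and date_cols are illustrative names):

key_cols = ["Project", "Project Description"]
date_cols = [c for c in df.columns if c not in key_cols]  # the 69 month columns
display(df.unpivot(key_cols, date_cols, "Name", "value"))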


Otherwise, you need to specify proper column names, because the expression you used treats the unquoted names as literal values:

   stack(69,'2015-04-01', 2015-04-01,....)

In this expression, the unquoted 2015-04-01 is evaluated as the arithmetic expression ((2015-04)-01), which gives you 2010.
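
A quick way to see this, assuming an active Spark session named spark:

# 2015-04-01 is parsed as integer subtraction: 2015 - 4 - 1 = 2010
spark.sql("SELECT 2015-04-01 AS v").show()
# +----+
# |   v|
# +----+
# |2010|
# +----+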


So, change your column names in such a way that they are not evaluated inside the expr function. I changed the column names from 2015-04-01 to 2015_04_01 and got the correct output:

unpivotExpr = "stack(3, '2015_04_01',any_value(2015_04_01), '2015_05_01',any_value(2015_05_01),'2015_06_01',any_value(2015_06_01)) as (Name,value)"
display(df.groupBy('Project', 'Project Description').agg(expr(unpivotExpr)))


Or without group by:

unpivotExpr = "stack(3, '2015_04_01',2015_04_01, '2015_05_01',2015_05_01,'2015_06_01',2015_06_01) as (Name,value)"
display(df.select("Project", "Project Description",expr(unpivotExpr)))
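
Renaming 69 columns by hand is impractical, but both the renames and the stack expression can be generated. A sketch, assuming the same df and key columns as above (all helper names are illustrative):

from pyspark.sql.functions import expr

key_cols = ["Project", "Project Description"]
date_cols = [c for c in df.columns if c not in key_cols]

# Replace hyphens so the column names are no longer parsed as arithmetic
renamed = df
for c in date_cols:
    renamed = renamed.withColumnRenamed(c, c.replace("-", "_"))

# Build one 'label', column pair per month: stack(69, '2015-04-01', 2015_04_01, ...)
pairs = ", ".join(f"'{c}', {c.replace('-', '_')}" for c in date_cols)
unpivotExpr = f"stack({len(date_cols)}, {pairs}) as (Name, value)"

display(renamed.select(*key_cols, expr(unpivotExpr)))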


If you are not able to rename all 69 columns, use the first approach.
