Unpivot an odd number of columns in a PySpark DataFrame in Databricks
Question
I have 69 columns that need to be unpivoted. I tried this kind of code:
from pyspark.sql.functions import expr
group = Inv_df.groupBy('Project', 'Project Description')
# Use pairs of column name and value in the stack function
unpivotExpr = "stack(69, '2015-04-01', 2015-04-01, '2015-05-01', 2015-05-01, '2015-06-01', 2015-06-01, '2015-07-01', 2015-07-01, '2015-08-01', 2015-08-01, '2015-09-01', 2015-09-01, '2015-10-01', 2015-10-01, '2015-11-01', 2015-11-01, '2015-12-01', 2015-12-01, '2016-01-01', 2016-01-01, '2016-02-01', 2016-02-01, '2016-03-01', 2016-03-01, '2016-04-01', 2016-04-01, '2016-05-01', 2016-05-01, '2016-06-01', 2016-06-01, '2016-07-01', 2016-07-01, '2016-08-01', 2016-08-01, '2016-09-01', 2016-09-01, '2016-10-01', 2016-10-01, '2016-11-01', 2016-11-01, '2016-12-01', 2016-12-01, '2017-01-01', 2017-01-01, '2017-02-01', 2017-02-01, '2017-03-01', 2017-03-01, '2017-04-01', 2017-04-01, '2017-05-01', 2017-05-01, '2017-06-01', 2017-06-01, '2017-07-01', 2017-07-01, '2017-08-01', 2017-08-01, '2017-09-01', 2017-09-01, '2017-10-01', 2017-10-01, '2017-11-01', 2017-11-01, '2017-12-01', 2017-12-01, '2018-01-01', 2018-01-01, '2018-02-01', 2018-02-01, '2018-03-01', 2018-03-01, '2018-04-01', 2018-04-01, '2018-05-01', 2018-05-01, '2018-06-01', 2018-06-01, '2018-07-01', 2018-07-01, '2018-08-01', 2018-08-01, '2018-09-01', 2018-09-01, '2018-10-01', 2018-10-01, '2018-11-01', 2018-11-01, '2018-12-01', 2018-12-01, '2019-01-01', 2019-01-01, '2019-02-01', 2019-02-01, '2019-03-01', 2019-03-01, '2019-04-01', 2019-04-01, '2019-05-01', 2019-05-01, '2019-06-01', 2019-06-01, '2019-07-01', 2019-07-01, '2019-08-01', 2019-08-01, '2019-09-01', 2019-09-01, '2019-10-01', 2019-10-01, '2019-11-01', 2019-11-01, '2019-12-01', 2019-12-01, '2020-01-01', 2020-01-01, '2020-02-01', 2020-02-01, '2020-03-01', 2020-03-01, '2020-04-01', 2020-04-01, '2020-05-01', 2020-05-01, '2020-06-01', 2020-06-01, '2020-07-01', 2020-07-01, '2020-08-01', 2020-08-01, '2020-09-01', 2020-09-01, '2020-10-01', 2020-10-01, '2020-11-01', 2020-11-01, '2020-12-01', 2020-12-01 ) as (Name, value)"
unPivotDF = group.agg(expr(unpivotExpr))
It gave me wrong results. Does this method only work for an even number of columns?
Please suggest the right way to unpivot.
Answer 1
Score: 0
You can use the unpivot function as below.
display(df.unpivot(["Project", "Project Description"], ["2015-04-01", "2015-05-01", "2015-06-01"], "Name", "value"))
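You don't have to type out all 69 columns for this either. A minimal sketch, assuming the only id columns are Project and Project Description and every other column is a month to unpivot (note that DataFrame.unpivot requires Spark 3.4+ / a recent Databricks runtime):

id_cols = ["Project", "Project Description"]
# treat every remaining column as a month column to unpivot
value_cols = [c for c in Inv_df.columns if c not in id_cols]
display(Inv_df.unpivot(id_cols, value_cols, "Name", "value"))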
Otherwise, you need to fix the column references, because the expression you used treats the unquoted dates as literal values, not as column names.
stack(69,'2015-04-01', 2015-04-01,....)
In this expression, the value is interpreted as the arithmetic ((2015 - 04) - 01), which gives you 2010.
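You can check this directly (a quick sketch, assuming the usual spark session that Databricks provides):

from pyspark.sql.functions import expr
# the unquoted date is parsed as integer subtraction: 2015 - 4 - 1
spark.range(1).select(expr("2015-04-01").alias("parsed")).show()
# +------+
# |parsed|
# +------+
# |  2010|
# +------+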
So, change your column names so that they aren't evaluated as arithmetic inside the expr function.
I changed the column names from 2015-04-01 to 2015_04_01 and got the correct output.
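Renaming all 69 columns doesn't have to be done by hand. A rough sketch of the same rename in a loop (the dash-to-underscore rule is just the pattern used above):

renamed_df = df
for c in df.columns:
    if c not in ("Project", "Project Description"):
        # e.g. 2015-04-01 -> 2015_04_01
        renamed_df = renamed_df.withColumnRenamed(c, c.replace("-", "_"))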
unpivotExpr = "stack(3, '2015_04_01',any_value(2015_04_01), '2015_05_01',any_value(2015_05_01),'2015_06_01',any_value(2015_06_01)) as (Name,value)"
display(df.groupBy('Project', 'Project Description').agg(expr(unpivotExpr)))
Or, without the group by:
unpivotExpr = "stack(3, '2015_04_01',2015_04_01, '2015_05_01',2015_05_01,'2015_06_01',2015_06_01) as (Name,value)"
display(df.select("Project", "Project Description", expr(unpivotExpr)))
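The stack expression itself can also be generated instead of written out by hand. A sketch assuming the renamed_df from the loop above:

value_cols = [c for c in renamed_df.columns if c not in ("Project", "Project Description")]
# build "'2015_04_01', 2015_04_01, '2015_05_01', 2015_05_01, ..." pairs
pairs = ", ".join(f"'{c}', {c}" for c in value_cols)
unpivotExpr = f"stack({len(value_cols)}, {pairs}) as (Name, value)"
display(renamed_df.select("Project", "Project Description", expr(unpivotExpr)))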
If you aren't able to rename all 69 columns, use the first approach.