Pivoting rows in PySpark without aggregation

Question

I have a PySpark DataFrame named df.

I need to pivot the data on the ProducingMonth and CLASSIFICATION columns.

I am using the following PySpark code:

pivotDF = df.groupBy("WELL_ID","CLASSIFICATION").pivot("CLASSIFICATION")

When I try to display the result, I get the error "'GroupedData' object has no attribute 'display'".


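Answer 1

Score: 0

You need to perform an aggregation after the pivot. groupBy(...).pivot(...) returns a GroupedData object rather than a DataFrame, which is why display is not available on it. If each combination of WELL_ID, producing_month and CLASSIFICATION has only one row, F.first() passes the values through unchanged, so nothing is actually aggregated.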

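from pyspark.sql import functions as F

# groupBy(...).pivot(...) returns a GroupedData object, so finish with agg()
# to get a DataFrame back. F.first() simply picks one value per cell.
pivotDF = df.groupBy("WELL_ID", "producing_month").pivot("CLASSIFICATION").agg(
    F.first("OIL"),
    F.first("GAS"),
)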

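Then you can probably display it with pivotDF.display(). Outside environments that provide display() (such as Databricks notebooks), pivotDF.show() is the standard PySpark call.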
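To make the reshaping concrete, here is a plain-Python sketch (no Spark needed) of roughly what the grouped pivot with first() computes. The sample rows and the classification values "A" and "B" are invented for illustration, the helper name pivot_first is hypothetical, and the output keys follow a `<pivot value>_<measure>` pattern, which is what I would expect from Spark when each aggregation is aliased (for example F.first("OIL").alias("OIL")).

```python
def pivot_first(rows, group_keys, pivot_key, measures):
    """Pivot rows on pivot_key, keeping the first value seen per cell."""
    result = {}
    for row in rows:
        group = tuple(row[k] for k in group_keys)
        # One output record per distinct combination of the group keys.
        out = result.setdefault(group, dict(zip(group_keys, group)))
        for m in measures:
            # setdefault keeps the first value and ignores later ones,
            # mimicking first() within a (group, pivot value) cell.
            out.setdefault(f"{row[pivot_key]}_{m}", row[m])
    # Note: Spark would fill missing cells with null; this sketch omits them.
    return list(result.values())


# Hypothetical sample data, not taken from the question.
rows = [
    {"WELL_ID": 1, "producing_month": "2023-01", "CLASSIFICATION": "A", "OIL": 10, "GAS": 5},
    {"WELL_ID": 1, "producing_month": "2023-01", "CLASSIFICATION": "B", "OIL": 20, "GAS": 7},
    {"WELL_ID": 1, "producing_month": "2023-02", "CLASSIFICATION": "A", "OIL": 30, "GAS": 9},
]

pivoted = pivot_first(rows, ["WELL_ID", "producing_month"], "CLASSIFICATION", ["OIL", "GAS"])
for record in pivoted:
    print(record)
```

Each input row contributes its OIL and GAS values to the columns for its own classification, so when every cell holds a single row the "aggregation" only moves values around. If a cell can hold several rows, first() in Spark depends on row order, so a deterministic aggregate such as max or sum may be a safer choice.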


huangapple
  • Published on May 25, 2023 at 17:16:58
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76330671.html