PySpark: pivot rows without aggregation
Question
I have a PySpark DataFrame named df, as shown below.
I need to pivot the data on the ProducingMonth and classification columns to produce the following output.
I am using the following PySpark code:
pivotDF = df.groupBy("WELL_ID","CLASSIFICATION").pivot("CLASSIFICATION")
When I try to display the data, I get the error "'GroupedData' object has no attribute 'display'".
Answer 1
Score: 0
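You need to apply an aggregation after the pivot. groupBy(...).pivot(...) returns a GroupedData object rather than a DataFrame, so there is nothing to display until an aggregation turns it back into a DataFrame.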
from pyspark.sql import functions as F

pivotDF = df.groupBy("WELL_ID", "producing_month").pivot("CLASSIFICATION").agg(
    F.first("OIL"),
    F.first("GAS"),
)
Then you can probably display it with pivotDF.display().