PySpark pivot table with multiple columns

Question

I am trying to pivot a DataFrame that has one key column and several value columns. How can I do this in PySpark? I have used pivot with a single key/value pair before, but I can't work out how to extend it to multiple value columns.

Sample DataFrame

id   test_id  test_status  key  score1  score2  score3
ABC  1        complete     q1   1       2       3
ABC  1        complete     q2   4       5       6
ABC  2        complete     q1   1       6       7
ABC  2        complete     q2   5       6       7

Expected DataFrame

id   test_id  test_status  q1_score1  q1_score2  q1_score3  q2_score1  q2_score2  q2_score3
ABC  1        complete     1          2          3          4          5          6
ABC  2        complete     1          6          7          5          6          7

Answer 1

Score: 1

You can pivot several value columns at once by passing one aliased aggregate per column to `agg`. With more than one aggregate, Spark names each result column `<key>_<alias>`, for example `q1_score1`.

    from pyspark.sql import functions as F

    df = (df.groupby('id', 'test_id', 'test_status')
          .pivot('key')
          .agg(*[F.first(x).alias(x) for x in ['score1', 'score2', 'score3']]))

Answer 2

Score: 0

Try `pivot` together with the `first` aggregate function.

Example:

    from pyspark.sql.functions import col, first

    # Assumes an active SparkSession named `spark`.
    df = spark.createDataFrame(
        [('ABC', '1', 'c', 'q1', '1', '2', '3'),
         ('ABC', '1', 'c', 'q2', '4', '5', '6')],
        ['id', 'test_id', 'test_status', 'key', 'score1', 'score2', 'score3'])
    df.show(10, False)

    # One aliased aggregate per value column.
    (df.groupBy("id", "test_id", "test_status")
       .pivot("key")
       .agg(first(col("score1")).alias("score1"),
            first(col("score2")).alias("score2"),
            first(col("score3")).alias("score3"))
       .show(10, False))

    # Input
    #+---+-------+-----------+---+------+------+------+
    #|id |test_id|test_status|key|score1|score2|score3|
    #+---+-------+-----------+---+------+------+------+
    #|ABC|1      |c          |q1 |1     |2     |3     |
    #|ABC|1      |c          |q2 |4     |5     |6     |
    #+---+-------+-----------+---+------+------+------+

    # Output
    #+---+-------+-----------+---------+---------+---------+---------+---------+---------+
    #|id |test_id|test_status|q1_score1|q1_score2|q1_score3|q2_score1|q2_score2|q2_score3|
    #+---+-------+-----------+---------+---------+---------+---------+---------+---------+
    #|ABC|1      |c          |1        |2        |3        |4        |5        |6        |
    #+---+-------+-----------+---------+---------+---------+---------+---------+---------+



huangapple
  • Published on June 8, 2023 at 01:59:40
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76425949.html