PySpark pivot table with multiple columns
Question
I am trying to pivot a dataframe with one key and multiple values spread across different columns. How do I do this in PySpark? I have used pivot with a single key-value pair before and am trying to figure this out.
Sample dataframe
id | test_id | test_status | key | score1 | score2 | score3 |
---|---|---|---|---|---|---|
ABC | 1 | complete | q1 | 1 | 2 | 3 |
ABC | 1 | complete | q2 | 4 | 5 | 6 |
ABC | 2 | complete | q1 | 1 | 6 | 7 |
ABC | 2 | complete | q2 | 5 | 6 | 7 |
Expected dataframe
id | test_id | test_status | q1_score1 | q1_score2 | q1_score3 | q2_score1 | q2_score2 | q2_score3 |
---|---|---|---|---|---|---|---|---|
ABC | 1 | complete | 1 | 2 | 3 | 4 | 5 | 6 |
ABC | 2 | complete | 1 | 6 | 7 | 5 | 6 | 7 |
Answer 1
Score: 1
You can pivot multiple value columns at once by passing several aggregations to `agg`:
from pyspark.sql import functions as F

# one aggregation per score column; pivoted columns come out as <key>_<alias>
df = (df.groupby('id', 'test_id', 'test_status')
        .pivot('key')
        .agg(*[F.first(x).alias(x) for x in ['score1', 'score2', 'score3']]))
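For context, here is a minimal runnable sketch that rebuilds the sample dataframe from the question and applies this pivot; the `SparkSession` setup is assumed boilerplate, not part of the original answer:

from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame(
    [('ABC', 1, 'complete', 'q1', 1, 2, 3),
     ('ABC', 1, 'complete', 'q2', 4, 5, 6),
     ('ABC', 2, 'complete', 'q1', 1, 6, 7),
     ('ABC', 2, 'complete', 'q2', 5, 6, 7)],
    ['id', 'test_id', 'test_status', 'key', 'score1', 'score2', 'score3'])

pivoted = (df.groupby('id', 'test_id', 'test_status')
             .pivot('key')
             .agg(*[F.first(x).alias(x) for x in ['score1', 'score2', 'score3']]))
pivoted.show()
# columns: id, test_id, test_status, q1_score1 ... q2_score3, matching the expected dataframe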
Answer 2
Score: 0
Try the **`pivot`** and **`first`** aggregate functions.
**Example:**
from pyspark.sql.functions import first, col

df = spark.createDataFrame([('ABC','1','c','q1','1','2','3'),('ABC','1','c','q2','4','5','6')],
                           ['id','test_id','test_status','key','score1','score2','score3'])
df.show(10,False)
df.groupBy("id","test_id","test_status").pivot("key").agg(first(col("score1")).alias("score1"),first(col("score2")).alias("score2"),first(col("score3")).alias("score3")).show(10,False)
#input
#+---+-------+-----------+---+------+------+------+
#|id |test_id|test_status|key|score1|score2|score3|
#+---+-------+-----------+---+------+------+------+
#|ABC|1 |c |q1 |1 |2 |3 |
#|ABC|1 |c |q2 |4 |5 |6 |
#+---+-------+-----------+---+------+------+------+
#output
#+---+-------+-----------+---------+---------+---------+---------+---------+---------+
#|id |test_id|test_status|q1_score1|q1_score2|q1_score3|q2_score1|q2_score2|q2_score3|
#+---+-------+-----------+---------+---------+---------+---------+---------+---------+
#|ABC|1 |c |1 |2 |3 |4 |5 |6 |
#+---+-------+-----------+---------+---------+---------+---------+---------+---------+
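Side note (an assumption beyond the original answer, but standard `pivot` behavior): when the distinct keys are known up front, they can be passed as a second argument to `pivot`, which spares Spark an extra job to collect them. A sketch of that variant, reusing `df` from above:

# listing the pivot values explicitly avoids a pass over the data to find distinct keys
df.groupBy("id","test_id","test_status") \
  .pivot("key", ["q1","q2"]) \
  .agg(first(col("score1")).alias("score1"),
       first(col("score2")).alias("score2"),
       first(col("score3")).alias("score3")) \
  .show(10,False)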