How to Convert Column into a List based on the other column in pyspark

Question

I have a data frame in pyspark which is as follows:

| Column A | Column B |
| -------- | -------- |
| 123      | abc      |
| 123      | def      |
| 456      | klm      |
| 789      | nop      |
| 789      | qrst     |

For every distinct value in Column A, the corresponding Column B values have to be collected into a list. The result should look like this:

| Column A | Column B    |
| -------- | ----------- |
| 123      | [abc, def]  |
| 456      | [klm]       |
| 789      | [nop, qrst] |

I have tried using map(), but it didn't give me the expected results. Can you point me in the right direction on how to approach this problem?

Answer 1

Score: 0

Use collect_list

from pyspark.sql import functions as F

# Collect every "Column B" value into a list per "Column A" group
df1.groupBy("Column A").agg(F.collect_list("Column B")).show()
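
For reference, here is a minimal, self-contained sketch of the same approach. The SparkSession setup and the sample data are assumptions added for illustration; only the groupBy/collect_list call reflects the answer above.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()

# Sample data mirroring the input table from the question (illustrative only)
df1 = spark.createDataFrame(
    [(123, "abc"), (123, "def"), (456, "klm"), (789, "nop"), (789, "qrst")],
    ["Column A", "Column B"],
)

# For each distinct "Column A" value, gather all "Column B" values into a list;
# alias() keeps the original column name instead of "collect_list(Column B)"
result = df1.groupBy("Column A").agg(F.collect_list("Column B").alias("Column B"))

# Prints one row per key, e.g. 123 -> [abc, def] (row order may vary)
result.show(truncate=False)
```

Note that collect_list does not guarantee the order of elements in the resulting array; if a sorted list is needed, F.sort_array can be applied to the aggregated column.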


