How to Convert Column into a List based on the other column in pyspark
Question
I have a data frame in pyspark which is as follows:
| Column A | Column B |
| -------- | -------- |
| 123 | abc |
| 123 | def |
| 456 | klm |
| 789 | nop |
| 789 | qrst |
For every row in column A, the values in column B have to be collected into a list. The result should look like this:

| Column A | Column B   |
| -------- | ---------- |
| 123      | [abc,def]  |
| 456      | [klm]      |
| 789      | [nop,qrst] |

I have tried using map(), but it didn't give me the expected results. Can you point me in the right direction on how to approach this problem?
Answer 1
Score: 0
Use `collect_list`:

```python
from pyspark.sql import functions as F

df1.groupBy("Column A").agg(F.collect_list("Column B")).show()
```

Input:

| Column A | Column B |
| -------- | -------- |
| 123      | abc      |
| 123      | def      |
| 456      | klm      |
| 789      | nop      |
| 789      | qrst     |

Output:

| Column A | collect_list(Column B) |
| -------- | ---------------------- |
| 123      | [abc, def]             |
| 456      | [klm]                  |
| 789      | [nop, qrst]            |
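Conceptually, `groupBy` plus `collect_list` gathers every column-B value that shares the same column-A key into one list. A minimal pure-Python sketch of that same grouping (illustrative sample data, not the pyspark API itself):

```python
from collections import defaultdict

# Sample rows mirroring the question's data frame (Column A, Column B)
rows = [(123, "abc"), (123, "def"), (456, "klm"), (789, "nop"), (789, "qrst")]

# Group the column-B values by their column-A key,
# mimicking what groupBy("Column A") + collect_list("Column B") produces
grouped = defaultdict(list)
for a, b in rows:
    grouped[a].append(b)

print(dict(grouped))
# {123: ['abc', 'def'], 456: ['klm'], 789: ['nop', 'qrst']}
```

If you want the aggregated column to keep the original name rather than `collect_list(Column B)`, you can rename it in pyspark with `F.collect_list("Column B").alias("Column B")`.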