英文:
Reduce pandas data frame to have one column with list of repeating values
问题
I understand your request. Here's the translated code portion:
# 定义一个自定义聚合函数以将链接组合为列表
def combine_links(links):
return list(set(links)) # 将链接转换为列表并去除重复项
# 按'name'分组GeoDataFrame,并聚合'path'列
result = df.groupby(['name'])['path'].agg(combine_links).reset_index()
Please note that this code is used to group your DataFrame by the 'name' column and aggregate the 'path' column by combining links into a list without duplicates, which should result in the desired output.
英文:
I have the following dataframe:
index name path
0 Dina "gs://my_bucket/folder1/img1.png"
1 Dina "gs://my_bucket/folder1/img2.png"
2 Lane "gs://my_bucket/folder1/img3.png"
3 Bari "gs://my_bucket/folder1/img4.png"
4 Andrew "gs://my_bucket/folder1/img5.png"
5 Andrew "gs://my_bucket/folder1/img6.png"
6 Andrew "gs://my_bucket/folder1/img7.png"
7 Beti "gs://my_bucket/folder1/img7.png"
8 Ladin "gs://my_bucket/folder1/img5.png"
...
I would like to get new dataframe which will have the unique names appears only once, and the path column will be list with the matching paths. The output should look like this:
index name path
0 Dina ["gs://my_bucket/folder1/img1.png","gs://my_bucket/folder1/img2.png"]
1 Lane ["gs://my_bucket/folder1/img3.png"]
2 Bari ["gs://my_bucket/folder1/img4.png"]
3 Andrew ["gs://my_bucket/folder1/img5.png","gs://my_bucket/folder1/img6.png","gs://my_bucket/folder1/img7.png"]
4 Beti ["gs://my_bucket/folder1/img7.png"]
5 Ladin ["gs://my_bucket/folder1/img5.png"]
...
The result should have number of rows equal to unique names in the dataframe.
At the moment I'm using something I did with chatgpt, but it used function that I don't understand why is it used and also it duplicates the names of the rows, so if I know I suppose to have 842 unique names, I get 992 ...
This is chatGPT solution:
# Define a custom aggregation function to combine links as a list
def combine_links(links):
return list(set(links)) # Convert links to a list and remove duplicates
# Group the GeoDataFrame by 'name' and 'dili' and aggregate the 'link' column
result = df.groupby(['name'))['path'].agg(combine_links).reset_index()
My goal is to find a solution the gives me in the end the right number of rows, which is number of unique names.
答案1
得分: 1
I think the answer is hidden here: https://stackoverflow.com/questions/21828398/what-is-the-difference-between-pandas-agg-and-apply-function
My code works:
import pandas as pd
def combine_links(links):
return list(set(links))
df = pd.DataFrame({'Animal': ['Falcon', 'Falcon','Parrot', 'Parrot', 'Falcon'],
'Max Speed': [380., 370., 24., 26., 370.]})
df_new=df.groupby(['Animal'])['Max Speed'].apply(combine_links).reset_index()
print(df_new)
Here's the translated code snippet.
英文:
I think the answer is hidden here: https://stackoverflow.com/questions/21828398/what-is-the-difference-between-pandas-agg-and-apply-function
My code works:
import pandas as pd
def combine_links(links):
return list(set(links))
df = pd.DataFrame({'Animal': ['Falcon', 'Falcon','Parrot', 'Parrot', 'Falcon'],
'Max Speed': [380., 370., 24., 26., 370.]})
df_new=df.groupby(['Animal'])['Max Speed'].apply(combine_links).reset_index()
print(df_new)
答案2
得分: 1
一个可能的解决方案:
df.groupby('name')['path'].agg(list).reset_index()
输出:
name path
0 Andrew [gs://my_bucket/folder1/img5.png, gs://my_buck...
1 Bari [gs://my_bucket/folder1/img4.png]
2 Beti [gs://my_bucket/folder1/img7.png]
3 Dina [gs://my_bucket/folder1/img1.png, gs://my_buck...
4 Ladin [gs://my_bucket/folder1/img5.png]
5 Lane [gs://my_bucket/folder1/img3.png]
英文:
A possible solution:
df.groupby('name')['path'].agg(list).reset_index()
Output:
name path
0 Andrew [gs://my_bucket/folder1/img5.png, gs://my_buck...
1 Bari [gs://my_bucket/folder1/img4.png]
2 Beti [gs://my_bucket/folder1/img7.png]
3 Dina [gs://my_bucket/folder1/img1.png, gs://my_buck...
4 Ladin [gs://my_bucket/folder1/img5.png]
5 Lane [gs://my_bucket/folder1/img3.png]
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论