Reduce a pandas data frame to have one column with a list of repeating values

Question

I have the following dataframe:

index   name     path
0       Dina     "gs://my_bucket/folder1/img1.png"
1       Dina     "gs://my_bucket/folder1/img2.png"
2       Lane     "gs://my_bucket/folder1/img3.png"
3       Bari     "gs://my_bucket/folder1/img4.png"
4       Andrew   "gs://my_bucket/folder1/img5.png"
5       Andrew   "gs://my_bucket/folder1/img6.png"
6       Andrew   "gs://my_bucket/folder1/img7.png"
7       Beti     "gs://my_bucket/folder1/img7.png"
8       Ladin    "gs://my_bucket/folder1/img5.png"
...

I would like to get a new dataframe in which each unique name appears only once and the path column holds a list of the matching paths. The output should look like this:

index   name     path
0       Dina     ["gs://my_bucket/folder1/img1.png","gs://my_bucket/folder1/img2.png"]
1       Lane     ["gs://my_bucket/folder1/img3.png"]
2       Bari     ["gs://my_bucket/folder1/img4.png"]
3       Andrew   ["gs://my_bucket/folder1/img5.png","gs://my_bucket/folder1/img6.png","gs://my_bucket/folder1/img7.png"]
4       Beti     ["gs://my_bucket/folder1/img7.png"]
5       Ladin    ["gs://my_bucket/folder1/img5.png"]
...

The result should have as many rows as there are unique names in the dataframe.
At the moment I'm using something I put together with ChatGPT, but it uses a function whose purpose I don't understand, and it also duplicates names across rows: I know I should have 842 unique names, but I get 992 rows ...

This is the ChatGPT solution:

# Define a custom aggregation function to combine links as a list
def combine_links(links):
    return list(set(links))  # Convert links to a list and remove duplicates

# Group the GeoDataFrame by 'name' and aggregate the 'path' column
result = df.groupby(['name'])['path'].agg(combine_links).reset_index()

My goal is to find a solution that gives me the right number of rows in the end, which is the number of unique names.

Answer 1

Score: 1

I think the answer lies in the difference between agg and apply, explained here: https://stackoverflow.com/questions/21828398/what-is-the-difference-between-pandas-agg-and-apply-function

This code works for me:

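import pandas as pd

def combine_links(links):
    # Collapse a group's values into a list without duplicates (order is not preserved)
    return list(set(links))

df = pd.DataFrame({'Animal': ['Falcon', 'Falcon', 'Parrot', 'Parrot', 'Falcon'],
                   'Max Speed': [380., 370., 24., 26., 370.]})

# Group by 'Animal' and apply the function to each group's 'Max Speed' values
df_new = df.groupby(['Animal'])['Max Speed'].apply(combine_links).reset_index()

print(df_new)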
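
A side note on list(set(...)): it removes duplicate values within each group, but a set does not keep the original order. If order matters, something like the following might work. This is only a minimal sketch on a small made-up sample that reuses the question's column names; combine_links_ordered is a hypothetical helper name, not part of pandas.

```python
import pandas as pd

def combine_links_ordered(links):
    # dict.fromkeys keeps the first occurrence of each value, in original order
    return list(dict.fromkeys(links))

# Made-up sample: Dina has one duplicated path, listed out of numeric order
df = pd.DataFrame({
    "name": ["Dina", "Dina", "Dina", "Lane"],
    "path": [
        "gs://my_bucket/folder1/img2.png",
        "gs://my_bucket/folder1/img1.png",
        "gs://my_bucket/folder1/img2.png",
        "gs://my_bucket/folder1/img3.png",
    ],
})

result = df.groupby("name")["path"].agg(combine_links_ordered).reset_index()
print(result)
```

Each name ends up on one row, and the duplicate img2.png entry for Dina is dropped while img2.png still comes before img1.png.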

Answer 2

Score: 1

A possible solution:

df.groupby('name')['path'].agg(list).reset_index()

Output:

     name                                               path
0  Andrew  [gs://my_bucket/folder1/img5.png, gs://my_buck...
1    Bari                  [gs://my_bucket/folder1/img4.png]
2    Beti                  [gs://my_bucket/folder1/img7.png]
3    Dina  [gs://my_bucket/folder1/img1.png, gs://my_buck...
4   Ladin                  [gs://my_bucket/folder1/img5.png]
5    Lane                  [gs://my_bucket/folder1/img3.png]
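
To try this end to end, here is a self-contained sketch built from the sample rows in the question. Passing sort=False is an optional addition that keeps names in order of first appearance, as in the desired output. Grouping by 'name' alone yields exactly one row per distinct value, so a larger row count than expected suggests the names themselves differ somehow. The second part illustrates one possible cause, stray whitespace, using a made-up frame called messy; this is a guess and is not confirmed by the question.

```python
import pandas as pd

# Sample rows from the question
df = pd.DataFrame({
    "name": ["Dina", "Dina", "Lane", "Bari", "Andrew",
             "Andrew", "Andrew", "Beti", "Ladin"],
    "path": [f"gs://my_bucket/folder1/img{i}.png"
             for i in (1, 2, 3, 4, 5, 6, 7, 7, 5)],
})

# sort=False keeps groups in order of first appearance instead of sorting names
result = df.groupby("name", sort=False)["path"].agg(list).reset_index()
print(result)
print(len(result), df["name"].nunique())

# Hypothetical data: the same name stored with a trailing space forms its own group
messy = pd.DataFrame({"name": ["Dina", "Dina "], "path": ["a.png", "b.png"]})
messy_result = messy.groupby("name")["path"].agg(list).reset_index()

# Stripping whitespace before grouping merges the two variants
cleaned = messy.assign(name=messy["name"].str.strip())
cleaned_result = cleaned.groupby("name")["path"].agg(list).reset_index()
print(len(messy_result), len(cleaned_result))
```

Comparing len(result) with df["name"].nunique() is a quick check that the grouping produced one row per name.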

huangapple
  • Posted on May 21, 2023, 16:37:50
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76298985.html