May 21, 2023, 16:37:50

Reduce pandas data frame to have one column with list of repeating values

Question

I have the following dataframe:

index   name     path
0       Dina     "gs://my_bucket/folder1/img1.png"
1       Dina     "gs://my_bucket/folder1/img2.png"
2       Lane     "gs://my_bucket/folder1/img3.png"
3       Bari     "gs://my_bucket/folder1/img4.png"
4       Andrew   "gs://my_bucket/folder1/img5.png"
5       Andrew   "gs://my_bucket/folder1/img6.png"
6       Andrew   "gs://my_bucket/folder1/img7.png"
7       Beti     "gs://my_bucket/folder1/img7.png"
8       Ladin    "gs://my_bucket/folder1/img5.png"
...

I would like to get a new dataframe in which each unique name appears only once and the path column holds a list of the matching paths. The output should look like this:

index   name     path
0       Dina     ["gs://my_bucket/folder1/img1.png","gs://my_bucket/folder1/img2.png"]
1       Lane     ["gs://my_bucket/folder1/img3.png"]
2       Bari     ["gs://my_bucket/folder1/img4.png"]
3       Andrew   ["gs://my_bucket/folder1/img5.png","gs://my_bucket/folder1/img6.png","gs://my_bucket/folder1/img7.png"]
4       Beti     ["gs://my_bucket/folder1/img7.png"]
5       Ladin    ["gs://my_bucket/folder1/img5.png"]
...

The result should have a number of rows equal to the number of unique names in the dataframe.
At the moment I'm using something I wrote with ChatGPT, but it uses a function whose purpose I don't understand, and it also duplicates names across rows: I know I am supposed to have 842 unique names, but I get 992 rows.

This is the ChatGPT solution:

# Define a custom aggregation function to combine links as a list
def combine_links(links):
    return list(set(links))  # Convert links to a list and remove duplicates
# Group the GeoDataFrame by 'name' and 'dili' and aggregate the 'link' column
result = df.groupby(['name'])['path'].agg(combine_links).reset_index()

My goal is to find a solution that gives me the right number of rows in the end, which is the number of unique names.

Answer 1

Score: 1

I think the answer is hidden here: https://stackoverflow.com/questions/21828398/what-is-the-difference-between-pandas-agg-and-apply-function

My code works:

import pandas as pd
def combine_links(links):
    return list(set(links))
df = pd.DataFrame({'Animal': ['Falcon', 'Falcon','Parrot', 'Parrot', 'Falcon'],
                   'Max Speed': [380., 370., 24., 26., 370.]})
df_new=df.groupby(['Animal'])['Max Speed'].apply(combine_links).reset_index()
print(df_new)
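
One detail worth noting about `list(set(links))`: a Python `set` does not preserve insertion order, so the paths in each list may come back in a different order than they appear in the dataframe. If order matters, a minimal sketch of an order-preserving variant could look like the following. The helper name `combine_links_ordered` and the sample rows are made up for illustration; only the `name` and `path` columns come from the question.

```python
import pandas as pd

def combine_links_ordered(links):
    # dict.fromkeys keeps the first occurrence of each value, in original order
    return list(dict.fromkeys(links))

# Hypothetical sample rows; note that Dina has a repeated path
df = pd.DataFrame({
    "name": ["Dina", "Dina", "Lane", "Dina"],
    "path": [
        "gs://my_bucket/folder1/img2.png",
        "gs://my_bucket/folder1/img1.png",
        "gs://my_bucket/folder1/img3.png",
        "gs://my_bucket/folder1/img2.png",
    ],
})

result = df.groupby("name")["path"].apply(combine_links_ordered).reset_index()
print(result)
```

Here the duplicate `img2.png` entry for Dina is dropped, while `img2.png` still comes before `img1.png` as in the input.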

Answer 2

Score: 1

A possible solution:

df.groupby('name')['path'].agg(list).reset_index()

Output:

         name                                               path
    0  Andrew  [gs://my_bucket/folder1/img5.png, gs://my_buck...
    1    Bari                  [gs://my_bucket/folder1/img4.png]
    2    Beti                  [gs://my_bucket/folder1/img7.png]
    3    Dina  [gs://my_bucket/folder1/img1.png, gs://my_buck...
    4   Ladin                  [gs://my_bucket/folder1/img5.png]
    5    Lane                  [gs://my_bucket/folder1/img3.png]
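
The output above is sorted alphabetically by name, because `groupby` sorts the group keys by default. The expected output in the question keeps the names in order of first appearance, which `sort=False` should give. Below is a self-contained sketch that rebuilds the sample rows from the question (the `base` variable is only a shorthand introduced here) and compares the row count against the number of unique names.

```python
import pandas as pd

base = "gs://my_bucket/folder1/"  # shorthand for the bucket prefix in the question
df = pd.DataFrame({
    "name": ["Dina", "Dina", "Lane", "Bari", "Andrew",
             "Andrew", "Andrew", "Beti", "Ladin"],
    "path": [base + f"img{i}.png" for i in (1, 2, 3, 4, 5, 6, 7, 7, 5)],
})

# sort=False keeps groups in order of first appearance instead of sorting the keys
result = df.groupby("name", sort=False)["path"].agg(list).reset_index()

# The row count should match the number of unique names
print(len(result), df["name"].nunique())
print(result)
```

If the two counts differ on the real data, it may be worth checking whether the actual code groups by more than one column (the comment in the question mentions 'dili') or whether some names differ only by whitespace or letter case. Both are guesses, since the full dataframe is not shown.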
