July 6, 2023, 15:17:41

English:

How to combine the values of Pandas dataframe rows by a shared attribute in another column? And produce a new column with this concatenated?

Question

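I understand what you need. You can use the following code to produce the output format you want:

import pandas as pd

df = pd.read_csv('Colleague Award 2023(1-296).csv')
# Drop the columns that are not needed
df2 = df.drop(['ID', 'Start time', 'Completion time', 'Email'], axis=1)
# Prefix each comment with the name of the person who wrote it
df2['Why'] = df2['Name'].astype(str) + ': ' + df2['Why'].astype(str)
# Group by nominee and join that person's comments into one string, one comment per line
df3 = df2.groupby('Who')['Why'].agg('\n'.join).reset_index()
# Save to a new CSV file
df3.to_csv('Colleague Award Collated.csv', index=False)

This code prefixes each comment with its author, groups the rows by nominee, and joins each nominee's comments into a single string, which gives the two-column 'Who' / 'Why' layout you asked for.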
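The group-and-join idea described above can be tried without the CSV file. The following is a minimal sketch on a small in-memory DataFrame; the rows are taken from the example table in the question, and the names `sample`, `labelled`, and `collated` are illustrative assumptions rather than anything from the original post.

```python
import pandas as pd

# Sample rows based on the example table in the question:
# 'Name' is the nominator, 'Who' is the nominee, 'Why' is the comment.
sample = pd.DataFrame({
    'Name': ['Jane Doe', 'Gary Public', 'Joe Bloggs', 'Gary Public', 'Jane Doe'],
    'Who': ['Joe Bloggs', 'Joe Bloggs', 'Jane Doe', 'Jane Doe', 'Jimmy Person'],
    'Why': [
        'Joe is super helpful',
        'Joe always makes good coffee',
        'Jane is a friendly face in the morning',
        'Jane makes my day better!',
        'Jimmy helps us out with our IT issues!',
    ],
})

# Prefix each comment with its author, then join the comments per nominee.
labelled = sample.assign(Why=sample['Name'] + ': ' + sample['Why'])
collated = labelled.groupby('Who')['Why'].agg('\n'.join).reset_index()

print(collated.to_string(index=False))
```

Each nominee ends up on one row, with all of that person's comments in a single multi-line cell.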

English:

I am compiling a 'colleague award' list at my place of work. Members of staff can nominate as many colleagues as they like. What I would like is to produce a 'mail merge' style output where each staff member gets a list of their comments and who left them.

I have a .csv file which consists of 'Name' (of the person nominating), 'Who' (the person being nominated) and 'Why' (the comment).

How can I use pandas to combine the rows of the DataFrame by their common value (the person the comment is about) and concatenate the comments, together with who wrote them, into a new column?

Code so far:

import numpy as np
import pandas as pd
df = pd.read_csv('Colleague Award 2023(1-296).csv')
df2 = df.drop(['ID', 'Start time', 'Completion time', 'Email'], axis=1)
df3 = df2.sort_values('Who', ascending=True)
df4 = df3.groupby(['Name'], as_index=False).agg({'Why': ' '.join})
df4.to_csv('Colleage Award Collated.csv')

My output is just the 'why' strings stuck together and the 'who' column has been omitted.

EDIT/Update:
What I am really after is to write to a csv where the columns are like this:

Who	Why
Joe Bloggs	Jane Doe: Joe is super helpful
	Gary Public: Joe always makes good coffee
Jane Doe	Joe Bloggs: Jane is a friendly face in the morning
	Gary Public: Jane makes my day better!
Jimmy Person	Jane Doe: Jimmy helps us out with our IT issues!

But with all of a person's comments in the same cell, not separated across rows.

This is so I can then easily do a mail merge and create many Word documents, one for each recipient, with all of their comments from various colleagues.

Thank you again to those who have already contributed!

Answer 1

Score: 1

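Assuming this input format (note that here `Name` holds the person the comment is about and `Who` the person who wrote it, the reverse of the column description in the question):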

          Name          Who                               Why
0   Joe Bloggs     Jane Doe              Joe is super helpful
1   Joe Bloggs  Gary Public  Joe always brings my good coffee
2  Gary Public     Jane Doe   Gary always helps me with my PC
3  Gary Public   Joe Bloggs   Thank you Gary for training me!

You can use a loop with [`groupby`](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.groupby.html):

sep = ''
for name, g in df.groupby('Name', sort=False):
    if sep:
        print(sep)
    sep = 'NEW ROW'
    print(f'{name} your comments:')
    for who, why in zip(g['Who'], g['Why']):
        print(f'{who}: {why}')

Output:

Joe Bloggs your comments:
Jane Doe: Joe is super helpful
Gary Public: Joe always brings my good coffee
NEW ROW
Gary Public your comments:
Jane Doe: Gary always helps me with my PC
Joe Bloggs: Thank you Gary for training me!

#### Output as DataFrame

To create a new DataFrame:

out = df.assign(
          Why=lambda d: d['Who']+': '+d['Why'],
          Name=lambda d: d['Name'].mask(d['Name'].duplicated(), '')
         )[['Name', 'Why']]

To modify the input DataFrame in place:

df['Why'] = df.pop('Who')+': '+df['Why']
df.loc[df['Name'].duplicated(), 'Name'] = ''

Output:

          Name                                            Why
0   Joe Bloggs                 Jane Doe: Joe is super helpful
1               Gary Public: Joe always brings my good coffee
2  Gary Public      Jane Doe: Gary always helps me with my PC
3                 Joe Bloggs: Thank you Gary for training me!
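The frames above still keep one row per comment. If all of a person's comments should share a single cell, as the question's update asks, one possible extension is to build the `Who: Why` strings first and then join them per `Name`. This is a sketch and not part of the original answer: it assumes the input layout shown at the top of this answer, and `collate_comments` and its `sep` parameter are hypothetical names.

```python
import pandas as pd

def collate_comments(df: pd.DataFrame, sep: str = '\n') -> pd.DataFrame:
    """Return one row per 'Name' with all 'Who: Why' strings joined by sep."""
    lines = df['Who'] + ': ' + df['Why']
    # sort=False keeps recipients in order of first appearance
    joined = lines.groupby(df['Name'], sort=False).agg(sep.join)
    return joined.rename('Why').reset_index()

# Same rows as the assumed input format in this answer
df = pd.DataFrame({
    'Name': ['Joe Bloggs', 'Joe Bloggs', 'Gary Public', 'Gary Public'],
    'Who': ['Jane Doe', 'Gary Public', 'Jane Doe', 'Joe Bloggs'],
    'Why': [
        'Joe is super helpful',
        'Joe always brings my good coffee',
        'Gary always helps me with my PC',
        'Thank you Gary for training me!',
    ],
})

out = collate_comments(df)
print(out.to_string(index=False))
```

Passing a different `sep`, for example `' | '`, keeps every cell on one physical line, which may be easier to handle in tools that struggle with embedded line breaks.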

Answer 2

Score: 0

To achieve the desired output, you can modify your code as follows:

import pandas as pd

df = pd.read_csv('Colleague Award 2023(1-296).csv')
df2 = df.drop(['ID', 'Start time', 'Completion time', 'Email'], axis=1)
# Build "nominator: comment" for each row before grouping, so that every
# comment stays paired with the person who wrote it
df2['Comments'] = df2['Name'] + ': ' + df2['Why']
# One row per nominee (groupby sorts by 'Who'), comments joined by newlines
df4 = df2.groupby('Who')['Comments'].agg('\n'.join).reset_index()
df4.to_csv('Colleague Award Collated.csv', index=False)

The output file (Colleague Award Collated.csv) generated by the updated code will have two columns: 'Who' and 'Comments'. Each row represents a person being nominated, and the corresponding comments will be concatenated in the 'Comments' column, including the names of the nominators.
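Because the comments are joined with newline characters, each `Comments` cell can span several lines in the CSV file. The sketch below uses an in-memory buffer instead of a file and two made-up rows in the same two-column layout to check that `to_csv` quotes such cells and that `read_csv` reads them back intact. Whether a particular mail-merge tool copes with multi-line cells is a separate matter that this example does not cover.

```python
import io
import pandas as pd

# Two illustrative rows in the 'Who' / 'Comments' layout described above
collated = pd.DataFrame({
    'Who': ['Jane Doe', 'Joe Bloggs'],
    'Comments': [
        'Joe Bloggs: Jane is a friendly face in the morning\n'
        'Gary Public: Jane makes my day better!',
        'Jane Doe: Joe is super helpful',
    ],
})

# Write to an in-memory buffer rather than to disk
buffer = io.StringIO()
collated.to_csv(buffer, index=False)
csv_text = buffer.getvalue()
print(csv_text)

# Read the text back and compare the multi-line cells with the originals
restored = pd.read_csv(io.StringIO(csv_text))
print(restored['Comments'].tolist() == collated['Comments'].tolist())
```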
