How to combine the values of Pandas dataframe rows by a shared attribute in another column? And produce a new column with this concatenated?
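Question
I understand what you need. You can use the following code to produce the output format you want:
import pandas as pd
df = pd.read_csv('Colleague Award 2023(1-296).csv')
# Drop the columns that are not needed
df2 = df.drop(['ID', 'Start time', 'Completion time', 'Email'], axis=1)
# Prefix each comment with the name of the person who wrote it
df2['Why'] = df2['Name'].astype(str) + ': ' + df2['Why'].astype(str)
# Group by nominee and join all of their comments into a single string
df3 = df2.groupby('Who', as_index=False)['Why'].agg('\n'.join)
# Save to a new CSV file
df3.to_csv('Colleague Award Collated.csv', index=False)
This code prefixes each comment with its author, then groups by nominee and joins that nominee's comments into one string, which gives the output format you need. Hope this helps!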
I am compiling a 'colleague award' list at my place of work. Members of staff can nominate as many colleagues as they like. What I would like is to produce a 'mail merge' style output where each staff member gets a list of their comments and who left them.
I have a .csv file which consists of 'Name' (of the person nominating), 'Who' (the person being nominated) and 'Why' (the comment).
How can I use pandas to combine the rows of the DataFrame by their common value (the person the comment is about) and concatenate each comment with the name of whoever left it into a new column?
Code so far:
import numpy as np
import pandas as pd
df = pd.read_csv('Colleague Award 2023(1-296).csv')
df2 = df.drop(['ID', 'Start time' , 'Completion time', 'Email'], axis = 1)
df3 = df2.sort_values('Who', ascending=True)
df4 = df3.groupby(['Name'], as_index=False).agg({'Why' : ' '.join})
df4.to_csv('Colleage Award Collated.csv')
My output is just the 'why' strings stuck together and the 'who' column has been omitted.
EDIT/Update:
What I am really after is to write to a csv where the columns are like this:
| Who | Why |
|---|---|
| Joe Bloggs | Jane Doe: Joe is super helpful |
| | Gary Public: Joe always makes good coffee |
| Jane Doe | Joe Bloggs: Jane is a friendly face in the morning |
| | Gary Public: Jane makes my day better! |
| Jimmy Person | Jane Doe: Jimmy helps us out with our IT issues! |
But where all comments for one person are in the same cell, not separated.
This is so I can then easily do a mail-merge and create many word documents, one for each recipient with all of their comments from various colleagues.
Thank you again to those who have already contributed!
Answer 1
Score: 1
Assuming this input format:
          Name          Who                               Why
0   Joe Bloggs     Jane Doe              Joe is super helpful
1   Joe Bloggs  Gary Public  Joe always brings my good coffee
2  Gary Public     Jane Doe   Gary always helps me with my PC
3  Gary Public   Joe Bloggs   Thank you Gary for training me!
You can use a loop with [`groupby`](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.groupby.html):
sep = ''
for name, g in df.groupby('Name', sort=False):
    # Print the separator before every group except the first
    if sep:
        print(sep)
    sep = 'NEW ROW'
    print(f'{name} your comments:')
    for who, why in zip(g['Who'], g['Why']):
        print(f'{who}: {why}')
Output:
Joe Bloggs your comments:
Jane Doe: Joe is super helpful
Gary Public: Joe always brings my good coffee
NEW ROW
Gary Public your comments:
Jane Doe: Gary always helps me with my PC
Joe Bloggs: Thank you Gary for training me!
#### Output as DataFrame
To create a new DataFrame:
out = df.assign(
    Why=lambda d: d['Who']+': '+d['Why'],
    Name=lambda d: d['Name'].mask(d['Name'].duplicated(), '')
)[['Name', 'Why']]
To modify the input DataFrame in place:
df['Why'] = df.pop('Who')+': '+df['Why']
df.loc[df['Name'].duplicated(), 'Name'] = ''
Output:
          Name                                            Why
0   Joe Bloggs                 Jane Doe: Joe is super helpful
1               Gary Public: Joe always brings my good coffee
2  Gary Public      Jane Doe: Gary always helps me with my PC
3                 Joe Bloggs: Thank you Gary for training me!
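The output above still has one row per comment. If every comment for a person should end up in a single cell, as the question's update asks, one possible approach is to build the "commenter: comment" strings first and then join them per group. This is only a sketch under the input format assumed in this answer (where `Name` is the recipient and `Who` the commenter); the variable name `collated` and the newline separator are illustrative choices, not part of the original answer.

```python
import pandas as pd

# Sample data copied from the input format assumed in this answer
df = pd.DataFrame({
    'Name': ['Joe Bloggs', 'Joe Bloggs', 'Gary Public', 'Gary Public'],
    'Who': ['Jane Doe', 'Gary Public', 'Jane Doe', 'Joe Bloggs'],
    'Why': ['Joe is super helpful',
            'Joe always brings my good coffee',
            'Gary always helps me with my PC',
            'Thank you Gary for training me!'],
})

# Build "commenter: comment" strings, then join them per recipient.
# sort=False keeps recipients in order of first appearance.
collated = (
    df.assign(Why=df['Who'] + ': ' + df['Why'])
      .groupby('Name', sort=False)['Why']
      .agg('\n'.join)
      .reset_index()
)

print(collated.to_string())
```

Each recipient now occupies exactly one row, with all comments in one `Why` cell separated by line breaks. A different separator (for example `' | '`) can be swapped in if the mail-merge tool does not cope well with multi-line cells.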
Answer 2
Score: 0
To achieve the desired output, you can modify your code as follows:
import pandas as pd
df = pd.read_csv('Colleague Award 2023(1-296).csv')
df2 = df.drop(['ID', 'Start time', 'Completion time', 'Email'], axis=1)
df3 = df2.sort_values('Who', ascending=True)
# Prefix each comment with its nominator before grouping, so every name stays paired with its own comment
df3['Comments'] = df3['Name'] + ': ' + df3['Why']
# One row per nominee, with all of their comments joined by line breaks
df4 = df3.groupby('Who', as_index=False).agg({'Comments': '\n'.join})
df4.to_csv('Colleague Award Collated.csv', index=False)
The output file (Colleague Award Collated.csv) generated by the updated code will have two columns: 'Who' and 'Comments'. Each row represents a person being nominated, and the corresponding comments will be concatenated in the 'Comments' column, including the names of the nominators.
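Because the joined comments contain line breaks, it may be worth checking that they survive a CSV round trip before feeding the file to a mail merge. The following is a minimal sketch that uses a made-up one-row frame and an in-memory buffer instead of the real file; the sample values come from the question's example table, and the names `collated`, `buffer` and `restored` are purely illustrative.

```python
import io

import pandas as pd

# Hypothetical collated result: one nominee with two comments in a single cell
collated = pd.DataFrame({
    'Who': ['Joe Bloggs'],
    'Comments': ['Jane Doe: Joe is super helpful\n'
                 'Gary Public: Joe always makes good coffee'],
})

# Write to an in-memory buffer rather than a file on disk
buffer = io.StringIO()
collated.to_csv(buffer, index=False)
csv_text = buffer.getvalue()

# Read the CSV text back and compare the multi-line cell with the original
restored = pd.read_csv(io.StringIO(csv_text))
print(restored.loc[0, 'Comments'] == collated.loc[0, 'Comments'])
```

With the default settings, `to_csv` wraps any field containing a line break in double quotes, and `read_csv` reads it back as one cell. Whether the mail-merge tool itself handles multi-line cells is a separate matter that should be checked against that tool.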