How to combine the values of Pandas dataframe rows by a shared attribute in another column? And produce a new column with this concatenated?


Question


I am compiling a 'colleague award' list at my place of work. Members of staff can nominate as many colleagues as they like. What I would like is to produce a 'mail merge' style output where each staff member gets a list of the comments about them and who left each one.

I have a .csv file with the columns 'Name' (the person nominating), 'Who' (the person being nominated) and 'Why' (the comment).

How can I use pandas to combine the rows of the DataFrame that share a common value in 'Who' (the person the comment is about), and concatenate each comment with the name of the person who wrote it into a new column?

Code so far:

import numpy as np
import pandas as pd

df = pd.read_csv('Colleague Award 2023(1-296).csv')

df2 = df.drop(['ID', 'Start time' , 'Completion time', 'Email'], axis = 1)
df3 = df2.sort_values('Who', ascending=True)

df4 = df3.groupby(['Name'], as_index=False).agg({'Why' : ' '.join})

df4.to_csv('Colleage Award Collated.csv')

My output is just the 'why' strings stuck together and the 'who' column has been omitted.

EDIT/Update:
What I am really after is to write to a csv where the columns are like this:

Who           Why
Joe Bloggs    Jane Doe: Joe is super helpful
              Gary Public: Joe always makes good coffee
Jane Doe      Joe Bloggs: Jane is a friendly face in the morning
              Gary Public: Jane makes my day better!
Jimmy Person  Jane Doe: Jimmy helps us out with our IT issues!

But where all comments for a person are in the same cell, not separated.

This is so I can then easily do a mail-merge and create many word documents, one for each recipient with all of their comments from various colleagues.

Thank you again to those who have already contributed!

Answer 1

Score: 1


Assuming this input format:

          Name          Who                               Why
0   Joe Bloggs     Jane Doe              Joe is super helpful
1   Joe Bloggs  Gary Public  Joe always brings my good coffee
2  Gary Public     Jane Doe   Gary always helps me with my PC
3  Gary Public   Joe Bloggs   Thank you Gary for training me!

You can use a loop with groupby:

sep = ''
for name, g in df.groupby('Name', sort=False):
    if sep:
        print(sep)
    sep = 'NEW ROW'
    print(f'{name} your comments:')
    for who, why in zip(g['Who'], g['Why']):
        print(f'{who}: {why}')

Output:

Joe Bloggs your comments:
Jane Doe: Joe is super helpful
Gary Public: Joe always brings my good coffee
NEW ROW
Gary Public your comments:
Jane Doe: Gary always helps me with my PC
Joe Bloggs: Thank you Gary for training me!

Output as DataFrame

To create a new DataFrame:

out = df.assign(
          Why=lambda d: d['Who']+': '+d['Why'],
          Name=lambda d: d['Name'].mask(d['Name'].duplicated(), '')
         )[['Name', 'Why']]

To modify the input DataFrame in place:

df['Why'] = df.pop('Who')+': '+df['Why']
df.loc[df['Name'].duplicated(), 'Name'] = ''

Output:

          Name                                            Why
0   Joe Bloggs                 Jane Doe: Joe is super helpful
1               Gary Public: Joe always brings my good coffee
2  Gary Public      Jane Doe: Gary always helps me with my PC
3                 Joe Bloggs: Thank you Gary for training me!
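The question's later edit asks for a slightly different layout: one row per nominee, with every comment about that person in a single cell. A minimal sketch of that variant follows, assuming the question's column meanings ('Name' is the nominator, 'Who' is the nominee) and using made-up sample rows; the `collated` name and the newline separator are illustrative choices, not part of the original answer.

```python
import pandas as pd

# Made-up sample rows mirroring the question's columns:
# 'Name' = nominator, 'Who' = nominee, 'Why' = comment
df = pd.DataFrame({
    'Name': ['Jane Doe', 'Gary Public', 'Joe Bloggs', 'Gary Public', 'Jane Doe'],
    'Who': ['Joe Bloggs', 'Joe Bloggs', 'Jane Doe', 'Jane Doe', 'Jimmy Person'],
    'Why': ['Joe is super helpful',
            'Joe always makes good coffee',
            'Jane is a friendly face in the morning',
            'Jane makes my day better!',
            'Jimmy helps us out with our IT issues!'],
})

# Build "nominator: comment" for each row first, then join the rows
# belonging to each nominee into one newline-separated string.
collated = (
    df.assign(Why=df['Name'] + ': ' + df['Why'])
      .groupby('Who', sort=False)['Why']
      .agg('\n'.join)
      .reset_index()
)

# Show each nominee followed by the contents of their single cell
for who, why in zip(collated['Who'], collated['Why']):
    print(who)
    print(why)
    print('---')
```

Prefixing the nominator before grouping keeps each name attached to its own comment; `sort=False` only preserves the order in which nominees first appear.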

Answer 2

Score: 0


To achieve the desired output, you can modify your code as follows:

import pandas as pd

df = pd.read_csv('Colleague Award 2023(1-296).csv')

# Drop the columns that are not needed for the mail merge
df2 = df.drop(['ID', 'Start time', 'Completion time', 'Email'], axis=1)

# Pair each comment with its nominator before grouping, so the two stay together
df2['Comments'] = df2['Name'].astype(str) + ': ' + df2['Why'].astype(str)

# One row per nominee (groupby sorts by 'Who'); comments joined one per line in a single cell
df4 = df2.groupby('Who', as_index=False).agg({'Comments': '\n'.join})

df4.to_csv('Colleague Award Collated.csv', index=False)

The output file (Colleague Award Collated.csv) generated by the updated code will have two columns: 'Who' and 'Comments'. Each row represents a person being nominated, and the corresponding comments will be concatenated in the 'Comments' column, including the names of the nominators.
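Because the comments are joined with newline characters, each 'Comments' cell spans several lines in the CSV file. The sketch below, which uses a hypothetical one-row frame and an in-memory buffer instead of a real file, suggests that pandas quotes such cells on writing and reads them back as a single value; whether the mail-merge tool handles multi-line cells the same way is an assumption worth checking separately.

```python
import io
import pandas as pd

# Hypothetical collated frame: one nominee, two comments in a single cell
collated = pd.DataFrame({
    'Who': ['Joe Bloggs'],
    'Comments': ['Jane Doe: Joe is super helpful\n'
                 'Gary Public: Joe always makes good coffee'],
})

# Write to an in-memory buffer rather than a file, for illustration only
buffer = io.StringIO()
collated.to_csv(buffer, index=False)
csv_text = buffer.getvalue()
print(csv_text)

# Reading the text back should give one row with the newline preserved
restored = pd.read_csv(io.StringIO(csv_text))
print(restored.shape)
```

If the downstream tool does not cope with line breaks inside a cell, a different separator such as ' | ' can be passed to the join instead.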

huangapple
  • Posted on July 6, 2023 at 15:17:41
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76626364.html