How to combine the values of Pandas dataframe rows by a shared attribute in another column? And produce a new column with this concatenated?

Question

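I understand what you need. You can use the following code to produce the output format you want:

```python
import pandas as pd

df = pd.read_csv('Colleague Award 2023(1-296).csv')
# Drop the columns that are not needed
df2 = df.drop(['ID', 'Start time', 'Completion time', 'Email'], axis=1)
# Prefix each comment with its commenter, then group by nominee and
# join all of that nominee's comments into a single string
df2['Why'] = df2['Name'].astype(str) + ' : ' + df2['Why'].astype(str)
df3 = df2.groupby('Who')['Why'].agg('\n'.join).reset_index()
# Save to a new CSV file (columns: Who, Why)
df3.to_csv('Colleague Award Collated.csv', index=False)
```

This code groups the rows by nominee and merges each comment with the name of the person who wrote it into one string, producing the output format you need. Hope this helps!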
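The group-and-join idea can be checked without the real survey file. The sketch below is a minimal, self-contained version of it; the sample rows are hypothetical (taken from the desired output in the question), and joining with a newline is an assumption, since any separator string would work.

```python
import pandas as pd

# Hypothetical sample rows in the question's layout:
# Name = person nominating, Who = person nominated, Why = comment
df = pd.DataFrame({
    'Name': ['Jane Doe', 'Gary Public', 'Joe Bloggs', 'Gary Public', 'Jane Doe'],
    'Who': ['Joe Bloggs', 'Joe Bloggs', 'Jane Doe', 'Jane Doe', 'Jimmy Person'],
    'Why': [
        'Joe is super helpful',
        'Joe always makes good coffee',
        'Jane is a friendly face in the morning',
        'Jane makes my day better!',
        'Jimmy helps us out with our IT issues!',
    ],
})

# Prefix each comment with the nominator, e.g. "Jane Doe: Joe is super helpful"
df['Why'] = df['Name'] + ': ' + df['Why']

# One row per nominee (in order of first appearance), all comments in one cell
collated = df.groupby('Who', sort=False)['Why'].agg('\n'.join).reset_index()
print(collated)
```

Writing `collated` with `to_csv(..., index=False)` then gives the two-column Who/Why file described in the question.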

English original:

I am compiling a 'colleague award' list at my place of work. Members of staff can nominate as many colleagues as they like. What I would like is to produce a 'mail merge' style output where each staff member gets a list of their comments and who left them.

I have a .csv file which consists of 'Name' (of the person nominating), 'Who' (the person being nominated) and 'Why' (the comment).

How can I use pandas to group the rows of the DataFrame by their common value (the person the comment is about) and concatenate the comments, together with who left each one, into a new column?

Code so far:

```python
import numpy as np
import pandas as pd
df = pd.read_csv('Colleague Award 2023(1-296).csv')
df2 = df.drop(['ID', 'Start time' , 'Completion time', 'Email'], axis = 1)
df3 = df2.sort_values('Who', ascending=True)
df4 = df3.groupby(['Name'], as_index=False).agg({'Why' : ' '.join})
df4.to_csv('Colleage Award Collated.csv')
```

My output is just the 'why' strings stuck together and the 'who' column has been omitted.
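A likely reason: when `agg` is given a dict, the result keeps only the grouping key plus the columns named in the dict, so 'Who' is dropped; and because the key is 'Name' (the nominator), comments about different people get joined together. A small sketch with made-up rows reproduces both effects using the same call as the code above:

```python
import pandas as pd

# Hypothetical rows in the question's layout (Name = nominator, Who = nominee)
df = pd.DataFrame({
    'Name': ['Jane Doe', 'Jane Doe', 'Gary Public'],
    'Who': ['Joe Bloggs', 'Jimmy Person', 'Joe Bloggs'],
    'Why': ['Joe is super helpful', 'Jimmy helps with IT', 'Joe makes good coffee'],
})

# Same call as in the question: grouped by the nominator, aggregating only 'Why'
out = df.groupby(['Name'], as_index=False).agg({'Why': ' '.join})
print(out.columns.tolist())  # only the key and the aggregated column remain
print(out)
```

Jane Doe's row ends up holding her comments about both Joe and Jimmy, with no indication of who each comment is about.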

EDIT/Update:
What I am really after is to write a CSV whose columns look like this:

```
Who           Why
Joe Bloggs    Jane Doe: Joe is super helpful
              Gary Public: Joe always makes good coffee
Jane Doe      Joe Bloggs: Jane is a friendly face in the morning
              Gary Public: Jane makes my day better!
Jimmy Person  Jane Doe: Jimmy helps us out with our IT issues!
```

But with all of a person's comments in the same cell, not separated into different rows.

This is so that I can then easily do a mail merge and create many Word documents, one for each recipient, containing all of their comments from various colleagues.

Thank you again to those who have already contributed!

Answer 1

Score: 1

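Assuming this input format (note that in this sample 'Name' is the person being praised and 'Who' is the person who wrote the comment, the reverse of the question's columns):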

```
          Name          Who                               Why
0   Joe Bloggs     Jane Doe              Joe is super helpful
1   Joe Bloggs  Gary Public  Joe always brings my good coffee
2  Gary Public     Jane Doe   Gary always helps me with my PC
3  Gary Public   Joe Bloggs   Thank you Gary for training me!
```

You can use a loop with groupby (https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.groupby.html):

```python
sep = ''
for name, g in df.groupby('Name', sort=False):
    if sep:
        print(sep)
    sep = 'NEW ROW'
    print(f'{name} your comments:')
    for who, why in zip(g['Who'], g['Why']):
        print(f'{who}: {why}')
```

Output:

```
Joe Bloggs your comments:
Jane Doe: Joe is super helpful
Gary Public: Joe always brings my good coffee
NEW ROW
Gary Public your comments:
Jane Doe: Gary always helps me with my PC
Joe Bloggs: Thank you Gary for training me!
```
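For the mail-merge use case, it may be handier to keep each person's text rather than print it. A minimal variation of the loop above, assuming the same sample layout ('Name' is the nominee, 'Who' the commenter); `messages` is a hypothetical name for the resulting dict:

```python
import pandas as pd

# Sample data in this answer's layout: Name = nominee, Who = commenter
df = pd.DataFrame({
    'Name': ['Joe Bloggs', 'Joe Bloggs', 'Gary Public', 'Gary Public'],
    'Who': ['Jane Doe', 'Gary Public', 'Jane Doe', 'Joe Bloggs'],
    'Why': [
        'Joe is super helpful',
        'Joe always brings my good coffee',
        'Gary always helps me with my PC',
        'Thank you Gary for training me!',
    ],
})

# Hypothetical dict: nominee -> full text of their comments
messages = {}
for name, g in df.groupby('Name', sort=False):
    lines = [f'{name} your comments:']
    lines += [f'{who}: {why}' for who, why in zip(g['Who'], g['Why'])]
    messages[name] = '\n'.join(lines)

print(messages['Gary Public'])
```

Each value in `messages` is one recipient's complete text, so it could be written to its own file or passed to whatever tool produces the documents.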

Output as a DataFrame

To create a new DataFrame:

```python
out = df.assign(
    Why=lambda d: d['Who'] + ': ' + d['Why'],
    Name=lambda d: d['Name'].mask(d['Name'].duplicated(), ''),
)[['Name', 'Why']]
```

To modify the input DataFrame in place:

```python
df['Why'] = df.pop('Who') + ': ' + df['Why']
df.loc[df['Name'].duplicated(), 'Name'] = ''
```

Output:

```
          Name                                            Why
0   Joe Bloggs                 Jane Doe: Joe is super helpful
1               Gary Public: Joe always brings my good coffee
2  Gary Public      Jane Doe: Gary always helps me with my PC
3                 Joe Bloggs: Thank you Gary for training me!
```
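The sample input above is already ordered by 'Name'. Because `duplicated()` flags every repeat of a value, not only adjacent ones, unsorted data could leave a blanked row under the wrong person. A sketch with hypothetical unsorted rows that sorts first (a stable sort, so each person's comments keep their original order) and then applies the same `assign`:

```python
import pandas as pd

# Hypothetical unsorted rows: Joe Bloggs's two comments are not adjacent
df = pd.DataFrame({
    'Name': ['Joe Bloggs', 'Gary Public', 'Joe Bloggs'],
    'Who': ['Jane Doe', 'Jane Doe', 'Gary Public'],
    'Why': ['Joe is super helpful', 'Gary always helps me with my PC',
            'Joe always brings my good coffee'],
})

# duplicated() marks every repeat, so group the rows together first
df = df.sort_values('Name', kind='stable', ignore_index=True)

out = df.assign(
    Why=lambda d: d['Who'] + ': ' + d['Why'],
    Name=lambda d: d['Name'].mask(d['Name'].duplicated(), ''),
)[['Name', 'Why']]
print(out)
```

Without the sort, the third row would be printed with a blank 'Name' directly below Gary Public's row, even though the comment is about Joe Bloggs.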

Answer 2

Score: 0

To achieve the desired output, you can modify your code as follows:

```python
import pandas as pd

df = pd.read_csv('Colleague Award 2023(1-296).csv')
df2 = df.drop(['ID', 'Start time', 'Completion time', 'Email'], axis=1)
df3 = df2.sort_values('Who', ascending=True)
# Pair each comment with its nominator *before* grouping, so every name
# stays next to the comment it wrote
df3['Comments'] = df3['Name'] + ': ' + df3['Why']
# One row per nominee, with all of their "Name: Why" lines in a single cell
df4 = df3.groupby('Who', as_index=False).agg({'Comments': '\n'.join})
df4.to_csv('Colleague Award Collated.csv', index=False)
```

The output file (Colleague Award Collated.csv) generated by the updated code will have two columns: 'Who' and 'Comments'. Each row represents a person being nominated, and the corresponding comments will be concatenated in the 'Comments' column, including the names of the nominators.
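Because each 'Comments' cell contains newline characters, it is worth checking that the CSV keeps each person's comments in one cell. pandas' CSV writer quotes fields that contain line breaks, and `read_csv` reads such a quoted field back as a single value. A small round-trip sketch using an in-memory buffer (`io.StringIO`) instead of a real file, with made-up data standing in for `df4`:

```python
import io

import pandas as pd

# Hypothetical collated result: one row per nominee, comments joined with '\n'
df4 = pd.DataFrame({
    'Who': ['Jane Doe', 'Joe Bloggs'],
    'Comments': [
        'Joe Bloggs: Jane is a friendly face in the morning\nGary Public: Jane makes my day better!',
        'Jane Doe: Joe is super helpful\nGary Public: Joe always makes good coffee',
    ],
})

# Write to an in-memory buffer; multi-line fields are wrapped in double quotes
buf = io.StringIO()
df4.to_csv(buf, index=False)
csv_text = buf.getvalue()
print(csv_text)

# Reading it back should give the same two rows with the newlines intact
back = pd.read_csv(io.StringIO(csv_text))
print(back)
```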

huangapple
  • This article was published on July 6, 2023 at 15:17:41
  • If you repost this article, please keep this link: https://go.coder-hub.com/76626364.html