Create new column based on group-by function

Question

You can try the following code to do what you need:

    import pandas as pd

    df1 = pd.DataFrame({'Name': ['Bob', 'Bob', 'Bob', 'Joe', 'Joe', 'Joe', 'Alan', 'Alan', 'Steve', 'Steve'],
                        'ID': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
                        'Value': ['Y', 'Y', 'Y', 'N', 'N', 'N', 'Y', 'N', 'N', 'Y']})

    # Define a function that computes the Result value for one group
    def compute_result(values):
        # Compare element-wise first; non-empty strings are always truthy
        return 'Y' if (values == 'Y').all() else 'N'

    # Use groupby and transform to apply the function to each Name group
    df1['Result'] = df1.groupby('Name')['Value'].transform(compute_result)

    # Print the result
    print(df1)

This code computes the Result column according to your rule and produces the desired output.

I have a dataframe:

    import pandas as pd

    df1 = pd.DataFrame({'Name': ['Bob', 'Bob', 'Bob', 'Joe', 'Joe', 'Joe', 'Alan', 'Alan', 'Steve', 'Steve'],
                        'ID': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
                        'Value': ['Y', 'Y', 'Y', 'N', 'N', 'N', 'Y', 'N', 'N', 'Y']})

    Name   ID  Value
    Bob     1  Y
    Bob     2  Y
    Bob     3  Y
    Joe     4  N
    Joe     5  N
    Joe     6  N
    Alan    7  Y
    Alan    8  N
    Steve   9  N
    Steve  10  Y

I need to compute a new Result column with the following rule: for each Name group (Bob, Joe, etc.), if every Value in the group is 'Y', assign 'Y' to every row of that group in the new column. Otherwise, assign 'N' to every row of that group.

So ideal output is:

    Name   ID  Value  Result
    Bob     1  Y      Y
    Bob     2  Y      Y
    Bob     3  Y      Y
    Joe     4  N      N
    Joe     5  N      N
    Joe     6  N      N
    Alan    7  Y      N
    Alan    8  N      N
    Steve   9  N      N
    Steve  10  Y      N

This is what I have so far, but it doesn't work correctly:

    df1['Result'] = df1.groupby('Name').Value.all().reindex(df1.Name).astype(str).values
    df1

Answer 1

Score: 2

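Use numpy.where together with GroupBy.transform('all'). GroupBy.transform returns a Series the same length as the original, so the result can be assigned directly as a new column:

    import numpy as np

    df1['Result'] = np.where(df1['Value'].eq('Y').groupby(df1['Name']).transform('all'), 'Y', 'N')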
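For readers who want to run this end to end, here is a minimal self-contained sketch of the same approach on the sample data from the question. The intermediate name `all_yes` is only illustrative and does not come from the answer. The comparison `.eq('Y')` is done before the reduction because non-empty strings such as 'N' are truthy in Python, so reducing the raw strings with `all` cannot tell 'Y' from 'N'.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({
    'Name': ['Bob', 'Bob', 'Bob', 'Joe', 'Joe', 'Joe', 'Alan', 'Alan', 'Steve', 'Steve'],
    'ID': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    'Value': ['Y', 'Y', 'Y', 'N', 'N', 'N', 'Y', 'N', 'N', 'Y'],
})

# Boolean Series aligned with df1: True on a row when every Value
# in that row's Name group equals 'Y'. (all_yes is an illustrative name.)
all_yes = df1['Value'].eq('Y').groupby(df1['Name']).transform('all')

# Translate the boolean flags into the 'Y'/'N' labels the question asks for.
df1['Result'] = np.where(all_yes, 'Y', 'N')

print(df1)
```

Only the Bob rows should end up with 'Y' in Result, since Bob is the only group with no 'N' value. The Value column itself is left unchanged.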

Alternative, which overwrites Value in place instead of adding a new column:

    mask = df1['Value'].eq('Y').groupby(df1['Name']).transform('all')
    df1.loc[~mask, 'Value'] = 'N'

Or find every Name that has at least one 'N' and use Series.isin to build a mask that sets Value to 'N' for all rows of those groups:

    mask = df1['Name'].isin(df1.loc[df1['Value'].eq('N'), 'Name'])
    df1.loc[mask, 'Value'] = 'N'

    print(df1)
        Name  ID Value
    0    Bob   1     Y
    1    Bob   2     Y
    2    Bob   3     Y
    3    Joe   4     N
    4    Joe   5     N
    5    Joe   6     N
    6   Alan   7     N
    7   Alan   8     N
    8  Steve   9     N
    9  Steve  10     N
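The two alternatives above overwrite Value. If the original Value column should stay intact, the same isin idea can be sketched so that it writes to a new Result column instead. This is an illustration rather than part of the answer: the name `names_with_n` is hypothetical, and the sketch assumes Value only ever holds 'Y' or 'N' (with other values or missing data, a group without any 'N' would still be labeled 'Y').

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({
    'Name': ['Bob', 'Bob', 'Bob', 'Joe', 'Joe', 'Joe', 'Alan', 'Alan', 'Steve', 'Steve'],
    'ID': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    'Value': ['Y', 'Y', 'Y', 'N', 'N', 'N', 'Y', 'N', 'N', 'Y'],
})

# Names that have at least one 'N' row (names_with_n is a hypothetical name).
names_with_n = df1.loc[df1['Value'].eq('N'), 'Name'].unique()

# Rows whose Name is in that set get 'N'; every other row gets 'Y'.
# Assumes Value contains only 'Y' or 'N'.
df1['Result'] = np.where(df1['Name'].isin(names_with_n), 'N', 'Y')

print(df1)
```

Compared with the transform-based version, this avoids a groupby entirely, at the cost of relying on the two-value assumption stated above.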

Answer 2

Score: 1

You were close! Here's how you could do it:

    df1["Result"] = df1.groupby("Name").Value.transform(lambda value: all(value == "Y"))
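Note that the lambda returns a Python boolean, so this line appears to fill Result with True/False rather than the 'Y'/'N' labels shown in the question. A minimal sketch of one way to convert the flags, assuming the question's sample data (the name `all_yes` and the mapping step are additions for illustration, not part of the answer):

```python
import pandas as pd

df1 = pd.DataFrame({
    'Name': ['Bob', 'Bob', 'Bob', 'Joe', 'Joe', 'Joe', 'Alan', 'Alan', 'Steve', 'Steve'],
    'ID': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    'Value': ['Y', 'Y', 'Y', 'N', 'N', 'N', 'Y', 'N', 'N', 'Y'],
})

# Per-group flag broadcast back to every row of the group:
# True when all values in the group equal 'Y'.
all_yes = df1.groupby('Name')['Value'].transform(lambda value: (value == 'Y').all())

# Normalize to bool, then map the flags onto the 'Y'/'N' labels.
df1['Result'] = all_yes.astype(bool).map({True: 'Y', False: 'N'})

print(df1)
```

The built-in `transform('all')` on a pre-compared boolean Series is likely faster than a Python lambda on large frames, but for small data either form reads fine.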

Posted by huangapple on January 6, 2020, 21:10:40
When reposting, please keep the link to this article: https://go.coder-hub.com/59612743.html