Create new column based on group-by function

Question

I have a dataframe:

import pandas as pd

df1 = pd.DataFrame({'Name': ['Bob', 'Bob', 'Bob', 'Joe', 'Joe', 'Joe', 'Alan', 'Alan', 'Steve', 'Steve'],
                    'ID': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
                    'Value': ['Y', 'Y', 'Y', 'N', 'N', 'N', 'Y', 'N', 'N', 'Y']})


Name    ID    Value   
 Bob     1       Y          
 Bob     2       Y          
 Bob     3       Y          
 Joe     4       N          
 Joe     5       N          
 Joe     6       N
 Alan    7       Y
 Alan    8       N
 Steve   9       N
 Steve   10      Y

I need to compute a new Result column with the following rule: for each Name group (Bob, Joe, etc.), if every Value in the group is 'Y', assign 'Y' to every row of that group in the new column. Otherwise, assign 'N' to every row of the group.

So the ideal output is:

 Name    ID    Value   Result
 Bob     1       Y       Y
 Bob     2       Y       Y  
 Bob     3       Y       Y  
 Joe     4       N       N  
 Joe     5       N       N  
 Joe     6       N       N
 Alan    7       Y       N
 Alan    8       N       N
 Steve   9       N       N
 Steve   10      Y       N

This is what I have so far, but it doesn't work correctly:

df1['Result'] = df1.groupby('Name').Value.all().reindex(df1.Name).astype(str).values
df1
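The attempt above aggregates the raw strings, and GroupBy.all on strings tests truthiness: every non-empty string, including 'N', is truthy, so every group comes out as True. A minimal sketch of the same idea, assuming the df1 from the question, compares against 'Y' before aggregating and then maps the per-group flag back onto the rows (the name all_y is just illustrative):

```python
import pandas as pd

df1 = pd.DataFrame({'Name': ['Bob', 'Bob', 'Bob', 'Joe', 'Joe', 'Joe', 'Alan', 'Alan', 'Steve', 'Steve'],
                    'ID': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
                    'Value': ['Y', 'Y', 'Y', 'N', 'N', 'N', 'Y', 'N', 'N', 'Y']})

# bool('N') is True, so aggregating the raw strings cannot tell Y from N.
# Compare with 'Y' first, then reduce to one boolean per Name.
all_y = df1['Value'].eq('Y').groupby(df1['Name']).all()

# Look up each row's group flag by Name and turn it into the 'Y'/'N' label.
df1['Result'] = df1['Name'].map(all_y).map({True: 'Y', False: 'N'})
print(df1)
```

Mapping by Name avoids the reindex/.values step, so the result stays aligned with df1's own index.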

Answer 1

Score: 2

Use numpy.where with GroupBy.transform('all'), which returns a Series the same size as the original DataFrame, holding the result of GroupBy.all for each row's group:

import numpy as np

df1['Result'] = np.where(df1['Value'].eq('Y').groupby(df1['Name']).transform('all'), 'Y', 'N')
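The one-liner above can be unpacked step by step. A minimal, self-contained sketch, assuming the df1 from the question (the intermediate names is_y and group_all_y are hypothetical):

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({'Name': ['Bob', 'Bob', 'Bob', 'Joe', 'Joe', 'Joe', 'Alan', 'Alan', 'Steve', 'Steve'],
                    'ID': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
                    'Value': ['Y', 'Y', 'Y', 'N', 'N', 'N', 'Y', 'N', 'N', 'Y']})

# One boolean per row: True where that row's Value is 'Y'.
is_y = df1['Value'].eq('Y')

# transform('all') keeps df1's length and index: every row receives
# True only if all rows in its Name group are 'Y'.
group_all_y = is_y.groupby(df1['Name']).transform('all')

# Turn the boolean mask into the 'Y'/'N' labels.
df1['Result'] = np.where(group_all_y, 'Y', 'N')
print(df1)
```

Using transform instead of a plain aggregation is what makes the result assignable directly as a new column, without reindexing.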

Alternatively, to overwrite the Value column in place instead of adding a new column:

mask = df1['Value'].eq('Y').groupby(df1['Name']).transform('all')
df1.loc[~mask, 'Value'] = 'N'

Or find all groups that contain at least one 'N' and set their values to 'N' using a mask built with Series.isin:

mask = df1['Name'].isin(df1.loc[df1['Value'].eq('N'), 'Name'])
df1.loc[mask, 'Value'] = 'N'

print(df1)
    Name  ID Value
0    Bob   1     Y
1    Bob   2     Y
2    Bob   3     Y
3    Joe   4     N
4    Joe   5     N
5    Joe   6     N
6   Alan   7     N
7   Alan   8     N
8  Steve   9     N
9  Steve  10     N
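Both alternatives above overwrite Value rather than adding a Result column. If the original Value column should be kept, a minimal sketch, assuming the same df1, writes the isin mask into a new Result column instead (names_with_n is a hypothetical helper name):

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({'Name': ['Bob', 'Bob', 'Bob', 'Joe', 'Joe', 'Joe', 'Alan', 'Alan', 'Steve', 'Steve'],
                    'ID': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
                    'Value': ['Y', 'Y', 'Y', 'N', 'N', 'N', 'Y', 'N', 'N', 'Y']})

# Names that have at least one 'N' somewhere in their group.
names_with_n = df1.loc[df1['Value'].eq('N'), 'Name']

# Rows belonging to such a Name get 'N'; all remaining rows get 'Y'.
df1['Result'] = np.where(df1['Name'].isin(names_with_n), 'N', 'Y')
print(df1)
```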

Answer 2

Score: 1

You were close! Here's how you could do it:

df1["Result"] = df1.groupby("Name").Value.transform(lambda value: all(value == "Y"))
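This transform fills Result with booleans (True/False) rather than the 'Y'/'N' labels shown in the expected output. A minimal follow-up sketch, assuming the df1 from the question, maps them afterwards (all_y is a hypothetical intermediate name):

```python
import pandas as pd

df1 = pd.DataFrame({"Name": ["Bob", "Bob", "Bob", "Joe", "Joe", "Joe", "Alan", "Alan", "Steve", "Steve"],
                    "ID": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
                    "Value": ["Y", "Y", "Y", "N", "N", "N", "Y", "N", "N", "Y"]})

# The lambda returns one bool per group; transform broadcasts it to every row.
all_y = df1.groupby("Name").Value.transform(lambda value: all(value == "Y"))

# Replace True/False with the labels used in the expected output.
df1["Result"] = all_y.map({True: "Y", False: "N"})
print(df1)
```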

huangapple
  • Posted on January 6, 2020 at 21:10:40
  • When reposting, please keep the link to this article: https://go.coder-hub.com/59612743.html