Create new column based on group-by function
Question
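You can try the following code to achieve what you need:
import pandas as pd

df1 = pd.DataFrame({'Name': ['Bob', 'Bob', 'Bob', 'Joe', 'Joe', 'Joe', 'Alan', 'Alan', 'Steve', 'Steve'],
                    'ID': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
                    'Value': ['Y', 'Y', 'Y', 'N', 'N', 'N', 'Y', 'N', 'N', 'Y']})

# Define a function that computes the Result label for one group of values
def compute_result(values):
    # Compare to 'Y' first, then reduce: True only if every value is 'Y'
    return 'Y' if (values == 'Y').all() else 'N'

# Use groupby and apply to get one label per Name
result_by_name = df1.groupby('Name')['Value'].apply(compute_result)
# Map each row's Name to its group's label
df1['Result'] = df1['Name'].map(result_by_name)

# Print the result
print(df1)
This code computes the Result column according to your rule and produces the desired output.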
I have a dataframe:
df1 = pd.DataFrame({'Name': ['Bob', 'Bob', 'Bob', 'Joe', 'Joe', 'Joe', 'Alan', 'Alan', 'Steve', 'Steve'],
                    'ID': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
                    'Value': ['Y', 'Y', 'Y', 'N', 'N', 'N', 'Y', 'N', 'N', 'Y']})
Name   ID  Value
Bob    1   Y
Bob    2   Y
Bob    3   Y
Joe    4   N
Joe    5   N
Joe    6   N
Alan   7   Y
Alan   8   N
Steve  9   N
Steve  10  Y
I need to compute a new Result column using the following rule: for each Name group (Bob, Joe, etc.), if every Value in the group is 'Y', assign 'Y' to every row of that group in the new column. Otherwise, assign 'N'.
So ideal output is:
Name   ID  Value  Result
Bob    1   Y      Y
Bob    2   Y      Y
Bob    3   Y      Y
Joe    4   N      N
Joe    5   N      N
Joe    6   N      N
Alan   7   Y      N
Alan   8   N      N
Steve  9   N      N
Steve  10  Y      N
This is what I have so far, but it doesn't work correctly:
df1['Result'] = df1.groupby('Name').Value.all().reindex(df1.Name).astype(str).values
df1
Answer 1
Score: 2
Use numpy.where (with numpy imported as np) together with GroupBy.transform, which returns a Series the same size as the original, and GroupBy.all:
df1['Result'] = np.where(df1['Value'].eq('Y').groupby(df1['Name']).transform('all'), 'Y', 'N')
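To make the one-liner easier to follow, here is a self-contained sketch that splits it into steps, using the DataFrame from the question. The intermediate names is_yes and all_yes are only illustrative and not part of the original answer.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({
    'Name': ['Bob', 'Bob', 'Bob', 'Joe', 'Joe', 'Joe', 'Alan', 'Alan', 'Steve', 'Steve'],
    'ID': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    'Value': ['Y', 'Y', 'Y', 'N', 'N', 'N', 'Y', 'N', 'N', 'Y'],
})

# Boolean Series: True where Value is 'Y'
is_yes = df1['Value'].eq('Y')

# For each Name group, check whether all flags are True and
# broadcast that single answer back to every row of the group
all_yes = is_yes.groupby(df1['Name']).transform('all')

# Turn the boolean mask into 'Y' / 'N' labels
df1['Result'] = np.where(all_yes, 'Y', 'N')
print(df1)
```

Only the Bob rows should end up with 'Y' here, since every other group contains at least one 'N'.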
Alternatively, if overwriting Value in place is acceptable instead of adding a new column:
mask = df1['Value'].eq('Y').groupby(df1['Name']).transform('all')
df1.loc[~mask, 'Value'] = 'N'
Or use Series.isin to find all groups that contain at least one N, and set every Value in those groups to N through the mask:
mask = df1['Name'].isin(df1.loc[df1['Value'].eq('N'), 'Name'])
df1.loc[mask, 'Value'] = 'N'
print(df1)
    Name  ID Value
0    Bob   1     Y
1    Bob   2     Y
2    Bob   3     Y
3    Joe   4     N
4    Joe   5     N
5    Joe   6     N
6   Alan   7     N
7   Alan   8     N
8  Steve   9     N
9  Steve  10     N
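If you would rather keep Value untouched, the same isin idea can probably be pointed at a separate Result column. This is a rough sketch, not part of the original answer; it assumes Value only ever holds 'Y' or 'N', and the names names_with_n and has_n are made up for illustration.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({
    'Name': ['Bob', 'Bob', 'Bob', 'Joe', 'Joe', 'Joe', 'Alan', 'Alan', 'Steve', 'Steve'],
    'ID': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    'Value': ['Y', 'Y', 'Y', 'N', 'N', 'N', 'Y', 'N', 'N', 'Y'],
})

# Names that have at least one 'N' row
names_with_n = df1.loc[df1['Value'].eq('N'), 'Name']

# True for every row whose group contains an 'N'
has_n = df1['Name'].isin(names_with_n)

# Write the labels to a new column; Value itself is left unchanged
# (assumption: any value other than 'N' counts as 'Y')
df1['Result'] = np.where(has_n, 'N', 'Y')
print(df1)
```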
Answer 2
Score: 1
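You were close! Here's how you could do it. The lambda returns 'Y' or 'N' directly, so Result holds the labels rather than booleans:
df1["Result"] = df1.groupby("Name").Value.transform(lambda value: "Y" if (value == "Y").all() else "N")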
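For completeness, here is a sketch that stays close to the attempt in the question. The likely problem there is that all() on the raw strings only tests truthiness, and both 'Y' and 'N' are non-empty strings, so comparing to 'Y' before grouping should help. The name group_all_yes is just for illustration.

```python
import pandas as pd

df1 = pd.DataFrame({
    'Name': ['Bob', 'Bob', 'Bob', 'Joe', 'Joe', 'Joe', 'Alan', 'Alan', 'Steve', 'Steve'],
    'ID': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    'Value': ['Y', 'Y', 'Y', 'N', 'N', 'N', 'Y', 'N', 'N', 'Y'],
})

# One boolean per Name: True only if every Value in that group equals 'Y'
group_all_yes = df1['Value'].eq('Y').groupby(df1['Name']).all()

# Look up each row's group flag by Name, then convert the flag to a label
df1['Result'] = df1['Name'].map(group_all_yes).map({True: 'Y', False: 'N'})
print(df1)
```

Mapping by Name avoids the reindex and astype(str) steps from the question, which would produce the strings 'True' and 'False' rather than 'Y' and 'N'.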