Pandas groupby, find match and write back to column
Question
In the DataFrame data, I want to group by 'Name', find the row where 'Price1' and 'Price2' are equal, and write that row's 'answer' value to a new column for every row of the same 'Name' group. For example:
import pandas as pd

d = {
    'Name': ['Cat', 'Cat', 'Dog', 'Dog'],
    'Price1': [2, 1, 10, 3],
    'Price2': [5, 1, 7, 3],
    'answer': ['A', 'B', 'C', 'D'],
}
data = pd.DataFrame(data=d)
  Name  Price1  Price2 answer
0  Cat       2       5      A
1  Cat       1       1      B   <--- match, get 'B'
2  Dog      10       7      C
3  Dog       3       3      D   <--- match, get 'D'
Something like this (pseudocode; the condition is the part I can't work out):
data['result'] = data.groupby('Name')['answer'] where data['Price1'] == data['Price2']
I expect the 2nd row (1 == 1) and the 4th row (3 == 3) to match within their groups, so that 'B' and 'D' are looked up from the 'answer' column and the result is:
data['result']
0 'B'
1 'B'
2 'D'
3 'D'
I've tried something like this:
data.groupby('Name')['Price1'].transform(lambda x: data['answer'][x == data['Price2']])
which gives the error
> ValueError: Can only compare identically-labeled Series objects
I also tried this, without using x at all:
data.groupby('Name')['Price1'].transform(lambda x: data['answer'][data['Price1'] == data['Price2']])
The result is filled in only at the matched indices:
data['result']
0 NaN
1 'B'
2 NaN
3 'D'
I think I am close but missing the key concept.
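For context on the ValueError quoted above, here is a minimal sketch of what seems to be going on, assuming the question's sample data. Inside transform, the lambda receives one group at a time, so x carries only that group's index labels, while data['Price2'] carries all of them. The 'Cat' group is emulated by hand here rather than through groupby, which is an illustrative simplification.

```python
import pandas as pd

# Sample data from the question.
data = pd.DataFrame({
    'Name': ['Cat', 'Cat', 'Dog', 'Dog'],
    'Price1': [2, 1, 10, 3],
    'Price2': [5, 1, 7, 3],
    'answer': ['A', 'B', 'C', 'D'],
})

# Emulate what transform passes to the lambda for the 'Cat' group:
# a slice of 'Price1' with index labels [0, 1] only.
x = data.loc[data['Name'] == 'Cat', 'Price1']

# Comparing it with the full column (labels [0, 1, 2, 3]) fails,
# because == requires both Series to carry identical labels.
try:
    x == data['Price2']
    error_message = None
except ValueError as exc:
    error_message = str(exc)
print(error_message)

# Restricting the other column to the group's own labels makes them match.
mask = x == data.loc[x.index, 'Price2']
print(mask.tolist())  # [False, True]
```

The mask computed this way is still per-row, so on its own it only marks the matching row; spreading that row's 'answer' to the rest of the group needs a separate broadcast step.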
Answer 1
Score: 4
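IIUC (df here is the question's data):
# Copy 'answer' into 'result' only on rows where the prices match; other rows stay NaN.
df.loc[df['Price1'] == df['Price2'], 'result'] = df['answer']
# Broadcast each group's first non-null 'result' to every row of that group.
df['result'] = df.groupby('Name')['result'].transform('first')
print(df)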
Output:
Name Price1 Price2 answer result
0 Cat 2 5 A B
1 Cat 1 1 B B
2 Dog 10 7 C D
3 Dog 3 3 D D
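The same two steps can probably be written without assigning a partial column first. The sketch below is a variant, not the answerer's code: it masks 'answer' with Series.where and then relies on the groupby 'first' aggregation skipping missing values by default. The variable name matched is just illustrative.

```python
import pandas as pd

# Sample data from the question.
data = pd.DataFrame({
    'Name': ['Cat', 'Cat', 'Dog', 'Dog'],
    'Price1': [2, 1, 10, 3],
    'Price2': [5, 1, 7, 3],
    'answer': ['A', 'B', 'C', 'D'],
})

# Keep 'answer' only where the prices match; all other rows become NaN.
matched = data['answer'].where(data['Price1'] == data['Price2'])

# Group the masked values by 'Name' and broadcast the first non-null
# value of each group back to every row of that group.
data['result'] = matched.groupby(data['Name']).transform('first')

print(data)
```

If a group contains several matching rows, 'first' keeps the earliest one in row order; a group with no match ends up with a missing value in every row.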
Answer 2
Score: 0
You can also do the query and selection inside groupby.apply:
out = (df.groupby('Name', as_index=False, group_keys=False)
.apply(lambda df_: df_.assign(result=df_.query('Price1 == Price2').eval('answer').item())))
print(out)
Name Price1 Price2 answer result
0 Cat 2 5 A B
1 Cat 1 1 B B
2 Dog 10 7 C D
3 Dog 3 3 D D
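One caveat with the approach above: .item() raises a ValueError unless the group has exactly one matching row. As a hedged alternative, the sketch below builds a Name-to-answer lookup from the matching rows and maps it back onto the frame. The extra 'Bird' row is hypothetical, added only to show a group with no match, and the names matches and lookup are illustrative.

```python
import pandas as pd

# The question's data plus a hypothetical 'Bird' row that has no match.
df = pd.DataFrame({
    'Name': ['Cat', 'Cat', 'Dog', 'Dog', 'Bird'],
    'Price1': [2, 1, 10, 3, 4],
    'Price2': [5, 1, 7, 3, 9],
    'answer': ['A', 'B', 'C', 'D', 'E'],
})

# Rows where the two prices agree.
matches = df[df['Price1'] == df['Price2']]

# One answer per name: keep the first match if a group has several.
lookup = matches.drop_duplicates('Name').set_index('Name')['answer']

# Map each row's name to its group's answer; names without a match give NaN.
df['result'] = df['Name'].map(lookup)

print(df)
```

Whether a group without a match should yield a missing value, a default, or an error depends on the use case; the missing-value behavior here is just what Series.map does for keys that are absent from the lookup.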