2023年2月18日 11:42:00go评论92阅读模式

英文:

Pandas groupby, find match and write back to column

问题

在数据框 data 中，我想按 'Name' 分组，找到 'Price1' 和 'Price2' 相等的地方，然后根据分组 'Name' 将值写入 'answer' 列。例如：

d = {
    'Name': ['Cat', 'Cat', 'Dog', 'Dog'],
    'Price1': [2, 1, 10, 3],
    'Price2':[5,1,7,3],
    'answer':['A','B','C','D']
}
data = pd.DataFrame(data=d) 
    Name Price1 Price2 Answer
0   Cat   2     5      A
1.  Cat   1     1      B        <--- 匹配，获取 'B'
2.  Dog   10    7      C
3.  Dog   3     3      D        <---- 匹配，获取 'D'

类似于这样的代码：

data['result'] = data.groupby('Name')['answer'].transform(lambda x: x[data['Price1'] == data['Price2']])

预期结果是第2行 (1=1) 和第4行 (3=3) 各自匹配并查找 'answer' 列 'B' 和 'D'，因此结果是：

data['result'] 
0   'B'
1   'B'
2   'D'
3   'D'

你已经很接近正确的解决方案，只需要使用正确的 lambda 函数来获取匹配的 'answer' 值。

英文:

In the dataframe data, I want to groupby 'Name', find where "Price1" and "Price2" are equal and then write the values in 'answer' to a new column with respect to groupby 'Name'. ex:

d = {
    &#39;Name&#39;: [&#39;Cat&#39;, &#39;Cat&#39;, &#39;Dog&#39;, &#39;Dog&#39;],
    &#39;Price1&#39;: [2, 1, 10, 3],
    &#39;Price2&#39;:[5,1,7,3],
    &#39;answer&#39;:[&#39;A&#39;,&#39;B&#39;,&#39;C&#39;,&#39;D&#39;]
}
data = pd.DataFrame(data=d) 
    Name Price1 Price2 Answer
0   Cat   2     5      A
1.  Cat   1     1      B        &lt;--- match, get &#39;B&#39;
2.  Dog   10    7      C
3.  Dog   3     3      D        &lt;---- match, get &#39;D&#39;

something like this

data[&#39;result&#39;] = data.groupby(&#39;itemName&#39;)[&#39;answer&#39;] where [data[&#39;Price1&#39;]=data[&#39;Price2&#39;] #&lt;---- this is the part I need equation.

and expect 2nd (1=1) and 4th (3&3) rows each match and lookup 'answer' column 'B' and 'D', so result is:

data[&#39;result&#39;] 
0   &#39;B&#39;
1   &#39;B&#39;
2   &#39;D&#39;
3   &#39;D&#39;

I've tried something like this

data.groupby(&#39;itemName&#39;)[&#39;Price1&#39;].transform(x:data[&#39;answer&#39;][x==data[&#39;Price2&#39;]],

which gives error
>ValueError: Can only compare identically-labeled Series objects

and tried this not even using x.

data.groupby(&#39;itemName&#39;)[&#39;Price1&#39;].transform(x:data[&#39;answer&#39;][data[&#39;Price1&#39;]==data[&#39;Price2&#39;]],

result only applies to the matched indices:

data[&#39;result&#39;] 
0   NaN
1   &#39;B&#39;
2   NaN
3   &#39;D&#39;

I think I am close but missing the key concept.

答案1

得分: 4

IIUC,

df.loc[df['Price1'] == df['Price2'], 'result'] = df['answer']
df['result'] = df.groupby('Name')['result'].transform('first')
print(df)

Output:

    Name  Price1  Price2 answer result
0   Cat       2       5      A      B
1   Cat       1       1      B      B
2   Dog      10       7      C      D
3   Dog       3       3      D      D

英文:

IIUC,

df.loc[df[&#39;Price1&#39;] == df[&#39;Price2&#39;], &#39;result&#39;] = df[&#39;answer&#39;]
df[&#39;result&#39;] = df.groupby(&#39;Name&#39;)[&#39;result&#39;].transform(&#39;first&#39;)
print(df)

Output:

  Name  Price1  Price2 answer result
0  Cat       2       5      A      B
1  Cat       1       1      B      B
2  Dog      10       7      C      D
3  Dog       3       3      D      D

答案2

得分: 0

你也可以在 groupby.apply 中执行查询和选择操作。

out = (df.groupby('Name', as_index=False, group_keys=False)
       .apply(lambda df_: df_.assign(result=df_.query('Price1 == Price2').eval('answer').item())))

print(out)
  Name  Price1  Price2 answer result
0  Cat       2       5      A      B
1  Cat       1       1      B      B
2  Dog      10       7      C      D
3  Dog       3       3      D      D

英文:

You can also do the query and select operation in groupby.apply

out = (df.groupby(&#39;Name&#39;, as_index=False, group_keys=False)
       .apply(lambda df_: df_.assign(result=df_.query(&#39;Price1 == Price2&#39;).eval(&#39;answer&#39;).item())))

print(out)
  Name  Price1  Price2 answer result
0  Cat       2       5      A      B
1  Cat       1       1      B      B
2  Dog      10       7      C      D
3  Dog       3       3      D      D

通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库，让每个人都能够通过互相帮助和分享经验来进步。

Pandas groupby，找到匹配并写回列。

问题

答案1

答案2

pip 无法被 VS Code 找到。

忘记本地 wandb 邮箱

如何使用Python将多个不连续的单元格添加到Excel中的名称管理器？

使用数据矩阵作为基准来获取旋转角度。

如何在Playwright视觉比较中屏蔽多个定位器？

在C++中，可以使用可变模板参数来检索类型的内部类型。

selenium.common.exceptions.StaleElementReferenceException: Message: stale element reference: stale element not found

Creating and opening a URL to log in to Website via Basic Auth with Robot Framework/Selenium (Python)

AG Grid 在上下文菜单中以大文本形式打开

What's the correct way to type hint an empty list as a literal in python?

如何在Highcharts Gantt中更改本地化的星期名称

如何在同一个流中使用多个过滤器和映射函数？

如何使用Map/Set来将代码优化到O(n)？

.NET MAUI Android在GitHub Actions上构建失败，错误代码为1。