Pandas groupby, find match and write back to column

Question

In the DataFrame data, I want to group by 'Name', find the row where 'Price1' and 'Price2' are equal, and then write that row's 'answer' value to a new column for every row in the same 'Name' group. For example:

    import pandas as pd

    d = {
        'Name': ['Cat', 'Cat', 'Dog', 'Dog'],
        'Price1': [2, 1, 10, 3],
        'Price2': [5, 1, 7, 3],
        'answer': ['A', 'B', 'C', 'D'],
    }
    data = pd.DataFrame(data=d)

      Name  Price1  Price2 answer
    0  Cat       2       5      A
    1  Cat       1       1      B   <--- match, get 'B'
    2  Dog      10       7      C
    3  Dog       3       3      D   <--- match, get 'D'

Something like this (pseudocode):

    data['result'] = data.groupby('Name')['answer'] where data['Price1'] == data['Price2']  # <--- this is the part I need an expression for

I expect the 2nd row (1 == 1) and the 4th row (3 == 3) to match, so the lookup in the 'answer' column gives 'B' for the 'Cat' group and 'D' for the 'Dog' group. The result should be:

    data['result']
    0    'B'
    1    'B'
    2    'D'
    3    'D'

I've tried something like this:

    data.groupby('Name')['Price1'].transform(lambda x: data['answer'][x == data['Price2']])

which gives the error:

> ValueError: Can only compare identically-labeled Series objects

I also tried this, which does not even use x:

    data.groupby('Name')['Price1'].transform(lambda x: data['answer'][data['Price1'] == data['Price2']])

The result is filled in only at the matched indices:

    data['result']
    0    NaN
    1    'B'
    2    NaN
    3    'D'

I think I am close but missing the key concept.

Answer 1

Score: 4

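If I understand correctly:

    # df is the question's DataFrame (called data there)
    # Copy 'answer' into 'result' only on rows where the prices match; other rows stay NaN
    df.loc[df['Price1'] == df['Price2'], 'result'] = df['answer']
    # Broadcast the first non-null 'result' in each 'Name' group to the whole group
    df['result'] = df.groupby('Name')['result'].transform('first')
    print(df)

Output:

      Name  Price1  Price2 answer result
    0  Cat       2       5      A      B
    1  Cat       1       1      B      B
    2  Dog      10       7      C      D
    3  Dog       3       3      D      D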
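
The same two steps can also be chained into a single expression. The following is a minimal sketch rather than part of the original answer; it assumes the question's frame (named data here) and that each 'Name' group has at most one row where the prices match. Series.where blanks out the non-matching rows, and transform('first') then broadcasts the first non-null value within each group.

```python
import pandas as pd

# The question's sample data
data = pd.DataFrame({
    'Name': ['Cat', 'Cat', 'Dog', 'Dog'],
    'Price1': [2, 1, 10, 3],
    'Price2': [5, 1, 7, 3],
    'answer': ['A', 'B', 'C', 'D'],
})

# Rows where the two prices are equal
mask = data['Price1'] == data['Price2']

# Keep 'answer' only on matching rows, then spread the first
# non-null value of each 'Name' group across the whole group
data['result'] = (
    data['answer']
    .where(mask)
    .groupby(data['Name'])
    .transform('first')
)

print(data)
```

A group with no matching row would end up with a missing value in 'result' instead of raising an error.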

Answer 2

Score: 0

You can also do the query and selection inside groupby.apply:

    out = (df.groupby('Name', as_index=False, group_keys=False)
             # Within each group, select the matching row's 'answer' and assign it to every row;
             # .item() requires exactly one matching row per group
             .apply(lambda df_: df_.assign(result=df_.query('Price1 == Price2').eval('answer').item())))
    print(out)

Output:

      Name  Price1  Price2 answer result
    0  Cat       2       5      A      B
    1  Cat       1       1      B      B
    2  Dog      10       7      C      D
    3  Dog       3       3      D      D
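
Because .item() raises a ValueError unless a group has exactly one matching row, the apply-based version needs every group to contain a single match. A possible alternative, sketched here under the assumption that groups may have zero or several matches, is to build a per-name lookup and map it back. The extra 'Bird' row and the name lookup are hypothetical additions for illustration, not part of the original answers.

```python
import pandas as pd

# The question's sample data plus a hypothetical 'Bird' group with no match
df = pd.DataFrame({
    'Name': ['Cat', 'Cat', 'Dog', 'Dog', 'Bird'],
    'Price1': [2, 1, 10, 3, 4],
    'Price2': [5, 1, 7, 3, 9],
    'answer': ['A', 'B', 'C', 'D', 'E'],
})

# Rows where the two prices are equal
mask = df['Price1'] == df['Price2']

# One matching 'answer' per name: keep the first match if a group has several
lookup = (
    df.loc[mask]
    .drop_duplicates('Name')
    .set_index('Name')['answer']
)

# Map each row's name to its group's answer; names without a match become NaN
df['result'] = df['Name'].map(lookup)

print(df)
```

Here the 'Cat' and 'Dog' rows receive 'B' and 'D', while the 'Bird' row is left as NaN because its group has no row with equal prices.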

huangapple
  • Posted on February 18, 2023 at 11:42:00
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75491025.html