Create a new column in Pandas based on count of other columns and a fixed specific value

Question

This is a continuation of another related question of mine: https://stackoverflow.com/questions/75439107/create-a-new-column-based-on-count-of-other-columns

I have a dataframe that looks like this:

    col_1 col_2 col_3
    6     A     1
    2     A     1
    5     B     1
    3     C     1
    5     C     2
    3     B     2
    6     A     1
    6     A     0
    2     B     3
    2     C     3
    5     A     3
    5     B     1

I want to add a new column col_new that, for each row, counts the other rows (excluding the row itself) that have the same values in col_1 and col_2 and whose col_3 value is 1. The count should be computed regardless of whether the row's own col_3 value is 1. The desired output would look like:

    col_1 col_2 col_3 col_new
    6     A     1     1
    2     A     1     0
    5     B     1     1
    3     C     1     0
    5     C     2     0
    3     B     2     0
    6     A     1     1
    6     A     0     1   (even though the col_3 value is 0)
    2     B     3     0
    2     C     3     0
    5     A     3     0
    5     B     1     1

What I have tried:

    df['col_new'] = df[df['col_3'] == 1].groupby(['col_1', 'col_2'])['col_2'].transform('count').sub(1)

This gives the correct result for rows whose col_3 value is 1, but NaN for rows with any other col_3 value, such as 0 (like row 8).

Thank you so much in advance.
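For readers reproducing the problem, here is a minimal sketch of why the attempt above leaves NaN values. It assumes the frame is built with a default 0-based index (the question does not show how `df` was created), so "row 8" in the question corresponds to index label 7 here.

```python
import pandas as pd

# Example data from the question; the default RangeIndex is an assumption.
df = pd.DataFrame({
    "col_1": [6, 2, 5, 3, 5, 3, 6, 6, 2, 2, 5, 5],
    "col_2": ["A", "A", "B", "C", "C", "B", "A", "A", "B", "C", "A", "B"],
    "col_3": [1, 1, 1, 1, 2, 2, 1, 0, 3, 3, 3, 1],
})

# The attempt from the question: filter first, then group and count.
attempt = (df[df["col_3"] == 1]
           .groupby(["col_1", "col_2"])["col_2"]
           .transform("count")
           .sub(1))

# `attempt` only has index labels for the rows that survived the filter.
# Assigning it back aligns on the index, so every other row becomes NaN.
df["col_new"] = attempt
print(df)
```

Because the filter drops rows before grouping, those rows never receive a value. Any fix therefore has to keep all rows in the groupby and move the `col_3 == 1` condition into the quantity being aggregated.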

Answer 1

Score: 1

I believe you want:

    df['col_new'] = (df.groupby(['col_1', 'col_2'])['col_3']
                       .transform('sum').sub(df['col_3'])
                    )

Or, to only consider 1s (not 2s):

    s = df['col_3'].eq(1)
    df['col_new'] = (df.assign(col_3=s)
                       .groupby(['col_1', 'col_2'])['col_3']
                       .transform('sum').sub(s)
                    )

Output:

        col_1 col_2  col_3  col_new
    0       6     A      1        1
    1       2     A      1        0
    2       5     B      1        1
    3       3     C      1        0
    4       5     C      2        0
    5       3     B      2        0
    6       6     A      1        1
    7       6     A      0        2   # rows 0 and 6 both match
    8       2     B      3        0
    9       2     C      3        0
    10      5     A      3        0
    11      5     B      1        1
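As a rough check, the two variants above can be run side by side. The sketch below wraps them in two helper functions whose names (`count_by_sum`, `count_by_mask`) are mine, not from the answer. The `.astype(int)` cast is also an addition, made so the arithmetic is explicitly on integers rather than booleans. The small `toy` frame is invented purely to show where the two variants diverge.

```python
import pandas as pd

KEYS = ["col_1", "col_2"]

def count_by_sum(frame):
    """First variant: group total of col_3 minus the row's own col_3."""
    return frame.groupby(KEYS)["col_3"].transform("sum").sub(frame["col_3"])

def count_by_mask(frame):
    """Second variant: number of *other* rows in the group with col_3 == 1."""
    is_one = frame["col_3"].eq(1).astype(int)  # explicit ints (my addition)
    return (frame.assign(col_3=is_one)
                 .groupby(KEYS)["col_3"]
                 .transform("sum")
                 .sub(is_one))

# Example data from the question (default 0-based index assumed).
df = pd.DataFrame({
    "col_1": [6, 2, 5, 3, 5, 3, 6, 6, 2, 2, 5, 5],
    "col_2": ["A", "A", "B", "C", "C", "B", "A", "A", "B", "C", "A", "B"],
    "col_3": [1, 1, 1, 1, 2, 2, 1, 0, 3, 3, 3, 1],
})
df["col_new"] = count_by_mask(df)
print(df)

# Invented toy data: one group containing values other than 0 and 1.
toy = pd.DataFrame({"col_1": [1, 1, 1],
                    "col_2": ["X", "X", "X"],
                    "col_3": [2, 1, 2]})
by_sum = count_by_sum(toy)    # adds up the 2s as well
by_mask = count_by_mask(toy)  # counts only the rows equal to 1
print(by_sum.tolist(), by_mask.tolist())
```

On the question's data both variants give the same column, because no group mixes a 1 with a value greater than 1. On the toy frame they differ: the sum-based variant adds the 2s into the total, while the mask-based variant counts only rows where col_3 equals 1, which is what the question asks for.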