Create a new column in Pandas based on count of other columns and a fixed specific value

Question

This is a follow-up to another related question of mine: https://stackoverflow.com/questions/75439107/create-a-new-column-based-on-count-of-other-columns

I have a dataframe that looks like this:

col_1   col_2   col_3
6       A       1
2       A       1 
5       B       1
3       C       1
5       C       2
3       B       2
6       A       1
6       A       0
2       B       3
2       C       3
5       A       3
5       B       1

I want to add a new column col_new that, for each row, counts the other rows (excluding the row itself) that have the same values in col_1 and col_2 and whose col_3 is 1, regardless of whether the current row's own col_3 is 1. The desired output would look like this:

col_1   col_2   col_3   col_new
6       A       1       1
2       A       1       0
5       B       1       1
3       C       1       0
5       C       2       0
3       B       2       0
6       A       1       1
6       A       0       1 (even though the col_3 value is 0)
2       B       3       0
2       C       3       0
5       A       3       0
5       B       1       1

What I have tried:

df['col_new'] = df[df['col_3'] == 1].groupby(['col_1', 'col_2'])['col_2'].transform('count').sub(1)

This gives the correct result for rows where col_3 is 1, but NaN for all other rows (such as row 8, where col_3 is 0).

Thank you so much in advance.
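The NaN values described above most likely come from index alignment: the filtered frame only contains the rows where col_3 is 1, so the result of transform only carries those index labels, and assigning it back leaves every other row empty. A minimal sketch that reproduces this with the data from the question (the names mask and partial are only illustrative):

```python
import pandas as pd

# Data from the question.
df = pd.DataFrame({
    "col_1": [6, 2, 5, 3, 5, 3, 6, 6, 2, 2, 5, 5],
    "col_2": list("AABCCBAABCAB"),
    "col_3": [1, 1, 1, 1, 2, 2, 1, 0, 3, 3, 3, 1],
})

# Filtering first drops every row where col_3 != 1 ...
mask = df["col_3"] == 1
partial = (
    df[mask]
    .groupby(["col_1", "col_2"])["col_2"]
    .transform("count")
    .sub(1)
)

# ... so `partial` only has the index labels of the kept rows.
print(partial.index.tolist())

# Assignment aligns on the index; labels missing from `partial` become NaN.
df["col_new"] = partial
print(df)
```

Any approach that should also fill the rows where col_3 is not 1 therefore has to compute the group totals on the full frame rather than on the filtered one.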

Answer 1

Score: 1

I believe you want:

df['col_new'] = (df.groupby(['col_1', 'col_2'])['col_3']
                   .transform('sum').sub(df['col_3'])
                 )
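Note that this version sums the raw col_3 values, so it only behaves like a count when col_3 holds nothing but 0s and 1s within a group. A small sketch on hypothetical data (the frame toy is made up for illustration and is not part of the question) shows what happens when two rows of the same group both have col_3 equal to 2:

```python
import pandas as pd

# Hypothetical data: the two (5, "C") rows both have col_3 == 2.
toy = pd.DataFrame({
    "col_1": [5, 5, 6, 6],
    "col_2": ["C", "C", "A", "A"],
    "col_3": [2, 2, 1, 0],
})

# Group sum of col_3 minus the row's own col_3 value.
toy["col_new"] = (
    toy.groupby(["col_1", "col_2"])["col_3"]
    .transform("sum")
    .sub(toy["col_3"])
)
print(toy)
```

Here the (5, "C") rows each get 2, even though no other row in that group has col_3 equal to 1.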

Or, to count only the rows where col_3 is exactly 1 (so that 2s and 3s are ignored):

s = df['col_3'].eq(1)
df['col_new'] = (df.assign(col_3=s)
                   .groupby(['col_1', 'col_2'])['col_3']
                   .transform('sum').sub(s)
                 )

Output:

    col_1 col_2  col_3  col_new
0       6     A      1        1
1       2     A      1        0
2       5     B      1        1
3       3     C      1        0
4       5     C      2        0
5       3     B      2        0
6       6     A      1        1
7       6     A      0        2  # both rows 0 and 6 match
8       2     B      3        0
9       2     C      3        0
10      5     A      3        0
11      5     B      1        1
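For convenience, here is a self-contained sketch that rebuilds the dataframe from the question and applies the same idea as the second snippet. It is a slight variation, not the exact code above: the boolean mask is cast to int before summing (an assumption made here to keep the arithmetic explicit), and the names is_one and group_total are only illustrative.

```python
import pandas as pd

# Data from the question.
df = pd.DataFrame({
    "col_1": [6, 2, 5, 3, 5, 3, 6, 6, 2, 2, 5, 5],
    "col_2": list("AABCCBAABCAB"),
    "col_3": [1, 1, 1, 1, 2, 2, 1, 0, 3, 3, 3, 1],
})

# 1 where col_3 == 1, else 0.
is_one = df["col_3"].eq(1).astype(int)

# Per (col_1, col_2) group: how many rows have col_3 == 1,
# broadcast back to every row of the group.
group_total = is_one.groupby([df["col_1"], df["col_2"]]).transform("sum")

# Exclude the row itself: subtract 1 only if the row's own col_3 is 1.
df["col_new"] = group_total - is_one
print(df)
```

Because the group totals are computed on the full frame, every row gets a value, including the rows where col_3 is not 1.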
huangapple
  • Posted on February 14, 2023, 03:02:55
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75440187.html