Create a new column in Pandas based on count of other columns and a fixed specific value
Question
This is a continuation of another related question of mine: https://stackoverflow.com/questions/75439107/create-a-new-column-based-on-count-of-other-columns
I have a dataframe that looks like this:
col_1 col_2 col_3
6 A 1
2 A 1
5 B 1
3 C 1
5 C 2
3 B 2
6 A 1
6 A 0
2 B 3
2 C 3
5 A 3
5 B 1
I want to add a new column `col_new` that, for each row, counts the other rows (excluding the row itself) that have the same values in `col_1` and `col_2` and whose `col_3` value is 1, regardless of whether the row's own `col_3` value is 1 or not. The desired output would look like this:
col_1 col_2 col_3 col_new
6 A 1 1
2 A 1 0
5 B 1 1
3 C 1 0
5 C 2 0
3 B 2 0
6 A 1 1
6 A 0 1 (even though the `col_3` value is 0)
2 B 3 0
2 C 3 0
5 A 3 0
5 B 1 1
What I have tried:
df['col_new'] = df[df['col_3'] == 1].groupby(['col_1', 'col_2'])['col_2'].transform('count').sub(1)
This gives the correct result for rows whose `col_3` value is 1, but NaN for rows whose `col_3` value is 0 (like row 8).
Thank you so much in advance.
Answer 1
Score: 1
I believe you want:
# Sum col_3 within each (col_1, col_2) group, then subtract the row's own value
df['col_new'] = (df.groupby(['col_1', 'col_2'])['col_3']
                   .transform('sum').sub(df['col_3'])
                )
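One caveat worth spelling out: this version sums the `col_3` values in each group rather than counting the rows equal to 1, so it matches the requested count only when the group's `col_3` values are all 0 or 1. A minimal sketch on a small made-up frame (the two rows below are hypothetical, not taken from the question) shows a 2 being counted as two matches:

```python
import pandas as pd

# Hypothetical two-row group, not part of the question's data.
demo = pd.DataFrame({'col_1': [5, 5], 'col_2': ['B', 'B'], 'col_3': [1, 2]})

# Group sum of col_3 (1 + 2 = 3) minus each row's own col_3 value.
demo['col_new'] = (demo.groupby(['col_1', 'col_2'])['col_3']
                       .transform('sum').sub(demo['col_3'])
                  )
print(demo['col_new'].tolist())  # [2, 1]
```

The first row gets 2 even though its only neighbour has `col_3` equal to 2, not 1, so the intended count there would be 0.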
Or, to only consider 1s (not 2s):
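# Boolean mask: True where col_3 is exactly 1
s = df['col_3'].eq(1)
# Count the True values per (col_1, col_2) group, then subtract the row's own flag
df['col_new'] = (df.assign(col_3=s)
                   .groupby(['col_1', 'col_2'])['col_3']
                   .transform('sum').sub(s)
                )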
Output:
col_1 col_2 col_3 col_new
0 6 A 1 1
1 2 A 1 0
2 5 B 1 1
3 3 C 1 0
4 5 C 2 0
5 3 B 2 0
6 6 A 1 1
7 6 A 0 2 # rows 0 and 6 both match
8 2 B 3 0
9 2 C 3 0
10 5 A 3 0
11 5 B 1 1
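To reproduce this output end to end, here is a self-contained sketch that rebuilds the question's frame and applies the mask-based version. The name `is_one`, the `astype(int)` cast, and grouping the mask Series directly (instead of going through `assign`) are my own choices for illustration, not part of the answer as posted.

```python
import pandas as pd

# The sample data from the question.
df = pd.DataFrame({
    'col_1': [6, 2, 5, 3, 5, 3, 6, 6, 2, 2, 5, 5],
    'col_2': ['A', 'A', 'B', 'C', 'C', 'B', 'A', 'A', 'B', 'C', 'A', 'B'],
    'col_3': [1, 1, 1, 1, 2, 2, 1, 0, 3, 3, 3, 1],
})

# 1 where col_3 equals 1, else 0 (cast to int so the arithmetic is on integers).
is_one = df['col_3'].eq(1).astype(int)

# Per-group count of rows with col_3 == 1, minus the row's own contribution.
df['col_new'] = (is_one.groupby([df['col_1'], df['col_2']])
                       .transform('sum').sub(is_one)
                )
print(df)
```

Because every row of the original frame takes part in the group-by, no row is left as NaN. Row 7 comes out as 2 rather than the 1 shown in the question, since rows 0 and 6 both share its `col_1`/`col_2` pair and have `col_3` equal to 1.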