2023-02-14 03:02:55

Create a new column in Pandas based on count of other columns and a fixed specific value

Question

This is a follow-up to another related question of mine: https://stackoverflow.com/questions/75439107/create-a-new-column-based-on-count-of-other-columns

I have a DataFrame that looks like this:

col_1   col_2   col_3
6       A       1
2       A       1 
5       B       1
3       C       1
5       C       2
3       B       2
6       A       1
6       A       0
2       B       3
2       C       3
5       A       3
5       B       1

I want to add a new column, col_new, that counts for each row the number of other rows (excluding the row itself) that have the same values in col_1 and col_2 and whose col_3 value is 1. This count applies regardless of whether the row's own col_3 value is 1. The desired output would look like this:

col_1   col_2   col_3   col_new
6       A       1       1
2       A       1       0
5       B       1       1
3       C       1       0
5       C       2       0
3       B       2       0
6       A       1       1
6       A       0       1 (even though the col_3 value is 0)
2       B       3       0
2       C       3       0
5       A       3       0
5       B       1       1

What I have tried:

df['col_new'] = df[df['col_3'] == 1].groupby(['col_1', 'col_2'])['col_2'].transform('count').sub(1)

This gives the correct result for the rows whose col_3 value is 1, but NaN for the other rows, such as row 8, whose col_3 value is 0.

Thank you so much in advance.
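
For readers reproducing the problem: the NaN values appear to come from index alignment. The filtered frame contains only the rows where col_3 is 1, so the transformed Series has no entries for the remaining row labels, and assigning it back to the full DataFrame leaves those rows as NaN. A minimal sketch, assuming the sample data above and a default integer index (the variable name `attempt` is introduced here for illustration):

```python
import pandas as pd

# Sample data copied from the question (default 0-based index assumed).
df = pd.DataFrame({
    "col_1": [6, 2, 5, 3, 5, 3, 6, 6, 2, 2, 5, 5],
    "col_2": ["A", "A", "B", "C", "C", "B", "A", "A", "B", "C", "A", "B"],
    "col_3": [1, 1, 1, 1, 2, 2, 1, 0, 3, 3, 3, 1],
})

# The attempted approach: filter first, then count within each group.
attempt = (
    df[df["col_3"] == 1]
    .groupby(["col_1", "col_2"])["col_2"]
    .transform("count")
    .sub(1)
)

# The result only carries the index labels of the filtered rows.
print(attempt.index.tolist())  # [0, 1, 2, 3, 6, 11]

# Assignment aligns on the index, so labels missing from `attempt` become NaN.
df["col_new"] = attempt
print(df[df["col_new"].isna()])
```

Every row whose col_3 value is not 1 ends up as NaN, including the row with index 7 (row 8 when counting from 1).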

Answer 1

Score: 1

I believe you want:

df['col_new'] = (df.groupby(['col_1', 'col_2'])['col_3']
                   .transform('sum').sub(df['col_3'])
                 )

Or, to only consider 1s (not 2s):

s = df['col_3'].eq(1)
df['col_new'] = (df.assign(col_3=s)
                   .groupby(['col_1', 'col_2'])['col_3']
                   .transform('sum').sub(s)
                 )

Output:

    col_1 col_2  col_3  col_new
0       6     A      1        1
1       2     A      1        0
2       5     B      1        1
3       3     C      1        0
4       5     C      2        0
5       3     B      2        0
6       6     A      1        1
7       6     A      0        2  # rows 0 and 6 both match
8       2     B      3        0
9       2     C      3        0
10      5     A      3        0
11      5     B      1        1
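
To check the output above end to end, here is a self-contained sketch that rebuilds the sample data from the question and applies both variants. The names `is_one`, `ones_in_group`, and `sum_variant` are introduced here for illustration, and the mask is cast to integers up front so that the group sum is an integer count.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "col_1": [6, 2, 5, 3, 5, 3, 6, 6, 2, 2, 5, 5],
    "col_2": ["A", "A", "B", "C", "C", "B", "A", "A", "B", "C", "A", "B"],
    "col_3": [1, 1, 1, 1, 2, 2, 1, 0, 3, 3, 3, 1],
})

# 1 where col_3 equals 1, otherwise 0.
is_one = df["col_3"].eq(1).astype(int)

# Number of col_3 == 1 rows in each (col_1, col_2) group, broadcast to every
# row, minus the row's own contribution so that a row never counts itself.
ones_in_group = is_one.groupby([df["col_1"], df["col_2"]]).transform("sum")
df["col_new"] = ones_in_group - is_one

# Plain-sum variant: sum of col_3 over the other rows of the group.
sum_variant = (
    df.groupby(["col_1", "col_2"])["col_3"]
    .transform("sum")
    .sub(df["col_3"])
)

print(df)
print(sum_variant.equals(df["col_new"]))
```

On this data the two variants agree, but only because every row whose col_3 value is something other than 0 or 1 is alone in its (col_1, col_2) group. If a group held a row with col_3 equal to 2 alongside another row, the plain sum would add 2 to the other row's result, whereas the mask-based version would add 0.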