2023年2月14日 01:13:33go评论91阅读模式

英文:

Create a new column based on count of other columns

问题

我有一个pandas数据框，看起来像这样

col_1 col_2
6 A
2 A
5 B
3 C
5 C
3 B
6 A
6 A
2 B
2 C
5 A
5 B

我想要添加一个新列```col_new```，该列统计与```col_1```和```col_2```中相同元素的行数，但不包括该行本身。所以期望的输出应该是

col_1 col_2 col_new
6 A 2
2 A 0
5 B 1
3 C 0
5 C 0
3 B 0
6 A 2
6 A 2
2 B 0
2 C 0
5 A 0
5 B 1

这是我的尝试，但我不确定是否是正确的方法：
```df['col_new'] = df.groupby(['col_1', 'col_2']).col_1.transform('size') - 1```

英文:

I have a dataframe in pandas that looks like

col_1   col_2
6       A       
2       A       
5       B       
3       C       
5       C       
3       B       
6       A       
6       A       
2       B       
2       C       
5       A       
5       B

and i want to add a new column col_new that counts the number of rows with the same elements in col_1 and col_2 but excluding that row itself. So the desired output would look like

col_1   col_2   col_new
6       A       2
2       A       0
5       B       1
3       C       0  
5       C       0
3       B       0
6       A       2
6       A       2
2       B       0
2       C       0
5       A       0
5       B       1

Here what's I tried but I am not sure if it's the right approach:

df['col_new'] = df.groupby(['col_1', 'col_2']).count()

But then I got the error: TypeError: incompatible index of inserted column with frame index

Thanks in advance.

答案1

得分: 3

您可以使用：

df['col_new'] = df.groupby(['col_1', 'col_2'])['col_2'].transform('count').sub(1)

输出结果：

col_1 col_2  col_new

0 6 A 2
1 2 A 0
2 5 B 1
3 3 C 0
4 5 C 0
5 3 B 0
6 6 A 2
7 6 A 2
8 2 B 0
9 2 C 0
10 5 A 0
11 5 B 1


<details>
<summary>英文:</summary>
You can use:

df['col_new'] = df.groupby(['col_1', 'col_2'])['col_2'].transform('count').sub(1)

Output:

col_1 col_2  col_new

0 6 A 2
1 2 A 0
2 5 B 1
3 3 C 0
4 5 C 0
5 3 B 0
6 6 A 2
7 6 A 2
8 2 B 0
9 2 C 0
10 5 A 0
11 5 B 1


</details>
# 答案2
**得分**: 2
- 使用`value_counts`方法。
  ```python
  df["col3"] = df.apply(lambda x: (x[0], x[1]), axis=1)

       col_1 col_2    col3                                                                                              
0       6     A    (6, A)                                                                                              
1       2     A    (2, A)                                                                                              
2       5     B    (5, B)                                                                                              
3       3     C    (3, C)                                                                                              
4       5     C    (5, C)                                                                                              
5       3     B    (3, B)                                                                                              
6       6     A    (6, A)                                                                                              
7       6     A    (6, A)                                                                                              
8       2     B    (2, B)                                                                                              
9       2     C    (2, C)                                                                                              
10      5     A    (5, A)                                                                                              
11      5     B    (5, B)

创建一个值计数的Series。这将被用作查找表。

value_counts = df["col3"].value_counts()

(6, A)    3
(5, B)    2
(2, A)    1
(3, C)    1
(5, C)    1
(3, B)    1
(2, B)    1
(2, C)    1
(5, A)    1
Name: col3, dtype: int64

将每行映射到第四列名为counts。

df["counts"] = df["col3"].map(value_counts)

       col_1 col_2    col3  counts
0       6     A  (6, A)       3
1       2     A  (2, A)       1
2       5     B  (5, B)       2
3       3     C  (3, C)       1
4       5     C  (5, C)       1
5       3     B  (3, B)       1
6       6     A  (6, A)       3
7       6     A  (6, A)       3
8       2     B  (2, B)       1
9       2     C  (2, C)       1
10      5     A  (5, A)       1
11      5     B  (5, B)       2

英文:

I would use the value_counts method.

Create a 3rd column called col3 and store a tuple of the row values. Tuples, unlike lists are hashable and can be used to create keys for counting.

df[&quot;col3&quot;] = df.apply(lambda x: (x[0], x[1]), axis = 1)

       col_1 col_2    col3                                                                                              
  0       6     A    (6, A)                                                                                              
  1       2     A    (2, A)                                                                                              
  2       5     B    (5, B)                                                                                             
  3       3     C    (3, C)                                                                                              
  4       5     C    (5, C)                                                                                              
  5       3     B    (3, B)                                                                                              
  6       6     A    (6, A)                                                                                              
  7       6     A    (6, A)                                                                                              
  8       2     B    (2, B)                                                                                              
  9       2     C    (2, C)                                                                                              
  10      5     A    (5, A)                                                                                              
  11      5     B    (5, B)

Create a Series for value counts. This will be used like a lookup table.

value_counts = df[&quot;col3&quot;].value_counts()

(6, A)    3
(5, B)    2
(2, A)    1
(3, C)    1
(5, C)    1
(3, B)    1
(2, B)    1
(2, C)    1
(5, A)    1
Name: col3, dtype: int64

Map each row to a fourth column called counts

df[&quot;counts&quot;] = df[&quot;col3&quot;].map(value_counts)

       col_1 col_2    col3  counts
  0       6     A  (6, A)       3
  1       2     A  (2, A)       1
  2       5     B  (5, B)       2
  3       3     C  (3, C)       1
  4       5     C  (5, C)       1
  5       3     B  (3, B)       1
  6       6     A  (6, A)       3
  7       6     A  (6, A)       3
  8       2     B  (2, B)       1
  9       2     C  (2, C)       1
  10      5     A  (5, A)       1
  11      5     B  (5, B)       2

通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库，让每个人都能够通过互相帮助和分享经验来进步。

创建一个基于其他列计数的新列。

问题

答案1

将一个Spark DataFrame收集到一个单独的行中。

ZAP API扫描Android和iOS设备。

在QAbstractTableModel中删除多行的RemoveRow函数和刷新 – Pyside2

我遇到了在Python中使用PLY创建词法分析器的问题。

如何在Playwright视觉比较中屏蔽多个定位器？

在C++中，可以使用可变模板参数来检索类型的内部类型。

selenium.common.exceptions.StaleElementReferenceException: Message: stale element reference: stale element not found

Creating and opening a URL to log in to Website via Basic Auth with Robot Framework/Selenium (Python)

AG Grid 在上下文菜单中以大文本形式打开

What's the correct way to type hint an empty list as a literal in python?

如何在Highcharts Gantt中更改本地化的星期名称

如何在同一个流中使用多个过滤器和映射函数？

如何使用Map/Set来将代码优化到O(n)？

.NET MAUI Android在GitHub Actions上构建失败，错误代码为1。