英文:
Create a new column based on count of other columns
问题
我有一个pandas数据框,看起来像这样
col_1 col_2
6 A
2 A
5 B
3 C
5 C
3 B
6 A
6 A
2 B
2 C
5 A
5 B
我想要添加一个新列```col_new```,该列统计与```col_1```和```col_2```中相同元素的行数,但不包括该行本身。所以期望的输出应该是
col_1 col_2 col_new
6 A 2
2 A 0
5 B 1
3 C 0
5 C 0
3 B 0
6 A 2
6 A 2
2 B 0
2 C 0
5 A 0
5 B 1
这是我的尝试,但我不确定是否是正确的方法:
```df['col_new'] = df.groupby(['col_1', 'col_2']).col_1.transform('size') - 1```
英文:
I have a dataframe in pandas that looks like
col_1 col_2
6 A
2 A
5 B
3 C
5 C
3 B
6 A
6 A
2 B
2 C
5 A
5 B
and i want to add a new column col_new
that counts the number of rows with the same elements in col_1
and col_2
but excluding that row itself. So the desired output would look like
col_1 col_2 col_new
6 A 2
2 A 0
5 B 1
3 C 0
5 C 0
3 B 0
6 A 2
6 A 2
2 B 0
2 C 0
5 A 0
5 B 1
Here what's I tried but I am not sure if it's the right approach:
df['col_new'] = df.groupby(['col_1', 'col_2']).count()
But then I got the error: TypeError: incompatible index of inserted column with frame index
Thanks in advance.
答案1
得分: 3
您可以使用:
df['col_new'] = df.groupby(['col_1', 'col_2'])['col_2'].transform('count').sub(1)
输出结果:
col_1 col_2 col_new
0 6 A 2
1 2 A 0
2 5 B 1
3 3 C 0
4 5 C 0
5 3 B 0
6 6 A 2
7 6 A 2
8 2 B 0
9 2 C 0
10 5 A 0
11 5 B 1
<details>
<summary>英文:</summary>
You can use:
df['col_new'] = df.groupby(['col_1', 'col_2'])['col_2'].transform('count').sub(1)
Output:
col_1 col_2 col_new
0 6 A 2
1 2 A 0
2 5 B 1
3 3 C 0
4 5 C 0
5 3 B 0
6 6 A 2
7 6 A 2
8 2 B 0
9 2 C 0
10 5 A 0
11 5 B 1
</details>
# 答案2
**得分**: 2
- 使用`value_counts`方法。
```python
df["col3"] = df.apply(lambda x: (x[0], x[1]), axis=1)
col_1 col_2 col3
0 6 A (6, A)
1 2 A (2, A)
2 5 B (5, B)
3 3 C (3, C)
4 5 C (5, C)
5 3 B (3, B)
6 6 A (6, A)
7 6 A (6, A)
8 2 B (2, B)
9 2 C (2, C)
10 5 A (5, A)
11 5 B (5, B)
-
创建一个值计数的
Series
。这将被用作查找表。value_counts = df["col3"].value_counts()
(6, A) 3 (5, B) 2 (2, A) 1 (3, C) 1 (5, C) 1 (3, B) 1 (2, B) 1 (2, C) 1 (5, A) 1 Name: col3, dtype: int64
-
将每行映射到第四列名为
counts
。df["counts"] = df["col3"].map(value_counts)
col_1 col_2 col3 counts 0 6 A (6, A) 3 1 2 A (2, A) 1 2 5 B (5, B) 2 3 3 C (3, C) 1 4 5 C (5, C) 1 5 3 B (3, B) 1 6 6 A (6, A) 3 7 6 A (6, A) 3 8 2 B (2, B) 1 9 2 C (2, C) 1 10 5 A (5, A) 1 11 5 B (5, B) 2
英文:
I would use the value_counts
method.
-
Create a 3rd column called
col3
and store a tuple of the row values. Tuples, unlike lists are hashable and can be used to create keys for counting.df["col3"] = df.apply(lambda x: (x[0], x[1]), axis = 1)
col_1 col_2 col3 0 6 A (6, A) 1 2 A (2, A) 2 5 B (5, B) 3 3 C (3, C) 4 5 C (5, C) 5 3 B (3, B) 6 6 A (6, A) 7 6 A (6, A) 8 2 B (2, B) 9 2 C (2, C) 10 5 A (5, A) 11 5 B (5, B)
-
Create a
Series
for value counts. This will be used like a lookup table.value_counts = df["col3"].value_counts()
(6, A) 3 (5, B) 2 (2, A) 1 (3, C) 1 (5, C) 1 (3, B) 1 (2, B) 1 (2, C) 1 (5, A) 1 Name: col3, dtype: int64
-
Map each row to a fourth column called
counts
df["counts"] = df["col3"].map(value_counts)
col_1 col_2 col3 counts 0 6 A (6, A) 3 1 2 A (2, A) 1 2 5 B (5, B) 2 3 3 C (3, C) 1 4 5 C (5, C) 1 5 3 B (3, B) 1 6 6 A (6, A) 3 7 6 A (6, A) 3 8 2 B (2, B) 1 9 2 C (2, C) 1 10 5 A (5, A) 1 11 5 B (5, B) 2
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论