创建一个基于其他列计数的新列。

huangapple go评论62阅读模式
英文:

Create a new column based on count of other columns

问题

我有一个pandas数据框,看起来像这样

col_1 col_2
6 A
2 A
5 B
3 C
5 C
3 B
6 A
6 A
2 B
2 C
5 A
5 B

我想要添加一个新列```col_new```,该列统计与```col_1```和```col_2```中相同元素的行数,但不包括该行本身。所以期望的输出应该是

col_1 col_2 col_new
6 A 2
2 A 0
5 B 1
3 C 0
5 C 0
3 B 0
6 A 2
6 A 2
2 B 0
2 C 0
5 A 0
5 B 1

这是我的尝试,但我不确定是否是正确的方法:

```df['col_new'] = df.groupby(['col_1', 'col_2']).col_1.transform('size') - 1```
英文:

I have a dataframe in pandas that looks like

col_1   col_2
6       A       
2       A       
5       B       
3       C       
5       C       
3       B       
6       A       
6       A       
2       B       
2       C       
5       A       
5       B

and i want to add a new column col_new that counts the number of rows with the same elements in col_1 and col_2 but excluding that row itself. So the desired output would look like

col_1   col_2   col_new
6       A       2
2       A       0
5       B       1
3       C       0  
5       C       0
3       B       0
6       A       2
6       A       2
2       B       0
2       C       0
5       A       0
5       B       1   

Here what's I tried but I am not sure if it's the right approach:

df['col_new'] = df.groupby(['col_1', 'col_2']).count()

But then I got the error: TypeError: incompatible index of inserted column with frame index

Thanks in advance.

答案1

得分: 3

您可以使用:

df['col_new'] = df.groupby(['col_1', 'col_2'])['col_2'].transform('count').sub(1)

输出结果:
col_1 col_2  col_new

0 6 A 2
1 2 A 0
2 5 B 1
3 3 C 0
4 5 C 0
5 3 B 0
6 6 A 2
7 6 A 2
8 2 B 0
9 2 C 0
10 5 A 0
11 5 B 1


<details>
<summary>英文:</summary>

You can use:

df['col_new'] = df.groupby(['col_1', 'col_2'])['col_2'].transform('count').sub(1)

Output:
col_1 col_2  col_new

0 6 A 2
1 2 A 0
2 5 B 1
3 3 C 0
4 5 C 0
5 3 B 0
6 6 A 2
7 6 A 2
8 2 B 0
9 2 C 0
10 5 A 0
11 5 B 1


</details>



# 答案2
**得分**: 2


- 使用`value_counts`方法。

  ```python
  df["col3"] = df.apply(lambda x: (x[0], x[1]), axis=1)
       col_1 col_2    col3                                                                                              
0       6     A    (6, A)                                                                                              
1       2     A    (2, A)                                                                                              
2       5     B    (5, B)                                                                                              
3       3     C    (3, C)                                                                                              
4       5     C    (5, C)                                                                                              
5       3     B    (3, B)                                                                                              
6       6     A    (6, A)                                                                                              
7       6     A    (6, A)                                                                                              
8       2     B    (2, B)                                                                                              
9       2     C    (2, C)                                                                                              
10      5     A    (5, A)                                                                                              
11      5     B    (5, B)
  • 创建一个值计数的Series。这将被用作查找表。

    value_counts = df["col3"].value_counts()
    
    (6, A)    3
    (5, B)    2
    (2, A)    1
    (3, C)    1
    (5, C)    1
    (3, B)    1
    (2, B)    1
    (2, C)    1
    (5, A)    1
    Name: col3, dtype: int64
    
  • 将每行映射到第四列名为counts

    df["counts"] = df["col3"].map(value_counts)
    
           col_1 col_2    col3  counts
    0       6     A  (6, A)       3
    1       2     A  (2, A)       1
    2       5     B  (5, B)       2
    3       3     C  (3, C)       1
    4       5     C  (5, C)       1
    5       3     B  (3, B)       1
    6       6     A  (6, A)       3
    7       6     A  (6, A)       3
    8       2     B  (2, B)       1
    9       2     C  (2, C)       1
    10      5     A  (5, A)       1
    11      5     B  (5, B)       2
    
英文:

I would use the value_counts method.

  • Create a 3rd column called col3 and store a tuple of the row values. Tuples, unlike lists are hashable and can be used to create keys for counting.

    df[&quot;col3&quot;] = df.apply(lambda x: (x[0], x[1]), axis = 1)   
    
           col_1 col_2    col3                                                                                              
      0       6     A    (6, A)                                                                                              
      1       2     A    (2, A)                                                                                              
      2       5     B    (5, B)                                                                                             
      3       3     C    (3, C)                                                                                              
      4       5     C    (5, C)                                                                                              
      5       3     B    (3, B)                                                                                              
      6       6     A    (6, A)                                                                                              
      7       6     A    (6, A)                                                                                              
      8       2     B    (2, B)                                                                                              
      9       2     C    (2, C)                                                                                              
      10      5     A    (5, A)                                                                                              
      11      5     B    (5, B) 
    
  • Create a Series for value counts. This will be used like a lookup table.

    value_counts = df[&quot;col3&quot;].value_counts() 
    
    (6, A)    3
    (5, B)    2
    (2, A)    1
    (3, C)    1
    (5, C)    1
    (3, B)    1
    (2, B)    1
    (2, C)    1
    (5, A)    1
    Name: col3, dtype: int64
    
  • Map each row to a fourth column called counts

    df[&quot;counts&quot;] = df[&quot;col3&quot;].map(value_counts)  
    
           col_1 col_2    col3  counts
      0       6     A  (6, A)       3
      1       2     A  (2, A)       1
      2       5     B  (5, B)       2
      3       3     C  (3, C)       1
      4       5     C  (5, C)       1
      5       3     B  (3, B)       1
      6       6     A  (6, A)       3
      7       6     A  (6, A)       3
      8       2     B  (2, B)       1
      9       2     C  (2, C)       1
      10      5     A  (5, A)       1
      11      5     B  (5, B)       2
    

huangapple
  • 本文由 发表于 2023年2月14日 01:13:33
  • 转载请务必保留本文链接:https://go.coder-hub.com/75439107.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定