创建一个基于其他列计数的新列。

huangapple go评论81阅读模式
英文:

Create a new column based on count of other columns

问题

  1. 我有一个pandas数据框,看起来像这样

col_1 col_2
6 A
2 A
5 B
3 C
5 C
3 B
6 A
6 A
2 B
2 C
5 A
5 B

  1. 我想要添加一个新列```col_new```,该列统计与```col_1``````col_2```中相同元素的行数,但不包括该行本身。所以期望的输出应该是

col_1 col_2 col_new
6 A 2
2 A 0
5 B 1
3 C 0
5 C 0
3 B 0
6 A 2
6 A 2
2 B 0
2 C 0
5 A 0
5 B 1

  1. 这是我的尝试,但我不确定是否是正确的方法:
  2. ```df['col_new'] = df.groupby(['col_1', 'col_2']).col_1.transform('size') - 1```
英文:

I have a dataframe in pandas that looks like

  1. col_1 col_2
  2. 6 A
  3. 2 A
  4. 5 B
  5. 3 C
  6. 5 C
  7. 3 B
  8. 6 A
  9. 6 A
  10. 2 B
  11. 2 C
  12. 5 A
  13. 5 B

and i want to add a new column col_new that counts the number of rows with the same elements in col_1 and col_2 but excluding that row itself. So the desired output would look like

  1. col_1 col_2 col_new
  2. 6 A 2
  3. 2 A 0
  4. 5 B 1
  5. 3 C 0
  6. 5 C 0
  7. 3 B 0
  8. 6 A 2
  9. 6 A 2
  10. 2 B 0
  11. 2 C 0
  12. 5 A 0
  13. 5 B 1

Here what's I tried but I am not sure if it's the right approach:

df['col_new'] = df.groupby(['col_1', 'col_2']).count()

But then I got the error: TypeError: incompatible index of inserted column with frame index

Thanks in advance.

答案1

得分: 3

  1. 您可以使用:

df['col_new'] = df.groupby(['col_1', 'col_2'])['col_2'].transform('count').sub(1)

  1. 输出结果:
  1. col_1 col_2 col_new

0 6 A 2
1 2 A 0
2 5 B 1
3 3 C 0
4 5 C 0
5 3 B 0
6 6 A 2
7 6 A 2
8 2 B 0
9 2 C 0
10 5 A 0
11 5 B 1

  1. <details>
  2. <summary>英文:</summary>
  3. You can use:

df['col_new'] = df.groupby(['col_1', 'col_2'])['col_2'].transform('count').sub(1)

  1. Output:
  1. col_1 col_2 col_new

0 6 A 2
1 2 A 0
2 5 B 1
3 3 C 0
4 5 C 0
5 3 B 0
6 6 A 2
7 6 A 2
8 2 B 0
9 2 C 0
10 5 A 0
11 5 B 1

  1. </details>
  2. # 答案2
  3. **得分**: 2
  4. - 使用`value_counts`方法。
  5. ```python
  6. df["col3"] = df.apply(lambda x: (x[0], x[1]), axis=1)
  1. col_1 col_2 col3
  2. 0 6 A (6, A)
  3. 1 2 A (2, A)
  4. 2 5 B (5, B)
  5. 3 3 C (3, C)
  6. 4 5 C (5, C)
  7. 5 3 B (3, B)
  8. 6 6 A (6, A)
  9. 7 6 A (6, A)
  10. 8 2 B (2, B)
  11. 9 2 C (2, C)
  12. 10 5 A (5, A)
  13. 11 5 B (5, B)
  • 创建一个值计数的Series。这将被用作查找表。

    1. value_counts = df["col3"].value_counts()
    1. (6, A) 3
    2. (5, B) 2
    3. (2, A) 1
    4. (3, C) 1
    5. (5, C) 1
    6. (3, B) 1
    7. (2, B) 1
    8. (2, C) 1
    9. (5, A) 1
    10. Name: col3, dtype: int64
  • 将每行映射到第四列名为counts

    1. df["counts"] = df["col3"].map(value_counts)
    1. col_1 col_2 col3 counts
    2. 0 6 A (6, A) 3
    3. 1 2 A (2, A) 1
    4. 2 5 B (5, B) 2
    5. 3 3 C (3, C) 1
    6. 4 5 C (5, C) 1
    7. 5 3 B (3, B) 1
    8. 6 6 A (6, A) 3
    9. 7 6 A (6, A) 3
    10. 8 2 B (2, B) 1
    11. 9 2 C (2, C) 1
    12. 10 5 A (5, A) 1
    13. 11 5 B (5, B) 2
英文:

I would use the value_counts method.

  • Create a 3rd column called col3 and store a tuple of the row values. Tuples, unlike lists are hashable and can be used to create keys for counting.

    1. df[&quot;col3&quot;] = df.apply(lambda x: (x[0], x[1]), axis = 1)
    1. col_1 col_2 col3
    2. 0 6 A (6, A)
    3. 1 2 A (2, A)
    4. 2 5 B (5, B)
    5. 3 3 C (3, C)
    6. 4 5 C (5, C)
    7. 5 3 B (3, B)
    8. 6 6 A (6, A)
    9. 7 6 A (6, A)
    10. 8 2 B (2, B)
    11. 9 2 C (2, C)
    12. 10 5 A (5, A)
    13. 11 5 B (5, B)
  • Create a Series for value counts. This will be used like a lookup table.

    1. value_counts = df[&quot;col3&quot;].value_counts()
    1. (6, A) 3
    2. (5, B) 2
    3. (2, A) 1
    4. (3, C) 1
    5. (5, C) 1
    6. (3, B) 1
    7. (2, B) 1
    8. (2, C) 1
    9. (5, A) 1
    10. Name: col3, dtype: int64
  • Map each row to a fourth column called counts

    1. df[&quot;counts&quot;] = df[&quot;col3&quot;].map(value_counts)
    1. col_1 col_2 col3 counts
    2. 0 6 A (6, A) 3
    3. 1 2 A (2, A) 1
    4. 2 5 B (5, B) 2
    5. 3 3 C (3, C) 1
    6. 4 5 C (5, C) 1
    7. 5 3 B (3, B) 1
    8. 6 6 A (6, A) 3
    9. 7 6 A (6, A) 3
    10. 8 2 B (2, B) 1
    11. 9 2 C (2, C) 1
    12. 10 5 A (5, A) 1
    13. 11 5 B (5, B) 2

huangapple
  • 本文由 发表于 2023年2月14日 01:13:33
  • 转载请务必保留本文链接:https://go.coder-hub.com/75439107.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定