Pandas Dataframe - Finding Duplicates of One Column But Different in Another Column

Question

I have a Pandas dataframe, for example, like this:

idx A B
0 a1 b1
1 a2 b1
2 a2 b2
3 a2 b1
4 a3 b3
5 a3 b3
6 a4 b1
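For reference, the example frame can be rebuilt in code. This is a minimal sketch that assumes idx is an ordinary column mirroring the default RangeIndex, which matches the printed output in the answer. The variable name df is the one the answer's code uses.

```python
import pandas as pd

# Rebuild the example: 'idx' is kept as a regular column, and the
# default RangeIndex (0..6) matches it row for row.
df = pd.DataFrame({
    'idx': [0, 1, 2, 3, 4, 5, 6],
    'A': ['a1', 'a2', 'a2', 'a2', 'a3', 'a3', 'a4'],
    'B': ['b1', 'b1', 'b2', 'b1', 'b3', 'b3', 'b1'],
})
print(df)
```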

I want to find the values that are duplicated in Column A but have different values in Column B, and select all of the corresponding indexes.

In the above example, the result should be:

idx A B
1 a2 b1
2 a2 b2
3 a2 b1
  • Drop idx 0 and 6, because their values in Column A are unique.
  • Drop idx 4 and 5, because their values in Column B are the same.
  • I want to keep both idx 1 and 3 in the result. They are identical to each other, but the a2 group also contains idx 2, which has a different value in Column B (so the values are not all the same).
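The three rules above reduce to two per-group statistics on Column A:

- how many rows the group has;
- how many distinct values of B it contains.

The short sketch below uses the same example data to make this visible. The summary column names rows and distinct_B are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    'idx': [0, 1, 2, 3, 4, 5, 6],
    'A': ['a1', 'a2', 'a2', 'a2', 'a3', 'a3', 'a4'],
    'B': ['b1', 'b1', 'b2', 'b1', 'b3', 'b3', 'b1'],
})

# Per value of A: number of rows ('size') and number of distinct B values.
summary = df.groupby('A')['B'].agg(rows='size', distinct_B='nunique')
print(summary)
# Expected values (rows, distinct_B): a1 -> (1, 1), a2 -> (3, 2),
# a3 -> (2, 1), a4 -> (1, 1)

# A group is kept only if B has more than one distinct value,
# which by itself implies the group has more than one row.
keep = summary[summary['distinct_B'] > 1].index
print(list(keep))  # ['a2']
```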

How can I achieve this goal?

Answer 1

Score: 3

You can use two groupby.transform calls for boolean indexing:

g = df.groupby('A')['B']

# Is the value in A duplicated, and does B take more than one distinct value in that group?
out = df[g.transform('count').gt(1) & g.transform('nunique').gt(1)]

# However, the non-unique condition on B already implies that A is duplicated,
# so we can simplify to:
out = df[df.groupby('A')['B'].transform('nunique').gt(1)]
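Here is a self-contained run of the two variants above on the question's data. The mask names mask_two and mask_one are illustrative.

It also checks the claim that the count condition is redundant. Whenever B has more than one distinct value in a group, that group must have at least two rows.

```python
import pandas as pd

df = pd.DataFrame({
    'idx': [0, 1, 2, 3, 4, 5, 6],
    'A': ['a1', 'a2', 'a2', 'a2', 'a3', 'a3', 'a4'],
    'B': ['b1', 'b1', 'b2', 'b1', 'b3', 'b3', 'b1'],
})

g = df.groupby('A')['B']

# transform broadcasts each group's statistic back to every row of that group,
# so the result is aligned with df and can be used directly as a boolean mask.
mask_two = g.transform('count').gt(1) & g.transform('nunique').gt(1)
mask_one = g.transform('nunique').gt(1)

# nunique > 1 needs at least two non-null B values, so count > 1 adds nothing.
assert mask_two.equals(mask_one)

out = df[mask_one]
print(out.index.tolist())  # [1, 2, 3]
```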

Or, with isin:

s = df.groupby('A')['B'].nunique()

out = df[df['A'].isin(s[s.gt(1)].index)]
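Below is a runnable sketch of the isin approach. It assumes the intended filter is s[s.gt(1)], which keeps only the values of A whose B column has more than one distinct value.

Passing the unfiltered s.index instead would match every value of A and return the whole frame.

```python
import pandas as pd

df = pd.DataFrame({
    'idx': [0, 1, 2, 3, 4, 5, 6],
    'A': ['a1', 'a2', 'a2', 'a2', 'a3', 'a3', 'a4'],
    'B': ['b1', 'b1', 'b2', 'b1', 'b3', 'b3', 'b1'],
})

# Number of distinct B values for each value of A (a Series indexed by A).
s = df.groupby('A')['B'].nunique()

# Keep only the A values with more than one distinct B, then select their rows.
selected_a = s[s.gt(1)].index
out = df[df['A'].isin(selected_a)]
print(out.index.tolist())  # [1, 2, 3]

# Without the filter, every value of A is in s.index, so nothing would be dropped.
assert df['A'].isin(s.index).all()
```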

Output:

   idx   A   B
1    1  a2  b1
2    2  a2  b2
3    3  a2  b1
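Not part of the original answer: pandas also offers DataFrameGroupBy.filter, which keeps or drops whole groups based on a predicate. The sketch below makes the same selection with it.

This version is arguably easier to read, but it calls a Python function once per group. The transform version above is therefore likely faster when there are many small groups. The lambda parameter name grp is illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'idx': [0, 1, 2, 3, 4, 5, 6],
    'A': ['a1', 'a2', 'a2', 'a2', 'a3', 'a3', 'a4'],
    'B': ['b1', 'b1', 'b2', 'b1', 'b3', 'b3', 'b1'],
})

# filter() calls the lambda once per group of A (each group is a sub-DataFrame)
# and keeps every row of the groups for which it returns True,
# preserving the original index and row order.
out = df.groupby('A').filter(lambda grp: grp['B'].nunique() > 1)
print(out)
```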

huangapple
  • This article was published on May 17, 2023 at 20:04:42
  • If you repost it, please keep the link to this article: https://go.coder-hub.com/76271900.html