Pandas Dataframe - Finding Duplicates of One Column But Different in Another Column
Question
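You can do this with the pandas library. The following example finds the values that are duplicated in column A but differ in column B, and selects all of their indexes: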
import pandas as pd
# Create the sample DataFrame
data = {'idx': [0, 1, 2, 3, 4, 5, 6],
'A': ['a1', 'a2', 'a2', 'a2', 'a3', 'a3', 'a4'],
'B': ['b1', 'b1', 'b2', 'b1', 'b3', 'b3', 'b1']}
df = pd.DataFrame(data)
# Keep rows whose A value is duplicated and whose A group has more than one distinct B value
result = df[df.duplicated(subset='A', keep=False) & df.groupby('A')['B'].transform('nunique').gt(1)]
# Print the result
print(result)
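This prints the rows of the DataFrame that satisfy the condition (the value in column A is duplicated, but the values in column B differ), together with their indexes.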
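If you would rather stay with duplicated() than use groupby, one possible formulation (an alternative, not the method used in the answer) is to deduplicate the (A, B) pairs first: any A value that still occurs more than once after that must have at least two distinct B values. A minimal sketch on the same sample data:

```python
import pandas as pd

# Same sample data as in the question
df = pd.DataFrame({'idx': [0, 1, 2, 3, 4, 5, 6],
                   'A': ['a1', 'a2', 'a2', 'a2', 'a3', 'a3', 'a4'],
                   'B': ['b1', 'b1', 'b2', 'b1', 'b3', 'b3', 'b1']})

# Keep only one row per distinct (A, B) pair
pairs = df.drop_duplicates(subset=['A', 'B'])
# An A value that still appears more than once has several distinct B values
keys = pairs.loc[pairs.duplicated(subset='A', keep=False), 'A']
# Select every original row whose A value is one of those keys
result = df[df['A'].isin(keys)]
print(result)
```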
I have a Pandas dataframe, for example, like this:
idx | A | B |
---|---|---|
0 | a1 | b1 |
1 | a2 | b1 |
2 | a2 | b2 |
3 | a2 | b1 |
4 | a3 | b3 |
5 | a3 | b3 |
6 | a4 | b1 |
I want to find the values that are duplicated in column A but have different values in column B, and select all of their indexes.
In the above example, the results should be:
idx | A | B |
---|---|---|
1 | a2 | b1 |
2 | a2 | b2 |
3 | a2 | b1 |
- Drop idx 0 and 6, because their values in column A are unique.
- Drop idx 4 and 5, because their values in column B are the same.
- Keep both idx 1 and 3 in the results: although they are identical to each other, their group also contains idx 2, which has a different value in column B (so the values are not all the same).
How can I achieve this goal?
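The rule described in the bullets can be checked per group by tabulating, for each value of A, the number of rows and the number of distinct B values. A minimal sketch on the example data (the names stats and keep are illustrative):

```python
import pandas as pd

df = pd.DataFrame({'idx': [0, 1, 2, 3, 4, 5, 6],
                   'A': ['a1', 'a2', 'a2', 'a2', 'a3', 'a3', 'a4'],
                   'B': ['b1', 'b1', 'b2', 'b1', 'b3', 'b3', 'b1']})

# Per value of A: number of rows and number of distinct B values
stats = df.groupby('A')['B'].agg(['count', 'nunique'])
print(stats)

# Groups to keep: A is repeated AND the B values are not all the same
keep = stats[(stats['count'] > 1) & (stats['nunique'] > 1)].index
print(list(keep))
```

Only a2 passes both tests: a1 and a4 occur once, and a3 occurs twice but always with b3.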
Answer 1
Score: 3
You can use two groupby.transform calls for boolean indexing:
g = df.groupby('A')['B']
# is A duplicated, and does its group have more than one distinct B value?
out = df[g.transform('count').gt(1) & g.transform('nunique').gt(1)]
# however, the non-unique condition already implies that A is duplicated,
# so we can simplify to:
out = df[df.groupby('A')['B'].transform('nunique').gt(1)]
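The simplification holds because a group with at least two distinct B values must contain at least two rows; both count and nunique skip NaN by default, so this remains true when B has missing values. A quick sketch that checks the two masks agree on the example data:

```python
import pandas as pd

df = pd.DataFrame({'idx': [0, 1, 2, 3, 4, 5, 6],
                   'A': ['a1', 'a2', 'a2', 'a2', 'a3', 'a3', 'a4'],
                   'B': ['b1', 'b1', 'b2', 'b1', 'b3', 'b3', 'b1']})

g = df.groupby('A')['B']
# First version: A repeated AND more than one distinct B in the group
two_conditions = g.transform('count').gt(1) & g.transform('nunique').gt(1)
# Simplified version: more than one distinct B in the group
one_condition = g.transform('nunique').gt(1)

# The masks are identical, so the count test is redundant
print(two_conditions.equals(one_condition))
print(df[one_condition])
```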
Or, with isin:
s = df.groupby('A')['B'].nunique()
out = df[df['A'].isin(s[s.gt(1)].index)]
Output:
idx A B
1 1 a2 b1
2 2 a2 b2
3 3 a2 b1
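Another option, not part of the answer above, is groupby(...).filter, which keeps or drops whole groups based on a per-group condition. It calls a Python function once per group, so it is typically slower than the transform approach when there are many groups. A minimal sketch:

```python
import pandas as pd

df = pd.DataFrame({'idx': [0, 1, 2, 3, 4, 5, 6],
                   'A': ['a1', 'a2', 'a2', 'a2', 'a3', 'a3', 'a4'],
                   'B': ['b1', 'b1', 'b2', 'b1', 'b3', 'b3', 'b1']})

# Keep every group of A whose B column has more than one distinct value;
# the surviving rows keep their original index and order
out = df.groupby('A').filter(lambda grp: grp['B'].nunique() > 1)
print(out)
```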