Python: remove duplicates from each group if a condition is met in the group
Question
Within each group, I want to delete duplicate rows, but only when the duplicated entry in a particular column has one specific value.
import pandas as pd

df = pd.DataFrame({
    'group': ['A', 'A', 'B', 'B', 'B', 'C', 'C'],
    'channel': ['X', 'Y', 'Y', 'X', 'X', 'A', 'X'],
    'value': [1, 2, 3, 3, 4, 5, 5]
})
In each group, I want to keep only the first occurrence of X in channel and delete any later X rows; rows with other channel values should be left alone.
I tried the code below, but it deletes all rows that have a duplicate channel value, irrespective of whether channel is X or not:
df = df.groupby('group').apply(lambda x: x.drop_duplicates(subset='channel', keep='first') if 'X' in x['channel'].values else x)
Answer 1
Score: 1
Let's create a boolean mask that marks the rows to drop: rows where channel is X and the same (group, channel) pair has already appeared earlier.
mask = df['channel'].eq('X') & df.duplicated(subset=['group', 'channel'])
Result
df[~mask]
  group channel  value
0     A       X      1
1     A       Y      2
2     B       Y      3
3     B       X      3
5     C       A      5
6     C       X      5
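
For reuse, the same mask can be wrapped in a small function. This is only a sketch: the helper name drop_repeats_of and its parameters are made up for illustration and are not part of pandas. The second frame, extra, is also invented here to check that duplicates of values other than X are left untouched.

```python
import pandas as pd

df = pd.DataFrame({
    'group': ['A', 'A', 'B', 'B', 'B', 'C', 'C'],
    'channel': ['X', 'Y', 'Y', 'X', 'X', 'A', 'X'],
    'value': [1, 2, 3, 3, 4, 5, 5],
})

def drop_repeats_of(frame, target, group_col='group', value_col='channel'):
    """Hypothetical helper: drop rows where value_col == target,
    except the first such row within each group."""
    # True for every row whose value_col equals the target value.
    is_target = frame[value_col].eq(target)
    # True for rows whose (group, value) pair was already seen in an earlier row.
    is_repeat = frame.duplicated(subset=[group_col, value_col], keep='first')
    # Keep everything that is not a repeated target row.
    return frame[~(is_target & is_repeat)]

result = drop_repeats_of(df, 'X')

# Made-up data: group A has duplicate Y rows and duplicate X rows.
extra = pd.DataFrame({
    'group': ['A', 'A', 'A', 'A'],
    'channel': ['Y', 'Y', 'X', 'X'],
    'value': [1, 2, 3, 4],
})
kept = drop_repeats_of(extra, 'X')

print(result)
print(kept)
```

On the question's data this drops only row 4 (the second X in group B). On extra, both Y rows survive and only the second X is removed, which is the behaviour the groupby/drop_duplicates attempt did not give.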