Python / remove duplicates from each group if a condition is met in the group

Question

Within each group, I want to delete duplicate rows, but only when the duplicated entry in a column is one particular value.

    import pandas as pd

    df = pd.DataFrame({
        'group':   ['A', 'A', 'B', 'B', 'B', 'C', 'C'],
        'channel': ['X', 'Y', 'Y', 'X', 'X', 'A', 'X'],
        'value':   [1, 2, 3, 3, 4, 5, 5]
    })

In each group, I want to keep only the first occurrence of X and delete the later X rows; rows with any other channel value should stay untouched.

I tried the code below, but it deletes every row that has a duplicate channel value, irrespective of whether the channel is X or not:

    df = df.groupby('group').apply(lambda x: x.drop_duplicates(subset='channel', keep='first') if 'X' in x['channel'].values else x)
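
To see why this attempt can over-delete, it may help to look at what `drop_duplicates` does inside a single group. The sketch below uses made-up data (the frame `one_group` is hypothetical and not part of the question) in which both 'Y' and 'X' repeat:

```python
import pandas as pd

# Made-up single group in which both 'Y' and 'X' repeat.
one_group = pd.DataFrame({
    'channel': ['Y', 'Y', 'X', 'X'],
    'value':   [1, 2, 3, 4],
})

# drop_duplicates keeps the first row for *every* channel value,
# so the repeated 'Y' is removed along with the repeated 'X'.
deduped = one_group.drop_duplicates(subset='channel', keep='first')
print(deduped)
```

The `'X' in x['channel'].values` check only decides whether a group is deduplicated at all; it does not restrict which values are deduplicated within that group.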

Answer 1

Score: 1

Let's create a boolean mask that flags the rows to drop: rows where channel is 'X' and the same (group, channel) pair has already appeared earlier.

    mask = df['channel'].eq('X') & df.duplicated(subset=['group', 'channel'])

Result

    df[~mask]

      group channel  value
    0     A       X      1
    1     A       Y      2
    2     B       Y      3
    3     B       X      3
    5     C       A      5
    6     C       X      5
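
The sample data in the question has no repeated non-X values, so it does not show that the mask leaves those alone. Below is a minimal sketch that wraps the same mask in a helper and runs it on made-up data where 'Y' also repeats. The function name `drop_repeats_of_value`, its parameters, and the `sample` frame are assumptions for illustration, not part of the original answer:

```python
import pandas as pd


def drop_repeats_of_value(df, group_col, target_col, target_value):
    """Drop repeated occurrences of target_value within each group.

    Hypothetical helper; it only wraps the mask from the answer.
    """
    # True for rows whose (group, target) pair already appeared earlier.
    is_repeat = df.duplicated(subset=[group_col, target_col], keep='first')
    # Restrict the removal to rows holding the target value.
    mask = df[target_col].eq(target_value) & is_repeat
    return df[~mask]


# Made-up data: group A repeats 'X', group B repeats both 'Y' and 'X'.
sample = pd.DataFrame({
    'group':   ['A', 'A', 'B', 'B', 'B', 'B'],
    'channel': ['X', 'X', 'Y', 'Y', 'X', 'X'],
    'value':   [1, 2, 3, 4, 5, 6],
})

result = drop_repeats_of_value(sample, 'group', 'channel', 'X')
print(result)
```

Both 'Y' rows in group B survive, while only the first 'X' in each group is kept. Because the mask is computed on the whole frame at once, no `groupby(...).apply` is needed and the original index is preserved.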

huangapple
  • Posted on July 24, 2023, 16:25:34
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76752630.html