July 24, 2023 16:25:34 · go

Python / Remove duplicates from each group if a condition is met within the group

Question

Within each group, I want to delete duplicate rows, but only when the duplicate is of one particular value of a column.

import pandas as pd

df = pd.DataFrame({
    'group': ['A', 'A', 'B', 'B', 'B', 'C', 'C'],
    'channel': ['X', 'Y', 'Y', 'X', 'X', 'A', 'X'],
    'value': [1, 2, 3, 3, 4, 5, 5]
})

In each group, I want to keep only the first occurrence of X and delete the repeated X rows, as below:

I tried the code below, but it deletes every row with a duplicate channel value, regardless of whether the channel is X:

df = df.groupby('group').apply(lambda x: x.drop_duplicates(subset='channel', keep='first') if 'X' in x['channel'].values else x)
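On the sample data above, this attempt happens to return the right rows, because the only channel repeated inside any group is X (group B). The flaw appears when a non-X channel is also repeated in a group that contains X: the `if 'X' in ...` check only decides whether a group is deduplicated, not which rows are dropped. A minimal sketch, assuming a hypothetical extra group D (not in the question) with a repeated Y:

```python
import pandas as pd

# Hypothetical data: group 'D' is NOT in the question; it is added so that a
# non-X channel ('Y') is repeated inside a group that also contains 'X'.
df = pd.DataFrame({
    'group': ['A', 'A', 'B', 'B', 'B', 'C', 'C', 'D', 'D', 'D'],
    'channel': ['X', 'Y', 'Y', 'X', 'X', 'A', 'X', 'X', 'Y', 'Y'],
    'value': [1, 2, 3, 3, 4, 5, 5, 6, 7, 8],
})

# The attempt from the question. group_keys=False keeps the original row
# labels so dropped rows are easy to spot. drop_duplicates(subset='channel')
# deduplicates EVERY channel in a group that contains 'X', not just 'X'.
attempt = df.groupby('group', group_keys=False).apply(
    lambda x: x.drop_duplicates(subset='channel', keep='first')
    if 'X' in x['channel'].values
    else x
)

# Row 4 (repeated X in group B) is dropped as intended, but row 9
# (repeated Y in group D) is dropped as well, which is not wanted.
print(sorted(attempt.index))
```

Recent pandas versions may emit a deprecation warning about `groupby(...).apply` operating on the grouping column; the rows selected are the same.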


Answer 1

Score: 1


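Let's create a boolean mask that flags the rows to drop: rows whose channel is X and whose (group, channel) pair already appeared in an earlier row.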
mask = df['channel'].eq('X') & df.duplicated(subset=['group', 'channel'])
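To see why this mask removes only the repeated X rows, it can help to look at its two parts side by side. A small sketch on the question's sample data (the intermediate names `is_x` and `is_repeat` are illustrative, not from the answer):

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    'group': ['A', 'A', 'B', 'B', 'B', 'C', 'C'],
    'channel': ['X', 'Y', 'Y', 'X', 'X', 'A', 'X'],
    'value': [1, 2, 3, 3, 4, 5, 5]
})

# Condition 1: the row's channel is the value we deduplicate on.
is_x = df['channel'].eq('X')
# Condition 2: this (group, channel) pair already appeared in an earlier row.
# duplicated() defaults to keep='first', so the first occurrence is False.
is_repeat = df.duplicated(subset=['group', 'channel'])
# A row is dropped only when both conditions hold.
mask = is_x & is_repeat

print(pd.DataFrame({'channel': df['channel'], 'is_x': is_x,
                    'is_repeat': is_repeat, 'mask': mask}))
```

Only row 4 (the second X in group B) has both flags set, so it is the only row that `df[~mask]` removes. Because the pair includes `group`, an X in one group never counts as a repeat of an X in another group.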
Result
df[~mask]
  group channel  value
0     A       X      1
1     A       Y      2
2     B       Y      3
3     B       X      3
5     C       A      5
6     C       X      5
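The same mask generalizes if more than one value should be deduplicated per group. The sketch below wraps it in a hypothetical helper `drop_repeats_of` (not part of pandas or the answer) using `isin` instead of `eq`, and cross-checks the single-value case against an equivalent formulation with `groupby(...).cumcount()`. It assumes a hypothetical extra group D (not in the question) in which Y is repeated, to show that non-targeted duplicates are kept:

```python
import pandas as pd


def drop_repeats_of(df, group_col, value_col, values, keep='first'):
    """Hypothetical helper: within each group, drop repeated rows only when
    value_col is one of `values`; rows with other values are never dropped.
    `keep` is passed to DataFrame.duplicated ('first' keeps the first row)."""
    targeted = df[value_col].isin(values)
    repeated = df.duplicated(subset=[group_col, value_col], keep=keep)
    return df[~(targeted & repeated)]


# Hypothetical data: group 'D' is NOT in the question; its 'Y' is repeated.
df = pd.DataFrame({
    'group': ['A', 'A', 'B', 'B', 'B', 'C', 'C', 'D', 'D', 'D'],
    'channel': ['X', 'Y', 'Y', 'X', 'X', 'A', 'X', 'X', 'Y', 'Y'],
    'value': [1, 2, 3, 3, 4, 5, 5, 6, 7, 8],
})

# Deduplicate only 'X': the repeated 'Y' in group D (row 9) survives.
result = drop_repeats_of(df, 'group', 'channel', ['X'])

# Equivalent check: cumcount() numbers rows within each (group, channel)
# pair from 0, so anything numbered above 0 is a repeat.
alt = df[~(df['channel'].eq('X')
           & df.groupby(['group', 'channel']).cumcount().gt(0))]

# Deduplicate both 'X' and 'Y': now row 9 is dropped too.
both = drop_repeats_of(df, 'group', 'channel', ['X', 'Y'])

print(result)
print(both)
```

Building one boolean mask over the whole frame avoids `groupby(...).apply`, so it stays vectorized and keeps the original row order and index.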


