April 19, 2023 17:30:05

Remove duplicates in a Pandas data frame based on a column

Question

I have the dataset below, and I want to remove consecutive duplicates based on the bool column, keeping only the first row of each run of identical values.

datetime	number	bool
2023-02-14 10:15:00	195.35	FALSE
2023-02-14 11:15:00	195.8	FALSE
2023-02-14 12:15:00	195.87	FALSE
2023-02-14 13:15:00	196.06	FALSE
2023-02-14 14:15:00	195.97	TRUE
2023-02-14 15:15:00	195.98	TRUE
2023-02-15 09:15:00	196.23	FALSE
2023-02-15 10:15:00	196.3	FALSE
2023-02-15 11:15:00	196.26	TRUE
2023-02-15 12:15:00	196.4	TRUE
2023-02-15 13:15:00	196.28	TRUE
2023-02-15 14:15:00	197.14	FALSE
2023-02-15 15:15:00	197.08	FALSE
2023-02-16 09:15:00	197.85	TRUE
2023-02-16 10:15:00	198.01	TRUE

The resulting data should look like this:

datetime	number	bool
2023-02-14 10:15:00	195.35	FALSE
2023-02-14 14:15:00	195.97	TRUE
2023-02-15 09:15:00	196.23	FALSE
2023-02-15 11:15:00	196.26	TRUE
2023-02-15 14:15:00	197.14	FALSE
2023-02-16 09:15:00	197.85	TRUE

I tried pandas drop_duplicates, but it treats the whole bool column as one group when removing duplicates, which leaves only 2 rows.

PS: I could just loop through all the rows and compare each one to the previous row, but I am looking for a pandas-native way of doing this, if one exists.
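
To make the problem concrete, here is a minimal sketch that rebuilds the sample data by hand (the construction of `df` is an assumption; the question does not show how the frame was created) and applies `drop_duplicates` to the bool column. It keeps one row per distinct value, not one row per run:

```python
import pandas as pd

# Sample data copied from the question; building it this way is an assumption.
df = pd.DataFrame({
    "datetime": pd.to_datetime([
        "2023-02-14 10:15:00", "2023-02-14 11:15:00", "2023-02-14 12:15:00",
        "2023-02-14 13:15:00", "2023-02-14 14:15:00", "2023-02-14 15:15:00",
        "2023-02-15 09:15:00", "2023-02-15 10:15:00", "2023-02-15 11:15:00",
        "2023-02-15 12:15:00", "2023-02-15 13:15:00", "2023-02-15 14:15:00",
        "2023-02-15 15:15:00", "2023-02-16 09:15:00", "2023-02-16 10:15:00",
    ]),
    "number": [195.35, 195.8, 195.87, 196.06, 195.97, 195.98, 196.23, 196.3,
               196.26, 196.4, 196.28, 197.14, 197.08, 197.85, 198.01],
    "bool": [False, False, False, False, True, True, False, False,
             True, True, True, False, False, True, True],
})

# drop_duplicates looks at the whole column: it keeps the first row for each
# distinct value (one False, one True), regardless of where the rows sit.
deduped = df.drop_duplicates(subset="bool")
print(deduped)
```

Only the first False row and the first True row survive, which is why a position-aware comparison is needed here.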

Answer 1

Score: 1

You can use boolean indexing, comparing each value with the previous one via Series.ne and Series.shift:

out = df[df['bool'].ne(df['bool'].shift())]
print(out)
               datetime  number   bool
0   2023-02-14 10:15:00  195.35  False
4   2023-02-14 14:15:00  195.97   True
6   2023-02-15 09:15:00  196.23  False
8   2023-02-15 11:15:00  196.26   True
11  2023-02-15 14:15:00  197.14  False
13  2023-02-16 09:15:00  197.85   True
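
The following sketch spells out the intermediate steps of that one-liner. The names `is_run_start`, `run_id`, `first_of_run`, and `last_of_run` are illustrative and not part of the answer, and the frame is rebuilt by hand with only the columns needed. The last two lines show possible variants (run labels via `cumsum`, and keeping the last row of each run), offered as assumptions about what a reader might want next.

```python
import pandas as pd

# Sample values copied from the question; datetime column omitted for brevity.
df = pd.DataFrame({
    "number": [195.35, 195.8, 195.87, 196.06, 195.97, 195.98, 196.23, 196.3,
               196.26, 196.4, 196.28, 197.14, 197.08, 197.85, 198.01],
    "bool": [False] * 4 + [True] * 2 + [False] * 2
            + [True] * 3 + [False] * 2 + [True] * 2,
})

# shift() moves every value down one row, leaving a missing value in row 0.
previous = df["bool"].shift()

# ne() is an element-wise "not equal"; row 0 compares against the missing
# value, so it counts as the start of a run.
is_run_start = df["bool"].ne(previous)
out = df[is_run_start]
print(out)

# Variant: label each run with a running count of run starts, then take the
# first row of every run. This selects the same rows as `out`.
run_id = is_run_start.cumsum()
first_of_run = df.groupby(run_id).head(1)

# Variant: compare with the next row instead to keep the last row of each run.
last_of_run = df[df["bool"].ne(df["bool"].shift(-1))]
```

The `run_id` labels can also be useful when you want to aggregate within each run (for example, the mean of `number`) instead of only picking one row.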

Answer 2

Score: 1

What if you use duplicated (https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.duplicated.html)?

without_duplicates = df.duplicated(['datetime', 'number'], keep='last') & df['bool']
print(df[~without_duplicates])

Sample that I tried (screenshot): https://i.stack.imgur.com/qgIPy.png
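
As a quick check of how this behaves on the question's sample data, the sketch below rebuilds the frame by hand (an assumption; the answer's own sample is only available as a screenshot) and applies the same expression. The names `dup_mask` and `result` are illustrative. Every (datetime, number) pair in the question's table is unique, so on that data the mask marks nothing for removal.

```python
import pandas as pd

# Sample data copied from the question; datetimes kept as plain strings.
df = pd.DataFrame({
    "datetime": [
        "2023-02-14 10:15:00", "2023-02-14 11:15:00", "2023-02-14 12:15:00",
        "2023-02-14 13:15:00", "2023-02-14 14:15:00", "2023-02-14 15:15:00",
        "2023-02-15 09:15:00", "2023-02-15 10:15:00", "2023-02-15 11:15:00",
        "2023-02-15 12:15:00", "2023-02-15 13:15:00", "2023-02-15 14:15:00",
        "2023-02-15 15:15:00", "2023-02-16 09:15:00", "2023-02-16 10:15:00",
    ],
    "number": [195.35, 195.8, 195.87, 196.06, 195.97, 195.98, 196.23, 196.3,
               196.26, 196.4, 196.28, 197.14, 197.08, 197.85, 198.01],
    "bool": [False, False, False, False, True, True, False, False,
             True, True, True, False, False, True, True],
})

# duplicated() flags rows whose (datetime, number) pair also occurs later
# (keep='last' leaves the final occurrence unflagged). The mask is then
# restricted to rows where bool is True.
dup_mask = df.duplicated(["datetime", "number"], keep="last") & df["bool"]

# Rows flagged by the mask are dropped.
result = df[~dup_mask]
print(dup_mask.any())
print(len(df), len(result))
```

This approach may fit data where whole (datetime, number) rows repeat; for collapsing runs of identical bool values, a comparison against the neighbouring row is the more direct tool.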
