June 8, 2023, 04:40:54

How to delete rows in DataFrame based on time intervals in Pandas Python

Question

I have a very large DataFrame of time values. I want to delete the rows that are less than a minute apart. How would I go about doing that?

Note: These are readings from a machine taken over 60 minutes, so the DataFrame is much larger than the picture shows; the 27.2 value eventually changes as well.

I tried newdf=df1[df1['Time'].dt.minute%1==0], hoping it would delete every row that wasn't at least a minute apart, but it didn't work.
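
A likely reason the attempt has no effect: any integer modulo 1 is 0, so the mask is True for every row and nothing gets filtered out. A minimal sketch with made-up timestamps (not the data from the question):

```python
import pandas as pd

# Hypothetical sample timestamps, only for illustration
times = pd.Series(pd.to_datetime(["2020-07-07 21:00:00",
                                  "2020-07-07 21:00:30",
                                  "2020-07-07 21:01:30"]))

# .dt.minute is an integer, and n % 1 is 0 for every integer n,
# so this comparison is True for all rows
mask = times.dt.minute % 1 == 0
print(mask.all())  # True: the filter keeps every row
```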

Answer 1

Score: 0

Subtract each row's timestamp from the one in the row above it (using shift()) and compare the difference with a 1-minute Timedelta.

import pandas as pd

df = pd.DataFrame({'time': ['2020-7-7 21:00:00',
                            '2020-7-7 21:00:30',
                            '2020-7-7 21:01:30']})
df['time'] = pd.to_datetime(df['time'])
# Drop every row that is less than 1 minute after the row directly above it.
# The first row has no predecessor (the difference is NaT), so it is kept.
df_new = df[~(df['time'] - df['time'].shift() < pd.Timedelta('1m'))]
print(df_new)

                 time
0 2020-07-07 21:00:00
2 2020-07-07 21:01:30
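
The same comparison can also be written with Series.diff(), which is equivalent to subtracting the shifted column. The sketch below wraps it in a helper; the function name drop_close_to_previous and its parameters are hypothetical, chosen only for illustration, and it assumes the rows are already sorted by time.

```python
import pandas as pd

def drop_close_to_previous(frame, column="time", min_gap=pd.Timedelta(minutes=1)):
    """Keep rows whose timestamp is at least `min_gap` after the row directly above.

    Hypothetical helper; assumes `frame` is sorted by `column`.
    """
    gaps = frame[column].diff()             # NaT for the first row
    keep = gaps.isna() | (gaps >= min_gap)  # the first row is always kept
    return frame[keep]

df = pd.DataFrame({"time": pd.to_datetime(["2020-07-07 21:00:00",
                                            "2020-07-07 21:00:30",
                                            "2020-07-07 21:01:30"])})
df_new = drop_close_to_previous(df)
print(df_new)  # rows 0 and 2 remain
```

Note that each row is compared only with its immediate neighbour, not with the last row that was kept.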

Answer 2

Score: 0

The answer from @Stu Sztukowski removes adjacent rows that are less than 1 minute apart, which may be what you want. However, if each removed row comes only a short time after the previous one (as in your data), it can leave very long gaps between retained rows, so a lot of data is lost. Below is an alternative approach that retains a row only if it is at least 1 minute after the previous retained row.

import pandas as pd

df = pd.DataFrame({'time': ["2020-07-07 17:13:25", "2020-07-07 17:13:47", "2020-07-07 17:14:35", "2020-07-07 17:14:55"],
                   'val': [1, 2, 3, 4]
                   })
df['time'] = pd.to_datetime(df['time'])
# Start one minute before the first timestamp so that the first row is always kept
start = df.loc[0, 'time'] - pd.Timedelta('1m')

def func(x):
    # `start` holds the timestamp of the last retained row
    global start
    if (x - start) < pd.Timedelta('1m'):
        return False
    else:
        start = x
        return True

df2 = df[df['time'].map(func)]
print(df)
print(df2)

which gives:

                 time  val
0 2020-07-07 17:13:25    1
1 2020-07-07 17:13:47    2
2 2020-07-07 17:14:35    3
3 2020-07-07 17:14:55    4
                 time  val
0 2020-07-07 17:13:25    1
2 2020-07-07 17:14:35    3
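
A variant of the same idea that avoids the global variable is sketched below. The helper name keep_spaced_rows and its parameters are made up for illustration, and the code assumes the frame is sorted by time. The second half uses hypothetical readings taken every 20 seconds to show how the two approaches differ.

```python
import pandas as pd

def keep_spaced_rows(frame, column="time", min_gap=pd.Timedelta(minutes=1)):
    """Keep a row only if it is at least `min_gap` after the last kept row.

    Hypothetical helper; assumes `frame` is sorted by `column`.
    """
    keep = []
    last_kept = None  # timestamp of the most recent retained row
    for ts in frame[column]:
        if last_kept is None or ts - last_kept >= min_gap:
            keep.append(True)
            last_kept = ts
        else:
            keep.append(False)
    return frame[pd.Series(keep, index=frame.index)]

# Same sample data as in the answer
df = pd.DataFrame({"time": pd.to_datetime(["2020-07-07 17:13:25", "2020-07-07 17:13:47",
                                            "2020-07-07 17:14:35", "2020-07-07 17:14:55"]),
                   "val": [1, 2, 3, 4]})
df2 = keep_spaced_rows(df)
print(df2)  # rows 0 and 2 remain

# Hypothetical dense readings: one every 20 seconds for 3 minutes
dense = pd.DataFrame({"time": pd.date_range("2020-07-07 17:00:00", periods=10,
                                            freq=pd.Timedelta(seconds=20))})
gap = pd.Timedelta(minutes=1)
adjacent = dense[~(dense["time"].diff() < gap)]  # compare with the row directly above
spaced = keep_spaced_rows(dense)                 # compare with the last kept row
print(list(adjacent.index))  # [0]
print(list(spaced.index))    # [0, 3, 6, 9]
```

With the dense readings, the adjacent-row comparison keeps only the first row, while comparing against the last kept row yields roughly one reading per minute.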
