How to delete rows in a DataFrame based on time intervals in pandas (Python)

Question

I have a very large DataFrame of time values. I want to delete the rows that are less than a minute apart. How would I go about doing that?

Note: These are readings from a machine taken over 60 minutes, so the DataFrame is much larger than the one in the picture, and the 27.2 value eventually changes as well.

I tried newdf=df1[df1['Time'].dt.minute%1==0], hoping it would delete every row that wasn't at least a minute apart, but it didn't work.
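
A likely reason the attempt above filters nothing is that any integer modulo 1 is 0, so the mask is True for every row. The sketch below illustrates this with made-up timestamps; only the column name Time and the names df1 and newdf come from the question, the data is an assumption.

```python
import pandas as pd

# Hypothetical sample data; the column name 'Time' follows the question.
df1 = pd.DataFrame({'Time': pd.to_datetime(['2020-07-07 21:00:00',
                                            '2020-07-07 21:00:30',
                                            '2020-07-07 21:01:30'])})

# .dt.minute is an integer from 0 to 59, and any integer % 1 is 0,
# so this mask is True for every row and no row is dropped.
mask = df1['Time'].dt.minute % 1 == 0
newdf = df1[mask]

print(mask.all())              # every row passes the filter
print(len(newdf) == len(df1))  # the filtered frame has the same length
```

The check looks at each timestamp on its own, whereas the question is about the gap between rows, which is what the answers compare.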

Answer 1

Score: 0


Subtract each row's time from the previous row's time and compare the difference to a 1-minute timedelta.

import pandas as pd

df = pd.DataFrame({'time': ['2020-7-7 21:00:00',
                            '2020-7-7 21:00:30',
                            '2020-7-7 21:01:30']})

df['time'] = pd.to_datetime(df['time'])

# Drop rows that are less than 1 minute after the previous row
df_new = df[~(df['time'] - df['time'].shift() < pd.Timedelta('1m'))]
print(df_new)

Output:

                 time
0 2020-07-07 21:00:00
2 2020-07-07 21:01:30
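
The same comparison can probably be written with Series.diff(), which computes each row minus the previous row directly. This is a minimal sketch on the same three timestamps; the names gap and keep are illustrative and not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({'time': pd.to_datetime(['2020-07-07 21:00:00',
                                           '2020-07-07 21:00:30',
                                           '2020-07-07 21:01:30'])})

# diff() gives each row's time minus the previous row's time.
# The first row has no predecessor, so its gap is NaT.
gap = df['time'].diff()

# Keep the first row (NaT gap) and any row at least 1 minute after the previous row.
keep = gap.isna() | (gap >= pd.Timedelta(minutes=1))
df_new = df[keep]

print(df_new)
```

Spelling out the NaT case makes it explicit why the first row survives, instead of relying on a comparison with NaT evaluating to False.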

Answer 2

Score: 0


The answer from @Stu Sztukowski removes adjacent rows that are less than 1 minute apart, which may be what you want. However, if each removed row comes only a short time after the previous one (as in your data), this can leave very long gaps between retained values, and a lot of data would be lost. Below is an alternative approach that retains rows that are at least 1 minute after the previous retained row.

import pandas as pd

df = pd.DataFrame({'time': ["2020-07-07 17:13:25", "2020-07-07 17:13:47", "2020-07-07 17:14:35", "2020-07-07 17:14:55"],
                   'val': [1, 2, 3, 4]
                   })

df['time'] = pd.to_datetime(df['time'])

# Time of the last retained row; initialised one minute before the first row so that row is kept
start = df.loc[0, 'time'] - pd.Timedelta('1m')

def func(x):
    # Keep x only if it is at least 1 minute after the last retained row
    global start
    if (x - start) < pd.Timedelta('1m'):
        return False
    else:
        start = x
        return True

df2 = df[df['time'].map(func)]

print(df)
print(df2)

which gives:

                 time  val
0 2020-07-07 17:13:25    1
1 2020-07-07 17:13:47    2
2 2020-07-07 17:14:35    3
3 2020-07-07 17:14:55    4
                 time  val
0 2020-07-07 17:13:25    1
2 2020-07-07 17:14:35    3
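
To make the difference between the two answers concrete, here is a sketch that applies both filters to made-up readings taken every 20 seconds. The helper keep_spaced is hypothetical: it follows the same idea as func above but tracks the last retained time in a local variable instead of a global.

```python
import pandas as pd

def keep_spaced(times, min_gap):
    """Return a list of booleans: True where a time is at least
    min_gap after the most recently kept time."""
    keep = []
    last_kept = None
    for t in times:
        if last_kept is None or t - last_kept >= min_gap:
            keep.append(True)
            last_kept = t  # this row becomes the new reference point
        else:
            keep.append(False)
    return keep

# Hypothetical data: 10 readings, one every 20 seconds.
df = pd.DataFrame({
    'time': pd.date_range('2020-07-07 17:00:00', periods=10,
                          freq=pd.Timedelta(seconds=20)),
    'val': range(10),
})

one_minute = pd.Timedelta(minutes=1)

# Answer 1 style: compare each row with the row directly before it.
adjacent = df[~(df['time'].diff() < one_minute)]

# Answer 2 style: compare each row with the last row that was kept.
spaced = df[keep_spaced(df['time'], one_minute)]

print(adjacent)
print(spaced)
```

With these 20-second readings, the adjacent-row filter keeps only the first row, while the helper keeps rows 0, 3, 6 and 9, that is, one reading per minute.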

huangapple
  • This article was posted on June 8, 2023 at 04:40:54
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76426983.html