How to delete rows in DataFrame based on time intervals in Pandas Python
Question
I have a very large DataFrame of time values. I want to delete the rows that are less than a minute apart. How would I go about doing that?
Note: these are 60 minutes of readings from a machine, so the DataFrame is much larger than what the picture shows, and the 27.2 value will eventually change too.
I tried newdf=df1[df1['Time'].dt.minute%1==0], hoping it would delete every row that wasn't at least a minute apart, but it didn't work.
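A quick check suggests why the attempted filter removes nothing: any integer modulo 1 is 0, so the mask is True for every row. The sketch below uses a made-up three-row frame in place of the real readings; only the column name Time comes from the question, the values are assumptions.

```python
import pandas as pd

# Made-up sample standing in for the machine readings in the question.
df1 = pd.DataFrame({'Time': pd.to_datetime(['2020-07-07 21:00:00',
                                            '2020-07-07 21:00:30',
                                            '2020-07-07 21:01:30'])})

# dt.minute is an integer from 0 to 59, and n % 1 == 0 holds for every
# integer n, so this mask never filters anything out.
mask = df1['Time'].dt.minute % 1 == 0
newdf = df1[mask]

print(bool(mask.all()))          # True
print(len(newdf) == len(df1))    # True
```

The minute component alone also says nothing about how far apart two rows are, so the comparison has to be made on the difference between timestamps.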
Answer 1
Score: 0
Subtract each row's time from the time of the row after it and compare the difference to a 1-minute time delta. Rows that come less than a minute after the row before them are dropped.
import pandas as pd
df = pd.DataFrame({'time': ['2020-7-7 21:00:00',
                            '2020-7-7 21:00:30',
                            '2020-7-7 21:01:30']})
df['time'] = pd.to_datetime(df['time'])
# shift() moves every value down one row, so the subtraction gives the gap to the previous row.
# The first gap is NaT, and NaT < Timedelta is False, so the first row is always kept.
# '1m' here means one minute.
df_new = df[~(df['time'] - df['time'].shift() < pd.Timedelta('1m'))]
print(df_new)
which gives:
                 time
0 2020-07-07 21:00:00
2 2020-07-07 21:01:30
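The same comparison can also be written with Series.diff(), which computes the gap to the previous row directly. This is only a sketch on the sample data from this answer, with the intermediate names gap and too_close introduced here for readability.

```python
import pandas as pd

df = pd.DataFrame({'time': pd.to_datetime(['2020-07-07 21:00:00',
                                           '2020-07-07 21:00:30',
                                           '2020-07-07 21:01:30'])})

# diff() returns the gap to the previous row; the first entry is NaT.
gap = df['time'].diff()

# Comparisons against NaT evaluate to False, so the first row is never flagged.
too_close = gap < pd.Timedelta(minutes=1)
df_new = df[~too_close]

print(df_new.index.tolist())   # [0, 2]
```

Splitting the mask into named steps makes it easier to inspect which rows were flagged before filtering.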
Answer 2
Score: 0
The answer from @Stu Sztukowski removes every row that is less than 1 minute after the row directly before it, which may be what you want. But it could leave very long gaps between retained values if each removed row comes only a short time after the previous one (as in your data), so a lot of data would be lost. Below is an alternative approach that retains each row that is at least 1 minute after the previous retained row.
import pandas as pd
df = pd.DataFrame({'time': ["2020-07-07 17:13:25", "2020-07-07 17:13:47", "2020-07-07 17:14:35", "2020-07-07 17:14:55"],
                   'val': [1, 2, 3, 4]
                   })
df['time'] = pd.to_datetime(df['time'])
# Start one minute before the first row so that the first row is always retained.
start = df.loc[0, 'time'] - pd.Timedelta('1m')
def func(x):
    # start holds the time of the most recently retained row.
    global start
    if (x - start) < pd.Timedelta('1m'):
        return False
    else:
        start = x
        return True
df2 = df[df['time'].map(func)]
print(df)
print(df2)
which gives:
                 time  val
0 2020-07-07 17:13:25    1
1 2020-07-07 17:13:47    2
2 2020-07-07 17:14:35    3
3 2020-07-07 17:14:55    4
                 time  val
0 2020-07-07 17:13:25    1
2 2020-07-07 17:14:35    3
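To make the difference between the two answers concrete, here is a minimal sketch on made-up readings that arrive every 30 seconds. The helper keep_min_gap is hypothetical (it is not part of pandas or of either answer); it follows the same idea as func above but keeps its state in a local variable instead of a global, and it assumes the frame is already sorted by time.

```python
import pandas as pd

def keep_min_gap(df, column, gap):
    """Hypothetical helper: keep rows at least `gap` after the last kept row.

    Assumes `df` is already sorted by `column`.
    """
    keep = []
    last_kept = None
    for ts in df[column]:
        if last_kept is None or ts - last_kept >= gap:
            keep.append(True)
            last_kept = ts
        else:
            keep.append(False)
    return df.loc[keep]

# Made-up readings arriving every 30 seconds.
df = pd.DataFrame({'time': pd.date_range('2020-07-07 17:00:00', periods=5,
                                         freq=pd.Timedelta(seconds=30)),
                   'val': [1, 2, 3, 4, 5]})
gap = pd.Timedelta(minutes=1)

# Approach from the first answer: compare each row with its direct neighbour.
neighbour_filtered = df[~(df['time'].diff() < gap)]
# Approach from this answer: compare each row with the last retained row.
spaced = keep_min_gap(df, 'time', gap)

print(neighbour_filtered.index.tolist())   # [0]
print(spaced.index.tolist())               # [0, 2, 4]
```

With evenly spaced 30-second readings, the neighbour comparison keeps only the first row, while comparing against the last retained row keeps one reading per minute.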