Subtract consecutive rows based on binary condition
Question
I have a dataframe like the one below:
import pandas as pd
data = {'time': ['2021-01-01 22:00:12', '2021-01-05 22:49:12', '2021-01-06 21:00:00', '2021-01-06 23:59:15', '2021-01-07 05:00:55', '2021-01-07 12:00:39'],
        'flag': ['On', 'Off', 'On', 'Off', 'On', 'Off']}
df = pd.DataFrame(data)
I want the difference between consecutive rows, which I computed using:
df['diff'] = pd.to_datetime(df['time']) - pd.to_datetime(df['time'].shift(1))
But this does unnecessary work, because the difference is not meaningful for every pair of consecutive rows. I only want the difference when the flag goes to Off. Also, how can I convert the difference into hours?
Answer 1
Score: 1
You can start a new virtual group at each "On" row and compute the diff within each group. Alternatively, compute the diff over the whole dataframe, as you did, and mask the values where the flag is "On":
# convert the time column to datetime64
df['time'] = pd.to_datetime(df['time'])
# start a new group at every "On" row, then diff within each group
df['diff'] = df.groupby(df['flag'].eq('On').cumsum())['time'].diff()
# OR
df['diff'] = df['time'].diff().mask(df['flag'] == 'On')
Output:
>>> df
                 time flag            diff
0 2021-01-01 22:00:12   On             NaT
1 2021-01-05 22:49:12  Off 4 days 00:49:00
2 2021-01-06 21:00:00   On             NaT
3 2021-01-06 23:59:15  Off 0 days 02:59:15
4 2021-01-07 05:00:55   On             NaT
5 2021-01-07 12:00:39  Off 0 days 06:59:44
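The diff column above is still a Timedelta, while the question also asks for hours. A minimal sketch of that conversion, reusing the question's data; the `hours` column name is only an illustrative choice, not something from the original answer:

```python
import pandas as pd

data = {
    'time': ['2021-01-01 22:00:12', '2021-01-05 22:49:12', '2021-01-06 21:00:00',
             '2021-01-06 23:59:15', '2021-01-07 05:00:55', '2021-01-07 12:00:39'],
    'flag': ['On', 'Off', 'On', 'Off', 'On', 'Off'],
}
df = pd.DataFrame(data)
df['time'] = pd.to_datetime(df['time'])

# Diff over the whole column, hidden on the "On" rows (second variant above)
df['diff'] = df['time'].diff().mask(df['flag'] == 'On')

# Timedelta -> float hours; NaT rows become NaN ('hours' is a hypothetical column name)
df['hours'] = df['diff'].dt.total_seconds() / 3600

# Dividing by a one-hour Timedelta should give the same numbers
hours_alt = df['diff'] / pd.Timedelta(hours=1)
```

Either form keeps the missing rows missing, so the "On" rows stay empty after the conversion.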
Answer 2
Score: 0
Keep the difference only on rows where the flag switches from On to Off, and convert it to hours:
df['time'] = pd.to_datetime(df['time'])
mask = df['flag'].eq('Off') & df['flag'].shift().eq('On')
df['diff'] = df['time'].sub(df['time'].shift()).where(mask).dt.total_seconds() / 3600
                 time flag       diff
0 2021-01-01 22:00:12   On        NaN
1 2021-01-05 22:49:12  Off  96.816667
2 2021-01-06 21:00:00   On        NaN
3 2021-01-06 23:59:15  Off   2.987500
4 2021-01-07 05:00:55   On        NaN
5 2021-01-07 12:00:39  Off   6.995556
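On the question's data, where On and Off strictly alternate, this transition mask and a plain "hide the On rows" mask give the same rows. They can differ when two Off rows follow each other. A small sketch on made-up data (the timestamps and the `not_on` / `on_to_off` column names are hypothetical, chosen only to show the difference):

```python
import pandas as pd

# Hypothetical data: rows 1 and 2 are both "Off"
df = pd.DataFrame({
    'time': pd.to_datetime(['2021-01-01 00:00:00', '2021-01-01 02:00:00',
                            '2021-01-01 03:00:00', '2021-01-01 06:00:00',
                            '2021-01-01 07:30:00']),
    'flag': ['On', 'Off', 'Off', 'On', 'Off'],
})

# Consecutive differences in hours: NaN, 2.0, 1.0, 3.0, 1.5
hours = df['time'].diff().dt.total_seconds() / 3600

# Variant A: hide only the "On" rows; the second "Off" row keeps its value
df['not_on'] = hours.mask(df['flag'].eq('On'))

# Variant B: keep only rows where the flag switched from "On" to "Off"
switched_off = df['flag'].eq('Off') & df['flag'].shift().eq('On')
df['on_to_off'] = hours.where(switched_off)
```

Here row 2 keeps a value of 1.0 under variant A but is empty under variant B, so which mask fits depends on whether repeated Off rows should count.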