Subtract consecutive rows based on binary condition

Question

I have a DataFrame like the one below:

import pandas as pd

data = {'time': ['2021-01-01 22:00:12', '2021-01-05 22:49:12', '2021-01-06 21:00:00', '2021-01-06 23:59:15', '2021-01-07 05:00:55', '2021-01-07 12:00:39'],
        'flag': ['On', 'Off', 'On', 'Off', 'On', 'Off']}
df = pd.DataFrame(data)

I want the time difference between consecutive rows, which I computed with:

df['diff'] = pd.to_datetime(df['time']) - pd.to_datetime(df['time'].shift(1))

But this does unnecessary work, because the difference is not meaningful for every pair of consecutive rows. I only want the difference on rows where the flag goes to Off. Also, how can I convert the difference into hours?

Answer 1

Score: 1

You can start a new virtual group each time an "On" flag is encountered and then compute the diff within each group. Alternatively, you can compute the diff over the whole column, as you did, and mask the values on rows where the flag is "On":

# convert the time column to datetime64
df['time'] = pd.to_datetime(df['time'])

# Option 1: start a new group at every "On" row, then diff within each group
df['diff'] = df.groupby(df['flag'].eq('On').cumsum())['time'].diff()
# Option 2: diff the whole column, then replace the "On" rows with NaT
df['diff'] = df['time'].diff().mask(df['flag'] == 'On')

Output:

>>> df
                 time flag            diff
0 2021-01-01 22:00:12   On             NaT
1 2021-01-05 22:49:12  Off 4 days 00:49:00
2 2021-01-06 21:00:00   On             NaT
3 2021-01-06 23:59:15  Off 0 days 02:59:15
4 2021-01-07 05:00:55   On             NaT
5 2021-01-07 12:00:39  Off 0 days 06:59:44
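
The question also asks for the difference in hours, which the snippet above leaves as a timedelta. A minimal sketch of one way to convert it, reusing the question's data; the column name `diff_hours` and the variable `alt_hours` are introduced here only for illustration:

```python
import pandas as pd

data = {'time': ['2021-01-01 22:00:12', '2021-01-05 22:49:12',
                 '2021-01-06 21:00:00', '2021-01-06 23:59:15',
                 '2021-01-07 05:00:55', '2021-01-07 12:00:39'],
        'flag': ['On', 'Off', 'On', 'Off', 'On', 'Off']}
df = pd.DataFrame(data)
df['time'] = pd.to_datetime(df['time'])

# Timedelta since the previous row, blanked out (NaT) on the "On" rows
df['diff'] = df['time'].diff().mask(df['flag'] == 'On')

# Convert the timedelta to fractional hours; NaT becomes NaN
df['diff_hours'] = df['diff'].dt.total_seconds() / 3600

# Equivalent alternative: divide by a one-hour Timedelta
alt_hours = df['diff'] / pd.Timedelta(hours=1)

print(df['diff_hours'].round(6).tolist())
# [nan, 96.816667, nan, 2.9875, nan, 6.995556]
```

Keeping the timedelta column and the hours column side by side is just a convenience for checking the result; if only hours are needed, the conversion can be chained directly onto the masked diff.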

Answer 2

Score: 0

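Keep the difference only on rows where the flag switches from "On" to "Off", and convert it to hours:

df['time'] = pd.to_datetime(df['time'])

# True only on an "Off" row that directly follows an "On" row
mask = df['flag'].eq('Off') & df['flag'].shift().eq('On')
# difference from the previous row, kept where mask is True, expressed in hours
df['diff'] = df['time'].sub(df['time'].shift()).where(mask).dt.total_seconds() / 3600

Output:
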
                 time flag       diff
0 2021-01-01 22:00:12   On        NaN
1 2021-01-05 22:49:12  Off  96.816667
2 2021-01-06 21:00:00   On        NaN
3 2021-01-06 23:59:15  Off   2.987500
4 2021-01-07 05:00:55   On        NaN
5 2021-01-07 12:00:39  Off   6.995556
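
On the question's strictly alternating data, this mask and the simpler `flag == 'On'` mask give the same rows. They can differ when "Off" repeats. The sketch below uses made-up timestamps (not from the question) and illustrative variable names to show the difference:

```python
import pandas as pd

# Hypothetical data with two consecutive "Off" rows
df = pd.DataFrame({
    'time': pd.to_datetime(['2021-01-01 00:00:00', '2021-01-01 01:00:00',
                            '2021-01-01 03:00:00', '2021-01-01 04:00:00',
                            '2021-01-01 06:30:00']),
    'flag': ['On', 'Off', 'Off', 'On', 'Off'],
})

# Hours elapsed since the previous row, for every row
hours = df['time'].diff().dt.total_seconds() / 3600

# Variant A: keep every row that is not "On"
any_off = hours.mask(df['flag'] == 'On')

# Variant B: keep only "Off" rows that directly follow an "On" row
transition = df['flag'].eq('Off') & df['flag'].shift().eq('On')
on_to_off = hours.where(transition)

print(any_off.tolist())    # [nan, 1.0, 2.0, nan, 2.5]
print(on_to_off.tolist())  # [nan, 1.0, nan, nan, 2.5]
```

Which variant fits depends on whether a repeated "Off" reading should count as a new event; the question does not say, so treat this as an assumption to check against the real data.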

huangapple
  • Posted on June 22, 2023 at 20:29:50
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76531920.html