Compute time differences in Pandas dataframe with respect to first value
Question
I have a question that looks somewhat similar to [this one][1], however I don't know how to modify the answer given there to fit my problem.
I have a dataframe that looks like this:
Date                 user
2012-12-05 09:30:00  0
2012-12-05 09:35:00  1
2012-12-05 09:40:00  2
2012-12-05 09:45:00  3
2012-12-05 09:50:00  4
2012-12-06 09:30:00  5
2012-12-06 09:35:00  6
2012-12-06 09:40:00  7
2012-12-06 09:45:00  8
I want to compute the time difference between each of users 1, 2, 3, ... and user 0. This value should be added as a third column, preferably in seconds. So in this example, the result should be:
Date                 user  diff
2012-12-05 09:30:00  0     0
2012-12-05 09:35:00  1     300
2012-12-05 09:40:00  2     600
2012-12-05 09:45:00  3     900
2012-12-05 09:50:00  4     1200
2012-12-06 09:30:00  5     1500
2012-12-06 09:35:00  6     1800
2012-12-06 09:40:00  7     2100
2012-12-06 09:45:00  8     2400
I am looking at the answer provided there, but I don't think I can use groupby here. I am a bit stuck.
[1]: https://stackoverflow.com/questions/40104449/pandas-calculating-daily-differences-relative-to-earliest-value
Answer 1
Score: 2
You can subtract the first value and get the total_seconds:
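import pandas as pd

# Parse the Date column, then subtract the first timestamp and convert to seconds
df['Date'] = pd.to_datetime(df['Date'])
df['diff'] = df['Date'].sub(df['Date'].iloc[0]).dt.total_seconds()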
Output:
   Date                 user  diff
0  2012-12-05 09:30:00  0     0.0
1  2012-12-05 09:35:00  1     300.0
2  2012-12-05 09:40:00  2     600.0
3  2012-12-05 09:45:00  3     900.0
4  2012-12-05 09:50:00  4     1200.0
5  2012-12-06 09:30:00  5     86400.0
6  2012-12-06 09:35:00  6     86700.0
7  2012-12-06 09:40:00  7     87000.0
8  2012-12-06 09:45:00  8     87300.0
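For anyone who wants to run this end to end, here is a minimal self-contained sketch. It rebuilds the question's sample data by hand and, as an optional extra step that is not part of the answer itself, casts the result to integers, since total_seconds() returns floats. The name `elapsed` is only illustrative. Note that the rows from 2012-12-06 are a full day after the first row, so their values start at 86400.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "Date": [
        "2012-12-05 09:30:00", "2012-12-05 09:35:00", "2012-12-05 09:40:00",
        "2012-12-05 09:45:00", "2012-12-05 09:50:00", "2012-12-06 09:30:00",
        "2012-12-06 09:35:00", "2012-12-06 09:40:00", "2012-12-06 09:45:00",
    ],
    "user": list(range(9)),
})

# Strings must be parsed to datetimes before they can be subtracted
df["Date"] = pd.to_datetime(df["Date"])

# Seconds elapsed since the first row (user 0)
elapsed = df["Date"].sub(df["Date"].iloc[0]).dt.total_seconds()

# Optional: total_seconds() yields floats, so cast for whole-second values
df["diff"] = elapsed.astype("int64")

print(df)
```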
Answer 2
Score: 2
You can subtract the minimum value and convert the timedeltas to seconds with Series.dt.total_seconds:
import pandas as pd

df.Date = pd.to_datetime(df.Date)
df['diff'] = df.Date.sub(df.Date.min()).dt.total_seconds()
print(df)
   Date                 user  diff
0  2012-12-05 09:30:00  0     0.0
1  2012-12-05 09:35:00  1     300.0
2  2012-12-05 09:40:00  2     600.0
3  2012-12-05 09:45:00  3     900.0
4  2012-12-05 09:50:00  4     1200.0
5  2012-12-06 09:30:00  5     86400.0
6  2012-12-06 09:35:00  6     86700.0
7  2012-12-06 09:40:00  7     87000.0
8  2012-12-06 09:45:00  8     87300.0
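Subtracting min() and subtracting the first row give the same result on the question's data because it is already sorted by Date. The sketch below uses a small, made-up unsorted frame (not from the question) to show where the two can differ. It also shows a third option, anchoring on the row where user is 0, which assumes exactly one such row exists. The names `from_first`, `from_min`, `user0_time` and `from_user0` are illustrative only.

```python
import pandas as pd

# Hypothetical unsorted data: the earliest timestamp is in the second row
df = pd.DataFrame({
    "Date": pd.to_datetime([
        "2012-12-05 09:35:00", "2012-12-05 09:30:00", "2012-12-05 09:40:00",
    ]),
    "user": [1, 0, 2],
})

# Relative to the first row, whatever its timestamp happens to be
from_first = df["Date"].sub(df["Date"].iloc[0]).dt.total_seconds()

# Relative to the earliest timestamp in the column
from_min = df["Date"].sub(df["Date"].min()).dt.total_seconds()

# Relative to a specific user's row (assumes user 0 appears exactly once)
user0_time = df.loc[df["user"] == 0, "Date"].iloc[0]
from_user0 = df["Date"].sub(user0_time).dt.total_seconds()

print(from_first.tolist())
print(from_min.tolist())
print(from_user0.tolist())
```

If the reference point is meant to be user 0 rather than whichever row comes first or earliest, the last variant states that intent most directly.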