Compute time differences in Pandas dataframe with respect to first value

Question

I have a question that looks somewhat similar to [this one][1]; however, I don't know how to modify the answer given there to fit my problem.

I have a dataframe that looks like this:

    Date                   user
    2012-12-05 09:30:00    0
    2012-12-05 09:35:00    1
    2012-12-05 09:40:00    2
    2012-12-05 09:45:00    3
    2012-12-05 09:50:00    4
    2012-12-06 09:30:00    5
    2012-12-06 09:35:00    6
    2012-12-06 09:40:00    7
    2012-12-06 09:45:00    8

and I want to compute the relative time differences between users 1, 2, 3... and user 0. This value should be added in a third column (preferably in seconds). So in this example, the result should be:

    Date                   user     diff
    2012-12-05 09:30:00    0        0
    2012-12-05 09:35:00    1        300
    2012-12-05 09:40:00    2        600
    2012-12-05 09:45:00    3        900
    2012-12-05 09:50:00    4        1200
    2012-12-06 09:30:00    5        1500
    2012-12-06 09:35:00    6        1800
    2012-12-06 09:40:00    7        2100
    2012-12-06 09:45:00    8        2400

I am looking at the answer provided there, but I don't think I can use groupby here. I am a bit stuck.

[1]: https://stackoverflow.com/questions/40104449/pandas-calculating-daily-differences-relative-to-earliest-value

Answer 1

Score: 2

You can subtract the first value and then call total_seconds:

    df['Date'] = pd.to_datetime(df['Date'])

    # Subtract the first timestamp from every row, then convert the timedeltas to seconds
    df['diff'] = df['Date'].sub(df['Date'].iloc[0]).dt.total_seconds()

Output:

                     Date  user     diff
    0 2012-12-05 09:30:00     0      0.0
    1 2012-12-05 09:35:00     1    300.0
    2 2012-12-05 09:40:00     2    600.0
    3 2012-12-05 09:45:00     3    900.0
    4 2012-12-05 09:50:00     4   1200.0
    5 2012-12-06 09:30:00     5  86400.0
    6 2012-12-06 09:35:00     6  86700.0
    7 2012-12-06 09:40:00     7  87000.0
    8 2012-12-06 09:45:00     8  87300.0
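
For anyone who wants to run this end to end, here is a minimal self-contained sketch. It rebuilds the sample data from the question by hand and, as an optional extra step that is not part of the answer above, casts the result to integers, since the question shows whole seconds. The cast assumes the Date column has no missing values.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Date": [
        "2012-12-05 09:30:00", "2012-12-05 09:35:00", "2012-12-05 09:40:00",
        "2012-12-05 09:45:00", "2012-12-05 09:50:00", "2012-12-06 09:30:00",
        "2012-12-06 09:35:00", "2012-12-06 09:40:00", "2012-12-06 09:45:00",
    ],
    "user": range(9),
})
df["Date"] = pd.to_datetime(df["Date"])

# Elapsed seconds since the first row, cast to integers.
# Assumption: no missing (NaT) dates; with NaT the float result would
# contain NaN and astype(int) would raise an error.
df["diff"] = (df["Date"] - df["Date"].iloc[0]).dt.total_seconds().astype(int)
print(df)
```

Note that the rows dated 2012-12-06 are a full day later, so their differences start at 86400 seconds rather than continuing in 300-second steps.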

Answer 2

Score: 2

You can subtract the minimum value and convert the resulting timedeltas to seconds with Series.dt.total_seconds:

    df.Date = pd.to_datetime(df.Date)

    # Subtract the earliest timestamp from every row, then convert to seconds
    df['diff'] = df.Date.sub(df.Date.min()).dt.total_seconds()
    print(df)

                     Date  user     diff
    0 2012-12-05 09:30:00     0      0.0
    1 2012-12-05 09:35:00     1    300.0
    2 2012-12-05 09:40:00     2    600.0
    3 2012-12-05 09:45:00     3    900.0
    4 2012-12-05 09:50:00     4   1200.0
    5 2012-12-06 09:30:00     5  86400.0
    6 2012-12-06 09:35:00     6  86700.0
    7 2012-12-06 09:40:00     7  87000.0
    8 2012-12-06 09:45:00     8  87300.0
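
The two answers give the same result on the question's data because the first row is also the earliest one. The sketch below uses hypothetical, deliberately unsorted data (not from the question) to show where they diverge. The lookup by user value is an extra variant, added here only because the question asks for differences relative to user 0.

```python
import pandas as pd

# Hypothetical, deliberately unsorted data (not from the question).
df = pd.DataFrame({
    "Date": pd.to_datetime([
        "2012-12-05 09:40:00",
        "2012-12-05 09:30:00",
        "2012-12-05 09:35:00",
    ]),
    "user": [2, 0, 1],
})

# Relative to the first row: rows earlier than it get negative values.
from_first = (df["Date"] - df["Date"].iloc[0]).dt.total_seconds()

# Relative to the earliest timestamp: values are never negative.
from_min = (df["Date"] - df["Date"].min()).dt.total_seconds()

# Relative to the row where user == 0, wherever it sits in the frame.
ref = df.loc[df["user"] == 0, "Date"].iloc[0]
from_user0 = (df["Date"] - ref).dt.total_seconds()

print(from_first.tolist(), from_min.tolist(), from_user0.tolist())
```

If the frame is sorted by Date and user 0 comes first, all three are equivalent; otherwise pick the one that matches the reference point you actually mean.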

huangapple
  • Posted on March 9, 2023 at 19:16:06
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75683849.html