2023-03-09 19:16:06

Compute time differences in Pandas dataframe with respect to first value

Question

I have a question that looks somewhat similar to [this one][1], but I don't know how to adapt the answer given there to my problem.

I have a dataframe that looks like this:

    Date                   user
    2012-12-05 09:30:00    0
    2012-12-05 09:35:00    1
    2012-12-05 09:40:00    2
    2012-12-05 09:45:00    3
    2012-12-05 09:50:00    4
    2012-12-06 09:30:00    5
    2012-12-06 09:35:00    6
    2012-12-06 09:40:00    7
    2012-12-06 09:45:00    8

and I want to compute the relative time differences between users 1, 2, 3... and user 0. This value should be added in a third column (preferably in seconds). So in this example, the result should be:

    Date                   user     diff
    2012-12-05 09:30:00    0        0
    2012-12-05 09:35:00    1        300
    2012-12-05 09:40:00    2        600
    2012-12-05 09:45:00    3        900
    2012-12-05 09:50:00    4        1200
    2012-12-06 09:30:00    5        1500
    2012-12-06 09:35:00    6        1800
    2012-12-06 09:40:00    7        2100
    2012-12-06 09:45:00    8        2400

I have looked at the answer provided there, but I don't think I can use groupby here. I am a bit stuck.

[1]: https://stackoverflow.com/questions/40104449/pandas-calculating-daily-differences-relative-to-earliest-value

Answer 1

Score: 2

You can subtract the first value and convert the resulting timedeltas to seconds with dt.total_seconds():

    # Parse Date as datetime so that subtraction yields timedeltas
    df['Date'] = pd.to_datetime(df['Date'])

    # Seconds elapsed since the first row's timestamp
    df['diff'] = df['Date'].sub(df['Date'].iloc[0]).dt.total_seconds()

Output:

                     Date  user     diff
    0 2012-12-05 09:30:00     0      0.0
    1 2012-12-05 09:35:00     1    300.0
    2 2012-12-05 09:40:00     2    600.0
    3 2012-12-05 09:45:00     3    900.0
    4 2012-12-05 09:50:00     4   1200.0
    5 2012-12-06 09:30:00     5  86400.0
    6 2012-12-06 09:35:00     6  86700.0
    7 2012-12-06 09:40:00     7  87000.0
    8 2012-12-06 09:45:00     8  87300.0
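
For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same approach. The sample frame is rebuilt by hand from the table in the question, and the cast to `int` is an optional addition (not part of the answer) for readers who prefer whole seconds over floats.

```python
import pandas as pd

# Sample data copied by hand from the question
df = pd.DataFrame({
    "Date": [
        "2012-12-05 09:30:00", "2012-12-05 09:35:00", "2012-12-05 09:40:00",
        "2012-12-05 09:45:00", "2012-12-05 09:50:00", "2012-12-06 09:30:00",
        "2012-12-06 09:35:00", "2012-12-06 09:40:00", "2012-12-06 09:45:00",
    ],
    "user": list(range(9)),
})

# Strings must become datetimes before they can be subtracted
df["Date"] = pd.to_datetime(df["Date"])

# Timedelta relative to the first row, then seconds as integers
elapsed = df["Date"] - df["Date"].iloc[0]
df["diff"] = elapsed.dt.total_seconds().astype(int)

print(df)
```

Note that rows 5 to 8 fall on the next day, so their values are real elapsed seconds (86400 and up) rather than the evenly spaced values listed in the question.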

Answer 2

Score: 2

You can subtract the minimum value and convert the timedeltas to seconds with Series.dt.total_seconds:

    df.Date = pd.to_datetime(df.Date)

    # Seconds elapsed since the earliest timestamp in the column
    df['diff'] = df.Date.sub(df.Date.min()).dt.total_seconds()
    print(df)
                     Date  user     diff
    0 2012-12-05 09:30:00     0      0.0
    1 2012-12-05 09:35:00     1    300.0
    2 2012-12-05 09:40:00     2    600.0
    3 2012-12-05 09:45:00     3    900.0
    4 2012-12-05 09:50:00     4   1200.0
    5 2012-12-06 09:30:00     5  86400.0
    6 2012-12-06 09:35:00     6  86700.0
    7 2012-12-06 09:40:00     7  87000.0
    8 2012-12-06 09:45:00     8  87300.0
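
The two answers give the same result on the question's data because the first row is also the earliest one. The sketch below uses a small, hypothetical unsorted frame (not from the question) to show where `iloc[0]` and `min()` diverge, and how one might anchor on user 0 explicitly, assuming there is exactly one row with `user == 0`. The column names `diff_first`, `diff_min`, and `diff_user0` are made up for illustration.

```python
import pandas as pd

# Hypothetical unsorted data: user 0 is not in the first row
df = pd.DataFrame({
    "Date": pd.to_datetime([
        "2012-12-05 09:40:00", "2012-12-05 09:30:00", "2012-12-05 09:35:00",
    ]),
    "user": [2, 0, 1],
})

# Relative to whichever row comes first; can be negative on unsorted data
df["diff_first"] = (df["Date"] - df["Date"].iloc[0]).dt.total_seconds()

# Relative to the earliest timestamp, regardless of row order
df["diff_min"] = (df["Date"] - df["Date"].min()).dt.total_seconds()

# Relative to user 0 explicitly (assumes exactly one such row exists)
ref = df.loc[df["user"] == 0, "Date"].iloc[0]
df["diff_user0"] = (df["Date"] - ref).dt.total_seconds()

print(df)
```

If the frame is already sorted by Date, all three columns agree; otherwise the choice depends on whether "first" means first row, earliest time, or a specific user.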
