March 7, 2023, 01:42:46

OverflowError when subtracting datetime columns in pandas

Question

I'm trying to check if the difference between two Timestamp columns in Pandas is greater than n seconds. I don't actually care about the difference. I just want to know if it's greater than n seconds, and I could also limit n to a range between, let's say, 1 to 60.

Sounds easy, right?

This question has many valuable answers outlining how to do that.

The problem: For reasons outside of my control, the difference between the two timestamps may be quite large, and that's why I'm running into an integer overflow.

Here's an MCVE:

import pandas as pd
import pandas.testing
dataframe = pd.DataFrame(
    {
        "historic": [pd.Timestamp("1900-01-01T00:00:00+00:00")],
        "futuristic": [pd.Timestamp("2200-01-01T00:00:00+00:00")],
    }
)
# Goal: Figure out if the difference between
#       futuristic and historic is > n seconds, i.e.:
#       futuristic - historic > n
number_of_seconds = 1
dataframe["diff_greater_n"] = (
    dataframe["futuristic"] - dataframe["historic"]
) / pd.Timedelta(seconds=1) > number_of_seconds
expected_dataframe = pd.DataFrame(
    {
        "historic": [pd.Timestamp("1900-01-01T00:00:00+00:00")],
        "futuristic": [pd.Timestamp("2200-01-01T00:00:00+00:00")],
        "diff_greater_n": [True],
    }
)
pandas.testing.assert_frame_equal(dataframe, expected_dataframe)

Error:

> OverflowError: Overflow in int64 addition
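
For illustration, the overflow can be reproduced with plain integer arithmetic. This is a minimal sketch that assumes the difference is stored as a signed 64-bit count of nanoseconds (the default pandas timedelta resolution); all variable names are made up for the example.

```python
from datetime import datetime, timezone

# Assumption: the timedelta is held as a signed 64-bit nanosecond count.
INT64_MAX = 2**63 - 1
NS_PER_SECOND = 10**9

historic = datetime(1900, 1, 1, tzinfo=timezone.utc)
futuristic = datetime(2200, 1, 1, tzinfo=timezone.utc)

# Python's own timedelta is not limited to 64-bit nanoseconds,
# so this subtraction succeeds.
diff_seconds = int((futuristic - historic).total_seconds())
diff_ns = diff_seconds * NS_PER_SECOND

# True when the gap cannot be represented as int64 nanoseconds.
overflows_int64 = diff_ns > INT64_MAX

# Longest representable span, in years of 365.25 days.
max_span_years = INT64_MAX / NS_PER_SECOND / (365.25 * 24 * 3600)

print(diff_ns, overflows_int64, round(max_span_years, 1))
```

Under that assumption, any pair of timestamps more than roughly 292 years apart will not fit, which is the case for the 300-year gap in the MCVE.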

A bit more context:

- The timestamps need to have second precision, i.e. I don't care about any milliseconds
- This is one of multiple or-combined checks on the dataframe
- The dataframe may have a few million rows
- I'm quite happy that I get to finally ask about an Overflow error on stackoverflow

Answer 1

Score: 1

One option may be to use `datetime`:

import datetime as dt

...

dataframe["diff_greater_n"] = (
    dataframe["futuristic"].dt.to_pydatetime()
    - dataframe["historic"].dt.to_pydatetime()
) / dt.timedelta(seconds=1) > number_of_seconds
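
The idea behind this answer can be sketched with the standard library alone. Python's `datetime.timedelta` stores days, seconds and microseconds separately, so a 300-year gap is well within its range. The helper name `diff_greater_than` and the sample values below are hypothetical, chosen only for illustration.

```python
import datetime as dt


def diff_greater_than(later, earlier, seconds):
    """Return one bool per pair: True when later - earlier exceeds `seconds`."""
    threshold = dt.timedelta(seconds=seconds)
    # Comparing timedeltas directly avoids dividing by a unit timedelta.
    return [b - a > threshold for a, b in zip(earlier, later)]


utc = dt.timezone.utc
# Hypothetical sample data: one 300-year gap and one gap of exactly 1 second.
historic = [
    dt.datetime(1900, 1, 1, tzinfo=utc),
    dt.datetime(2023, 3, 7, 1, 42, 46, tzinfo=utc),
]
futuristic = [
    dt.datetime(2200, 1, 1, tzinfo=utc),
    dt.datetime(2023, 3, 7, 1, 42, 47, tzinfo=utc),
]

flags = diff_greater_than(futuristic, historic, seconds=1)
print(flags)
```

This version loops over the rows in Python rather than using vectorized operations, so it may be noticeably slower on a dataframe with millions of rows.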

