OverflowError when subtracting datetime columns in pandas
Question
I'm trying to check whether the difference between two Timestamp columns in Pandas is greater than `n` seconds. I don't actually care about the difference itself. I just want to know whether it's greater than `n` seconds, and I could also limit `n` to a range of, let's say, 1 to 60.
Sounds easy, right?
An existing question already has many valuable answers outlining how to do that.
**The problem:** For reasons outside of my control, the difference between the two timestamps may be quite large, and that's why I'm running into an integer overflow.
Here's an MCVE:
```python
import pandas as pd
import pandas.testing

dataframe = pd.DataFrame(
    {
        "historic": [pd.Timestamp("1900-01-01T00:00:00+00:00")],
        "futuristic": [pd.Timestamp("2200-01-01T00:00:00+00:00")],
    }
)

# Goal: Figure out if the difference between
# futuristic and historic is > n seconds, i.e.:
# futuristic - historic > n
number_of_seconds = 1

dataframe["diff_greater_n"] = (
    dataframe["futuristic"] - dataframe["historic"]
) / pd.Timedelta(seconds=1) > number_of_seconds

expected_dataframe = pd.DataFrame(
    {
        "historic": [pd.Timestamp("1900-01-01T00:00:00+00:00")],
        "futuristic": [pd.Timestamp("2200-01-01T00:00:00+00:00")],
        "diff_greater_n": [True],
    }
)

pandas.testing.assert_frame_equal(dataframe, expected_dataframe)
```
Error:
> OverflowError: Overflow in int64 addition
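A bit of arithmetic shows why this overflows even though both timestamps individually fit in pandas' range (roughly 1677 to 2262). Assuming, as in pandas' default nanosecond resolution, that the difference of the two columns is stored as an int64 count of nanoseconds, it can only span about 292 years, while the MCVE's gap is 300 years. The sketch below checks this with Python's `datetime`, which has no int64-nanosecond limit; the constant name `INT64_MAX_NS` is just an illustrative label.

```python
import datetime as dt

# Largest value an int64 can hold, interpreted as a count of nanoseconds.
INT64_MAX_NS = 2**63 - 1

# That limit expressed in Julian years (365.25 days each): roughly 292 years.
max_years = INT64_MAX_NS / 1e9 / (365.25 * 86400)

# The MCVE's span, computed with plain datetime objects (no int64-ns limit).
span = dt.datetime(2200, 1, 1) - dt.datetime(1900, 1, 1)
span_ns = (span.days * 86400 + span.seconds) * 10**9 + span.microseconds * 1000

print(f"int64 nanosecond limit: about {max_years:.1f} years")
print(f"MCVE span: {span.days} days = {span_ns} ns")
print("exceeds int64:", span_ns > INT64_MAX_NS)
```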
A bit more context:
- The timestamps need to have second precision, i.e. I don't care about any milliseconds
- This is one of multiple or-combined checks on the dataframe
- The dataframe may have a few million rows
- I'm quite happy that I get to finally ask about an Overflow error on stackoverflow
Answer 1
Score: 1
One option may be to use `datetime`:
```python
import datetime as dt
...
dataframe["diff_greater_n"] = (
    dataframe["futuristic"].dt.to_pydatetime()
    - dataframe["historic"].dt.to_pydatetime()
) / dt.timedelta(seconds=1) > number_of_seconds
```
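A different technique, not the answerer's, is to stay vectorized in NumPy and coarsen both columns to second resolution before subtracting. At `datetime64[s]` an int64 covers hundreds of billions of years, so a 300-year difference no longer overflows, whereas the `to_pydatetime()` route builds one Python object per row, which may be slow on a few million rows. This is a minimal sketch assuming both columns are tz-aware (as in the MCVE); `diff_greater_n` is a hypothetical helper name. Sub-second parts are dropped, which the question says is acceptable.

```python
import numpy as np
import pandas as pd


def diff_greater_n(later: pd.Series, earlier: pd.Series, n: int) -> pd.Series:
    """Return True where later - earlier > n seconds (hypothetical helper).

    Assumes both Series are tz-aware datetime columns, as in the MCVE.
    """
    # tz_convert(None) converts to UTC and drops the timezone, so
    # .to_numpy() yields a datetime64 array instead of an object array.
    later_s = later.dt.tz_convert(None).to_numpy().astype("datetime64[s]")
    earlier_s = earlier.dt.tz_convert(None).to_numpy().astype("datetime64[s]")
    # Subtracting at second resolution gives timedelta64[s]; int64 seconds
    # cannot overflow for any timestamp pandas can represent.
    diff = later_s - earlier_s
    return pd.Series(diff > np.timedelta64(n, "s"), index=later.index)


dataframe = pd.DataFrame(
    {
        "historic": [pd.Timestamp("1900-01-01T00:00:00+00:00")],
        "futuristic": [pd.Timestamp("2200-01-01T00:00:00+00:00")],
    }
)
number_of_seconds = 1
dataframe["diff_greater_n"] = diff_greater_n(
    dataframe["futuristic"], dataframe["historic"], number_of_seconds
)
print(dataframe)
```

Because the helper returns a boolean Series aligned to the frame's index, it can be or-combined with the other checks using `|`.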