Out-of-bounds timestamps in pandas
Question
I need to rewrite some SQL code in Python, and my problem is calculating the difference between dates in days:
As you can see, for rows where final_pmt_date is '9999-12-31', SQL subtracts the dates without any trouble.
But pandas has a limited range for the datetime64 type, so I get an exception:
All the answers I have seen convert such dates to NaT (with errors='coerce'). But I also need to calculate the number of days for these dates.
Thank you in advance.
Answer 1
Score: 1
A date like 9999-12-31 is out of range for pandas' nanosecond-resolution datetime64[ns] type, which only covers roughly 1677-09-21 to 2262-04-11.
Using vanilla Python datetime might be an alternative here, e.g.:
from datetime import datetime

import pandas as pd

# Keep the dates as ISO-formatted strings; pd.to_datetime would fail on 9999-12-31.
df = pd.DataFrame(
    {
        "open": ["2021-12-27 00:00:00.000", "2019-03-06 00:00:00.000"],
        "close": ["9999-12-31 00:00:00.000", "2022-04-06 00:00:00.000"],
    }
)

# Parse each row with the standard library and subtract the Python datetimes.
df["delta"] = df.apply(
    lambda row: datetime.fromisoformat(row["close"])
    - datetime.fromisoformat(row["open"]),
    axis=1,
)
df
                      open                    close                  delta
0  2021-12-27 00:00:00.000  9999-12-31 00:00:00.000  2913908 days, 0:00:00
1  2019-03-06 00:00:00.000  2022-04-06 00:00:00.000     1127 days 00:00:00
Note, however, that this requires an apply, which is not very efficient compared to the "vectorized" pandas datetime methods. Maybe using NaT as an "invalid-value identifier" is an option after all?
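If only whole-day counts are needed, one possible variation (not part of the original answer) is to convert each date to its ordinal day number once and then subtract the resulting integer columns. This is a minimal sketch that assumes the columns hold ISO-formatted strings as in the example above; to_ordinal and delta_days are hypothetical names chosen for illustration. Parsing still happens element by element, but the subtraction itself is a plain integer operation, so the datetime64 range limit never comes into play. Any time-of-day component is discarded.

```python
from datetime import datetime

import pandas as pd

df = pd.DataFrame(
    {
        "open": ["2021-12-27 00:00:00.000", "2019-03-06 00:00:00.000"],
        "close": ["9999-12-31 00:00:00.000", "2022-04-06 00:00:00.000"],
    }
)


def to_ordinal(column: pd.Series) -> pd.Series:
    """Hypothetical helper: parse ISO strings into ordinal day numbers.

    datetime.toordinal() counts days since 0001-01-01 and ignores the
    time of day, so the result is an ordinary integer column.
    """
    return column.map(lambda value: datetime.fromisoformat(value).toordinal())


# Integer subtraction is vectorized and has no datetime64 range limit.
df["delta_days"] = to_ordinal(df["close"]) - to_ordinal(df["open"])
print(df["delta_days"].tolist())  # [2913908, 1127]
```

The day counts match the delta column shown above, and the result is an integer column that can be filtered or aggregated like any other numeric data.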