Out-of-bounds timestamps in pandas

Question

I need to rewrite some SQL code in Python, and my problem is that I need to calculate differences between dates in days:

[Image in the original post: SQL query and its result]

As you can see, for rows where final_pmt_date is '9999-12-31', the dates are subtracted without any trouble.

But in pandas the datetime64 type has a limited range, so I get an exception:

[Image in the original post: exception raised by pandas]

All the answers I have seen are about converting these dates to NaT (with the 'coerce' keyword). But I need to calculate the number of days for such dates as well.

Thank you in advance.

Answer 1

Score: 1

A date like 9999-12-31 is out of range for the default pandas datetime type (datetime64[ns]), which only covers roughly 1677-09-21 through 2262-04-11.
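To illustrate where that limit comes from, here is a minimal sketch using only the standard library. It assumes the default nanosecond resolution, where a timestamp is stored as a signed 64-bit count of nanoseconds since 1970-01-01. The names INT64_MAX, upper and lower are made up for this illustration, and the sub-microsecond part is dropped because Python's timedelta only has microsecond precision.

```python
from datetime import datetime, timedelta

# Assumption: timestamps are int64 nanoseconds since the Unix epoch.
INT64_MAX = 2**63 - 1
epoch = datetime(1970, 1, 1)

# timedelta works in microseconds, so truncate the nanosecond count.
span = timedelta(microseconds=INT64_MAX // 1000)

upper = epoch + span  # approximate largest representable timestamp
lower = epoch - span  # approximate smallest representable timestamp

print(lower)  # 1677-09-21 00:12:43.145225
print(upper)  # 2262-04-11 23:47:16.854775

# The sentinel date from the question lies far beyond the upper bound.
print(datetime(9999, 12, 31) > upper)  # True
```

So the range is about 292 years on either side of 1970, which is why a sentinel date such as 9999-12-31 cannot be stored at nanosecond resolution.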

Using plain Python datetime objects might be an alternative here, for example:

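from datetime import datetime
import pandas as pd

# The dates stay as ISO-format strings; they are never converted to datetime64.
df = pd.DataFrame(
    {
        "open": ["2021-12-27 00:00:00.000", "2019-03-06 00:00:00.000"],
        "close": ["9999-12-31 00:00:00.000", "2022-04-06 00:00:00.000"],
    }
)

# Parse both strings with the standard library, row by row, and subtract.
# Each result is a datetime.timedelta, which can hold the very large span.
df["delta"] = df.apply(
    lambda row: datetime.fromisoformat(row["close"])
    - datetime.fromisoformat(row["open"]),
    axis=1,
)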

df
                      open                    close                  delta
0  2021-12-27 00:00:00.000  9999-12-31 00:00:00.000  2913908 days, 0:00:00
1  2019-03-06 00:00:00.000  2022-04-06 00:00:00.000     1127 days 00:00:00
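
Since the question asks for a number of days rather than a timedelta, the integer can be read from the timedelta's .days attribute. A minimal sketch with the standard library only, where days_between is a hypothetical helper name and the strings are assumed to be in the ISO format shown above:

```python
from datetime import datetime


def days_between(open_str: str, close_str: str) -> int:
    """Return the whole number of days from open_str to close_str."""
    delta = datetime.fromisoformat(close_str) - datetime.fromisoformat(open_str)
    return delta.days  # integer day count, like a date subtraction in SQL


print(days_between("2021-12-27 00:00:00.000", "9999-12-31 00:00:00.000"))  # 2913908
print(days_between("2019-03-06 00:00:00.000", "2022-04-06 00:00:00.000"))  # 1127
```

The same helper could be called inside the apply above to store integers in the column instead of timedelta objects.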

Note, however, that this requires apply, which runs a Python function for every row and is therefore much slower than the "vectorized" pandas datetime methods. Maybe using NaT as an "invalid-value identifier" is an option after all?
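
If the row-wise apply turns out to be too slow, one possible vectorized alternative (not part of the original answer) is NumPy's day-resolution type datetime64[D], whose range easily includes the year 9999. The sketch below assumes every string starts with a "YYYY-MM-DD" date and that the time of day can be ignored. The helper name to_days and the column name delta_days are made up for this illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "open": ["2021-12-27 00:00:00.000", "2019-03-06 00:00:00.000"],
        "close": ["9999-12-31 00:00:00.000", "2022-04-06 00:00:00.000"],
    }
)


def to_days(column: pd.Series) -> np.ndarray:
    """Parse the leading 'YYYY-MM-DD' of each string at day resolution."""
    return np.array(column.str[:10].tolist(), dtype="datetime64[D]")


# Subtracting two datetime64[D] arrays yields timedelta64[D];
# casting that to int64 gives the plain number of days.
delta = to_days(df["close"]) - to_days(df["open"])
df["delta_days"] = delta.astype("int64")

print(df["delta_days"].tolist())  # [2913908, 1127]
```

The subtraction runs on whole arrays, and the result is an ordinary integer column that works with the usual pandas operations.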

huangapple
  • Posted on February 24, 2023, 02:11:03
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75548734.html