Why does pandas `date_range` round up to the next month?

Question

When I call `pandas.date_range` with a start date, a frequency, and a number of periods, and the start date is the last day of a month, the range is rounded up to the following month.

It looks like a silent edge-case bug. If it isn't a bug, does anyone know why it behaves this way?

For example:

```python
import pandas as pd

start_date = pd.Timestamp(2023, 5, 31)
date_range = pd.date_range(start=start_date, freq="MS", periods=6)
print(date_range)
```

This results in:

```
DatetimeIndex(['2023-06-01', '2023-07-01', '2023-08-01', '2023-09-01',
               '2023-10-01', '2023-11-01'],
              dtype='datetime64[ns]', freq='MS')
```

From the documentation, I'd expect it to start in May and end in October:

```
DatetimeIndex(['2023-05-01', '2023-06-01', '2023-07-01', '2023-08-01', '2023-09-01',
               '2023-10-01'],
              dtype='datetime64[ns]', freq='MS')
```

I thought it might have to do with the `inclusive` argument, but that isn't the reason either.

Answer 1

Score: 0

`pd.date_range` generates dates that fall between `start` and `end`. Since 2023-05-01 is earlier than the start date 2023-05-31, it can never be part of the range. To get what you want, replace the day of the `pd.Timestamp` with 1:

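```python
import pandas as pd

start_date = pd.Timestamp(2023, 5, 31)
# Move the start back to the 1st of its month so that May itself is included
date_range = pd.date_range(start=start_date.replace(day=1), freq="MS", periods=6)
print(date_range)
```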

```
DatetimeIndex(['2023-05-01', '2023-06-01', '2023-07-01', '2023-08-01',
               '2023-09-01', '2023-10-01'],
              dtype='datetime64[ns]', freq='MS')
```
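As an alternative to `replace(day=1)`, a pandas offset can do the snapping: `pd.offsets.MonthBegin().rollback(ts)` returns `ts` unchanged if it already falls on the 1st of a month and otherwise moves it back to the 1st of the same month. The following is a minimal sketch assuming a recent pandas; the helper name `month_start_range` is hypothetical and not part of pandas.

```python
import pandas as pd


def month_start_range(start: pd.Timestamp, periods: int) -> pd.DatetimeIndex:
    """Hypothetical helper: an "MS" range that includes start's own month.

    MonthBegin().rollback() moves a date back to the 1st of its month,
    or leaves it unchanged if it is already on the 1st.
    """
    first = pd.offsets.MonthBegin().rollback(start)
    return pd.date_range(start=first, freq="MS", periods=periods)


# Month-end start: rolled back to 2023-05-01 before the range is generated
print(month_start_range(pd.Timestamp(2023, 5, 31), 6))

# Start already on the 1st: rollback() is a no-op, so the range is the same
print(month_start_range(pd.Timestamp(2023, 5, 1), 6))
```

For the example in the question both calls produce the same May to October range as `replace(day=1)`; the offset version just states the "snap to month start" intent through the same `MonthBegin` offset that `freq="MS"` refers to.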

Answer 2

Score: 0

The `pandas.date_range` documentation reads:

> "such that they all satisfy start <[=] x <[=] end"

Therefore, since the date provided is `pd.Timestamp(2023, 5, 31)`, the first "MS" (month-start) date that satisfies `start <= x` is 2023-06-01, in the following month. The range never steps back to 2023-05-01, because that date is earlier than `start`.
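The rule quoted above can be checked directly: the first element of the range should equal `pd.offsets.MonthBegin().rollforward(start)`, which keeps a date that is already on a month start and otherwise moves it forward to the next one. This is a sketch assuming a recent pandas; it only checks the observable behavior and does not claim to mirror pandas' internal code.

```python
import pandas as pd

month_begin = pd.offsets.MonthBegin()

for start in [pd.Timestamp(2023, 5, 31), pd.Timestamp(2023, 5, 1)]:
    rng = pd.date_range(start=start, freq="MS", periods=6)
    # rollforward() returns the earliest month-start date x with start <= x
    first_on_offset = month_begin.rollforward(start)
    print(start.date(), "->", rng[0].date(), "| rollforward:", first_on_offset.date())
    assert rng[0] == first_on_offset

# Prints:
# 2023-05-31 -> 2023-06-01 | rollforward: 2023-06-01
# 2023-05-01 -> 2023-05-01 | rollforward: 2023-05-01
```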

huangapple
  • This article was posted on June 1, 2023 at 22:55:07
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76383210.html