Calculate business days by iterating over columns

Question

I have a dataframe that looks like this.

[image: input dataframe]

I would like to calculate the number of business days between the event date and the dates shown in the monthly columns. The output would look like the below, with the NaT values converted to NaN:

[image: expected output]

Does anyone know the best way to achieve this using pandas?

Answer 1

Score: 1

Assuming the dates are already of datetime type*, use numpy.busday_count and broadcasting/masking:

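import numpy as np

# Select the monthly date columns (names such as '2023-01').
tmp = df.filter(regex=r'\d{4}-\d{2}')
# Count business days from each row's Event_Date (reshaped to a column vector
# so it broadcasts across tmp) to every date in tmp. fillna('0') only supplies
# a placeholder date for the NaT cells, which np.where then masks back to NaN.
df[tmp.columns] = np.where(tmp.notna(),
                           np.busday_count(df['Event_Date'].to_numpy(dtype='datetime64[D]')[:,None],
                                           tmp.fillna('0').to_numpy(dtype='datetime64[D]')),
                           np.nan)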

* Otherwise, convert the date columns first: cols = df.drop(columns='Account').columns ; df[cols] = df[cols].apply(pd.to_datetime, dayfirst=True)
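
To make the broadcasting and masking easier to follow, here is a minimal NumPy-only sketch of the same idea. The arrays `begin` and `end` are made-up miniatures of the Event_Date column and the monthly columns, and the mask is built with np.isnat instead of fillna; this is an illustrative variant, not the answer's exact code.

```python
import numpy as np

# Hypothetical miniature of the problem: two event dates, two target columns.
begin = np.array(['2023-04-25', '2023-06-02'], dtype='datetime64[D]')
end = np.array([['2023-05-04', 'NaT'],
                ['NaT', '2023-07-01']], dtype='datetime64[D]')

# Remember which cells are missing, then swap in a placeholder date,
# because np.busday_count rejects NaT inputs.
missing = np.isnat(end)
safe_end = np.where(missing, np.datetime64('1970-01-01', 'D'), end)

# begin[:, None] has shape (2, 1) and broadcasts against the (2, 2) end array,
# so each row of end is compared with that row's own begin date.
counts = np.busday_count(begin[:, None], safe_end).astype(float)

# Discard the meaningless counts computed from the placeholder.
counts[missing] = np.nan
print(counts)
```

The placeholder date never reaches the result, so any valid date would do there.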

Output:

  Account Event_Date  2023-01  2023-02  2023-03  2023-04  2023-05  2023-06  2023-07
0       A 2023-04-25    -77.0      NaN      NaN      NaN      NaN      NaN      NaN
1       B 2023-06-02      NaN      NaN      NaN     21.0     51.0      NaN      NaN
2       C 2023-04-25      NaN      NaN      NaN      NaN      7.0      NaN      NaN
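
The negative value in row A appears because the date in the 2023-01 column falls before the event date. As a quick check, assuming NumPy's default Monday-to-Friday week mask and no holidays, two of the cells in the table can be reproduced with scalar calls (the variable names are illustrative only):

```python
import numpy as np

event = np.datetime64('2023-04-25')    # Event_Date for accounts A and C
earlier = np.datetime64('2023-01-06')  # row A, column 2023-01
later = np.datetime64('2023-05-04')    # row C, column 2023-05

# busday_count counts valid days from the begin date (inclusive)
# up to the end date (exclusive).
forward = np.busday_count(event, later)
# When the end date precedes the begin date, the count is negative.
backward = np.busday_count(event, earlier)
print(forward, backward)  # 7 -77, matching the table above
```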

Input used:

import pandas as pd
from pandas import NaT, Timestamp

df = pd.DataFrame({'Account': ['A', 'B', 'C'],
                   'Event_Date': [Timestamp('2023-04-25 00:00:00'), Timestamp('2023-06-02 00:00:00'), Timestamp('2023-04-25 00:00:00')],
                   '2023-01': [Timestamp('2023-01-06 00:00:00'), NaT, NaT],
                   '2023-02': [NaT, NaT, NaT],
                   '2023-03': [NaT, NaT, NaT],
                   '2023-04': [NaT, Timestamp('2023-07-01 00:00:00'), NaT],
                   '2023-05': [NaT, Timestamp('2023-08-12 00:00:00'), Timestamp('2023-05-04 00:00:00')],
                   '2023-06': [NaT, NaT, NaT],
                   '2023-07': [NaT, NaT, NaT]})
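
If the data arrive as day-first strings rather than Timestamps, the conversion mentioned in the footnote could look roughly like this. The frame `raw` and its dd/mm/yyyy values are invented for illustration; the real input format is not shown in the question.

```python
import pandas as pd

# Hypothetical raw input holding dates as dd/mm/yyyy strings.
raw = pd.DataFrame({'Account': ['A', 'B'],
                    'Event_Date': ['25/04/2023', '02/06/2023'],
                    '2023-01': ['06/01/2023', None]})

# Every column except Account holds dates.
cols = raw.columns.drop('Account')

converted = raw.copy()
# dayfirst=True makes '06/01/2023' parse as 6 January, not 1 June.
# Missing values become NaT.
converted[cols] = raw[cols].apply(pd.to_datetime, dayfirst=True)
print(converted.dtypes)
```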

huangapple
  • Posted on July 11, 2023 at 00:42:36
  • Please keep this link when reposting: https://go.coder-hub.com/76655754.html