Calculate business days by iterating over columns
Question
I have a dataframe that looks like this.
I would like to calculate the number of business days between the event date and the dates shown in the month columns. The output would look like the below, with the NaTs converted to NaN:
Does anyone know the best way to achieve this using pandas?
Answer 1
Score: 1
Assuming the dates are already of datetime type*, use numpy.busday_count with broadcasting and masking:
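import numpy as np

# Select the month columns (named like '2023-01')
tmp = df.filter(regex=r'\d{4}-\d{2}')
# Event_Date becomes a column vector so it broadcasts across each row;
# NaT cells get a placeholder date and are masked back to NaN by np.where
df[tmp.columns] = np.where(tmp.notna(),
                           np.busday_count(df['Event_Date'].to_numpy(dtype='datetime64[D]')[:, None],
                                           tmp.fillna('0').to_numpy(dtype='datetime64[D]')),
                           np.nan)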
* Otherwise, convert them first with cols = df.drop(columns='Account').columns ; df[cols] = df[cols].apply(pd.to_datetime, dayfirst=True).
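To make the broadcasting and masking steps easier to follow, here is a minimal NumPy-only sketch on toy data. The arrays events and targets are hypothetical stand-ins for the Event_Date column and the month columns, and the 1970-01-01 placeholder is an arbitrary choice (an assumption, not part of the answer above); any valid date works because the mask hides it again.

```python
import numpy as np

# Hypothetical toy data: two event dates and a 2x2 grid of target dates
events = np.array(['2023-04-25', '2023-06-02'], dtype='datetime64[D]')
targets = np.array([['2023-05-04', 'NaT'],
                    ['2023-07-01', '2023-08-12']], dtype='datetime64[D]')

# True where a real date exists, False where the cell is NaT
mask = ~np.isnat(targets)

# Swap NaT for an arbitrary placeholder date so busday_count gets valid input;
# the placeholder results are discarded by the mask below
safe = np.where(mask, targets, np.datetime64('1970-01-01'))

# events[:, None] has shape (2, 1) and broadcasts against the (2, 2) grid,
# so each row is counted from its own event date
counts = np.busday_count(events[:, None], safe)

# Keep the real counts and put NaN where the original cell was NaT
result = np.where(mask, counts, np.nan)
print(result)
```

The result is a float array, which is why the counts in the output appear as 21.0 rather than 21: NaN cannot be stored in an integer array.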
Output:
  Account Event_Date  2023-01  2023-02  2023-03  2023-04  2023-05  2023-06  2023-07
0       A 2023-04-25    -77.0      NaN      NaN      NaN      NaN      NaN      NaN
1       B 2023-06-02      NaN      NaN      NaN     21.0     51.0      NaN      NaN
2       C 2023-04-25      NaN      NaN      NaN      NaN      7.0      NaN      NaN
Used input:
import pandas as pd
from pandas import NaT, Timestamp

df = pd.DataFrame({'Account': ['A', 'B', 'C'],
                   'Event_Date': [Timestamp('2023-04-25 00:00:00'), Timestamp('2023-06-02 00:00:00'), Timestamp('2023-04-25 00:00:00')],
                   '2023-01': [Timestamp('2023-01-06 00:00:00'), NaT, NaT],
                   '2023-02': [NaT, NaT, NaT],
                   '2023-03': [NaT, NaT, NaT],
                   '2023-04': [NaT, Timestamp('2023-07-01 00:00:00'), NaT],
                   '2023-05': [NaT, Timestamp('2023-08-12 00:00:00'), Timestamp('2023-05-04 00:00:00')],
                   '2023-06': [NaT, NaT, NaT],
                   '2023-07': [NaT, NaT, NaT]})
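If you would rather not overwrite the original columns, the same idea can be wrapped in a small helper that returns a new frame. This is only a sketch: the function name business_days_from_event and its parameters are hypothetical, it assumes every matched column already has a datetime dtype, and it replaces NaT with an arbitrary placeholder date before counting.

```python
import numpy as np
import pandas as pd


def business_days_from_event(df, event_col="Event_Date", pattern=r"^\d{4}-\d{2}$"):
    """Return a copy of df whose month columns hold business-day counts."""
    # Columns whose names look like '2023-01'
    month_cols = df.filter(regex=pattern).columns

    # Day-resolution NumPy arrays; NaT survives the cast
    targets = df[month_cols].to_numpy().astype("datetime64[D]")
    events = df[event_col].to_numpy().astype("datetime64[D]")

    # Hide NaT behind an arbitrary placeholder date, then mask it out again
    mask = ~np.isnat(targets)
    safe = np.where(mask, targets, np.datetime64("1970-01-01"))
    counts = np.busday_count(events[:, None], safe)

    out = df.copy()
    out[month_cols] = np.where(mask, counts, np.nan)
    return out


# A reduced version of the input above (three of the month columns)
sample = pd.DataFrame({
    "Account": ["A", "B", "C"],
    "Event_Date": pd.to_datetime(["2023-04-25", "2023-06-02", "2023-04-25"]),
    "2023-01": pd.to_datetime(["2023-01-06", None, None]),
    "2023-04": pd.to_datetime([None, "2023-07-01", None]),
    "2023-05": pd.to_datetime([None, "2023-08-12", "2023-05-04"]),
})
result = business_days_from_event(sample)
print(result)
```

The counts for the three populated columns should match the output shown above, and sample itself keeps its datetime columns.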