How to create a "calendar" efficiently with Python (pandas)?
Question
<details>
<summary>English:</summary>
I need to create a "calendar" of coupon payments for different bonds. There are three types of coupon bonds: the first pays coupons annually, the second quarterly, and the third monthly. I am given the following table:
***Table 1***
| Bond | Type | Release date | Maturity date |
| ------ | --------- | ------------ | ------------- |
| Bond A | Quarterly | 2022-01-01 | 2032-01-01 |
| Bond B | Monthly | 2020-06-06 | 2025-06-06 |
In this table each row is unique and each bond appears only once, so there are no duplicates.
In the end I expect to get a table like this:
***Table 2***
| Bond | Type | Release date | Maturity date | Coupon date |
| ---- | --- | --- | --- | --- |
| Bond A | Quarterly | 2022-01-01 | 2032-01-01 | 2022-04-01 |
| Bond A | Quarterly | 2022-01-01 | 2032-01-01 | 2022-07-01 |
| Bond A | Quarterly | 2022-01-01 | 2032-01-01 | 2022-10-01 |
| Bond A | Quarterly | 2022-01-01 | 2032-01-01 | 2023-01-01 |
Obviously, this table should be much longer, but I hope the idea of the expected result is clear.
Note that I do not need day precision, only the correct year and month, which is why I set every coupon date to the first day of the month.
I tried using pandas.Timedelta for this task, but support for the 'M' (month) and 'Y' (year) units has been removed, so I have to manage without them somehow.
# My attempt to complete the task is the following:
My imports:

```python
import numpy as np
import pandas as pd
import datetime
```

Table 1 is my dataframe. I've split it into three smaller frames based on bond type.
Note: all dates are in datetime64[ns] format.

```python
annual_df = df[df.Type == 'Annually'].reset_index(drop=True)
quarter_df = df[df.Type == 'Quarterly'].reset_index(drop=True)
month_df = df[df.Type == 'Monthly'].reset_index(drop=True)
```

This is my code for annual coupon bonds.
I go backwards because a coupon is definitely paid at the maturity date.

```python
# A list keeps the column order stable (a set does not)
temp = pd.DataFrame(columns=['Bond', 'Type', 'Release date', 'Maturity date', 'Coupon date'])
for i in annual_df.index:
    stop_date = annual_df['Release date'][i]
    _date = annual_df['Maturity date'][i]  # the date that will be passed into 'Coupon date' column
    c = 0  # number of years stepped back from maturity
    print(i, 'out of', annual_df.index.max())  # to track the iteration
    while _date > stop_date:
        arr = {'Bond': annual_df.Bond[i], 'Type': annual_df.Type[i], 'Release date': stop_date,
               'Maturity date': annual_df['Maturity date'][i], 'Coupon date': _date}
        temp = temp.append(arr, ignore_index=True)
        c += 1
        _date = datetime.datetime(annual_df['Maturity date'][i].year - c,
                                  annual_df['Maturity date'][i].month, 1)
annual_df = temp
```

This code runs for about an hour on my dataset of 25,000 distinct bonds, which is quite a lot, but not as long as the following code for quarterly coupons:

```python
temp = pd.DataFrame(columns=['Bond', 'Type', 'Release date', 'Maturity date', 'Coupon date'])
for i in quarter_df.index:
    stop_date = quarter_df['Release date'][i]
    maturity = quarter_df['Maturity date'][i]
    _date = maturity  # the date that will be passed into 'Coupon date' column
    c = 0  # number of years stepped back from maturity
    print(i, 'out of', quarter_df.index.max())  # to track the iteration
    # Fields shared by every coupon row of this bond
    base = {'Bond': quarter_df.Bond[i], 'Type': quarter_df.Type[i],
            'Release date': stop_date, 'Maturity date': maturity}
    if maturity.month < 4:
        while _date > stop_date:
            arr1 = {**base, 'Coupon date': _date}
            _date = datetime.datetime(maturity.year - c, maturity.month + 3, 1)
            arr2 = {**base, 'Coupon date': _date}
            _date = datetime.datetime(maturity.year - c, maturity.month + 6, 1)
            arr3 = {**base, 'Coupon date': _date}
            _date = datetime.datetime(maturity.year - c, maturity.month + 9, 1)
            arr4 = {**base, 'Coupon date': _date}
            c += 1
            _date = datetime.datetime(maturity.year - c, maturity.month, 1)
            temp = temp.append(arr1, ignore_index=True)
            temp = temp.append(arr2, ignore_index=True)
            temp = temp.append(arr3, ignore_index=True)
            temp = temp.append(arr4, ignore_index=True)
    elif 3 < maturity.month < 7:
        pass  # the same branch with different _date month values, literally Ctrl+C, Ctrl+V
    elif 6 < maturity.month < 10:
        pass  # again, the same
    else:
        pass  # same
```

By my calculations, this part takes around 2 seconds per iteration, and I have 34,000 rows for this exact code, which works out to about 18.3 hours in total.
Do you know how to optimize this code?
Do you know any other method that does the same thing but is shorter and faster?
</details>
# Answer 1
**Score**: 1
You can generate a series of coupon dates for each row and then explode it:

```python
def coupon_dates(row: pd.Series):
    n = {"Monthly": 1, "Quarterly": 3, "Annually": 12}[row["Type"]]
    # Generate a series of coupon dates from Release date to Maturity date
    return pd.date_range(
        row["Release date"],
        row["Maturity date"],
        freq=pd.DateOffset(months=n),
        inclusive="right",
    )

df["Coupon date"] = df.apply(coupon_dates, axis=1)
df = df.explode("Coupon date")
```
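
The snippet below is a minimal, self-contained sketch of the same `date_range` plus `explode` approach, run on the two sample rows from Table 1. The names `MONTHS_PER_COUPON`, `coupon_months`, and `calendar` are illustrative only. Snapping both ends to the first of the month is an assumption based on the question's note that only the year and month matter, and the `inclusive=` argument assumes pandas 1.4 or later.

```python
import pandas as pd

# Sample data mirroring Table 1 from the question
df = pd.DataFrame(
    {
        "Bond": ["Bond A", "Bond B"],
        "Type": ["Quarterly", "Monthly"],
        "Release date": pd.to_datetime(["2022-01-01", "2020-06-06"]),
        "Maturity date": pd.to_datetime(["2032-01-01", "2025-06-06"]),
    }
)

# Months between consecutive coupons for each bond type
MONTHS_PER_COUPON = {"Monthly": 1, "Quarterly": 3, "Annually": 12}


def coupon_months(row: pd.Series) -> pd.DatetimeIndex:
    """Month-start coupon dates after the release month, up to the maturity month."""
    step = MONTHS_PER_COUPON[row["Type"]]
    # Assumption: only year and month matter, so snap both ends to day 1
    start = row["Release date"].to_period("M").to_timestamp()
    end = row["Maturity date"].to_period("M").to_timestamp()
    # inclusive="right" drops the release month and keeps the maturity month
    return pd.date_range(
        start, end, freq=pd.DateOffset(months=step), inclusive="right"
    )


# One row per (bond, coupon date)
calendar = (
    df.assign(**{"Coupon date": df.apply(coupon_months, axis=1)})
    .explode("Coupon date")
    .reset_index(drop=True)
)
# explode leaves an object column, so convert it back to datetime64
calendar["Coupon date"] = pd.to_datetime(calendar["Coupon date"])

print(calendar.head(4))
```

For Bond A the first coupon date is 2022-04-01 and the last is 2032-01-01, matching Table 2. Because the work per bond is a single `date_range` call rather than repeated `DataFrame.append` calls, this should scale much better than the loops in the question, though it has not been timed on the 25,000-bond dataset.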