英文:
Using a Rolling Function in Pandas based on Date and a Categorical Column
问题
- 最近五张由该行客户开具的发票的 DaysLate 的均值
df["CustomerDaysLate_lastfiveinvoices"] = df.groupby("customerID").rolling(window=5, min_periods=1).DaysLate.mean().reset_index().set_index("level_1").sort_index()["DaysLate"]
- 最近30天内所有发票的 DaysLate 的均值
df = df.sort_values('InvoiceDate')
df["GlobalDaysLate_30days"] = df.rolling(window='30d', on="InvoiceDate").DaysLate.mean()
请注意,第二个问题是关于如何按客户ID获取最近30天内的均值,代码中没有提供解决方案。如果需要按客户ID获取最近30天内的均值,你需要进一步修改代码以实现这个目标。
英文:
Im currently working on a dataset where I am using the rolling function in pandas to
create features.
The functions rely on three columns a DaysLate numeric column from which the mean is calculated from, an Invoice Date column from which the date is derived from and a customerID column which denotes the customer of a row.
Im trying to get a rolling mean of the DaysLate for the last 30 days limited to invoices raised to a specific customerID.
The following two functions are working.
- Mean of DaysLate for the last five invoices raised for the row's customer
df["CustomerDaysLate_lastfiveinvoices"] = df.groupby("customerID").rolling(window = 5,min_periods = 1).\
DaysLate.mean().reset_index().set_index("level_1").\
sort_index()["DaysLate"]
- Mean of DaysLate for all invoices raised in the last 30 days
df = df.sort_values('InvoiceDate')
df["GlobalDaysLate_30days"] = df.rolling(window = '30d', on = "InvoiceDate").DaysLate.mean()
Just cant seem to find the code get the mean of the last 30 days by CustomerID. Any help on above is greatly appreciated.
答案1
得分: 1
将日期列设置为索引,然后排序以确保升序,然后按客户ID对排序后的数据框进行分组,对每个分组计算30天滚动均值。
mean_30d = (
df
.set_index('InnvoiceDate') # !important
.sort_index()
.groupby('customerID')
.rolling('30d')['DaysLate'].mean()
.reset_index(name='GlobalDaysLate_30days')
)
# 将滚动均值合并回原始数据框
result = df.merge(mean_30d)
英文:
Set the date column as index then sort to ensure ascending order then group the sorted dataframe by customer id and for each group calculate 30d rolling mean.
mean_30d = (
df
.set_index('InnvoiceDate') # !important
.sort_index()
.groupby('customerID')
.rolling('30d')['DaysLate'].mean()
.reset_index(name='GlobalDaysLate_30days')
)
# merge the rolling mean back to original dataframe
result = df.merge(mean_30d)
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论