February 19, 2023, 16:07:17

Using a Rolling Function in Pandas based on Date and a Categorical Column

Question

I'm currently working on a dataset where I use the rolling function in pandas to create features.

The features rely on three columns: a numeric DaysLate column from which the mean is calculated, an InvoiceDate column that supplies the date, and a customerID column that identifies the customer for each row.

I'm trying to get a rolling mean of DaysLate over the last 30 days, limited to invoices raised for a specific customerID.

The following two snippets work.

Mean of DaysLate for the last five invoices raised for the row's customer

df["CustomerDaysLate_lastfiveinvoices"] = df.groupby("customerID").rolling(window=5, min_periods=1).\
                              DaysLate.mean().reset_index().set_index("level_1").\
                              sort_index()["DaysLate"]

Mean of DaysLate for all invoices raised in the last 30 days

df = df.sort_values('InvoiceDate')
df["GlobalDaysLate_30days"] = df.rolling(window='30d', on="InvoiceDate").DaysLate.mean()

I just can't seem to find the code to get the mean of the last 30 days by customerID. Any help with the above is greatly appreciated.

Answer 1

Score: 1

Set the date column as the index and sort it so the dates are in ascending order. Then group the sorted dataframe by customer ID and calculate a 30-day rolling mean within each group.

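mean_30d = (
    df
    .set_index('InvoiceDate')  # !important: a time-based window needs a datetime index
    .sort_index()              # dates must be in ascending order
    .groupby('customerID')
    .rolling('30d')['DaysLate'].mean()
    # named per customer so it does not clash with the question's GlobalDaysLate_30days
    .reset_index(name='CustomerDaysLate_30days')
)
# merge the rolling mean back to the original dataframe on the row keys
result = df.merge(mean_30d, on=['customerID', 'InvoiceDate'])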
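
To try the idea on a small example, here is a minimal sketch. The toy data, the helper name customer_rolling_mean, and the output column name CustomerDaysLate_30days are all made up for illustration. It uses a per-customer loop in place of the merge, and writes the result back by row label. This is a swapped-in variant, not the method from the answer above. It may be useful because a merge on customerID and InvoiceDate multiplies rows when one customer has several invoices on the same date. The sketch assumes df has a unique index and that InvoiceDate is already a datetime column.

```python
import pandas as pd

# Toy data, invented for illustration only (deliberately not sorted by date).
df = pd.DataFrame(
    {
        "customerID": ["A", "B", "A", "A", "B", "A"],
        "InvoiceDate": pd.to_datetime(
            ["2023-01-15", "2023-01-05", "2023-01-01",
             "2023-01-30", "2023-02-20", "2023-03-15"]
        ),
        "DaysLate": [20, 4, 10, 30, 8, 40],
    }
)


def customer_rolling_mean(frame, window="30D"):
    """Hypothetical helper: per-customer time-window mean of DaysLate.

    Returns a Series that carries the row labels of `frame`.
    """
    # A time-based rolling window needs the dates in ascending order.
    ordered = frame.sort_values("InvoiceDate", kind="stable")
    parts = []
    for _, group in ordered.groupby("customerID"):
        # Plain DataFrame.rolling with `on=` keeps the group's row labels.
        parts.append(group.rolling(window, on="InvoiceDate")["DaysLate"].mean())
    return pd.concat(parts)


# Column assignment aligns on the index, so the rows of df are neither
# reordered nor duplicated.
df["CustomerDaysLate_30days"] = customer_rolling_mean(df)
print(df)
```

For customer A's invoice dated 2023-01-30, the window reaches back to 2023-01-01, so the value is the mean of 10, 20, and 30. The invoice dated 2023-03-15 has no other invoice from customer A in the preceding 30 days, so its value is its own DaysLate.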
