Question

I'm running the code below to compute rolling statistics over dates on a dataset.

import pandas as pd
df = pd.DataFrame({'id': [1, 1, 1, 2, 2, 2], 
                   'date': ['2023-01-01', '2023-01-02', '2023-01-03', '2023-01-01', '2023-01-02', '2023-01-03'],
                   'value': [5, 4, 7, 2, 7, 1]})
df['date'] = pd.to_datetime(df['date'])
df.groupby('id')['value'].rolling(2).agg({'sum': 'sum', 'mean': 'mean'})

The result does not keep the date column. Ideally I would like each rolling statistic to carry its date, but I only get an integer index.

Answer 1

Score: 1

Set date as the index so that it is preserved through the aggregation:

out = (df.set_index('date').groupby('id')['value']
       .rolling(2).agg(['sum', 'mean']).reset_index())
print(out)

   id       date   sum  mean
0   1 2023-01-01   NaN   NaN
1   1 2023-01-02   9.0   4.5
2   1 2023-01-03  11.0   5.5
3   2 2023-01-01   NaN   NaN
4   2 2023-01-02   9.0   4.5
5   2 2023-01-03   8.0   4.0
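
This works because a grouped rolling result is indexed by the group key plus whatever index the frame had, so a date index survives the aggregation. As a minimal sketch of an alternative, assuming the frame has a unique index (the default RangeIndex here), you could drop the group level and join the statistics back onto the original frame, which keeps date and value alongside the new columns. The names stats and result are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({'id': [1, 1, 1, 2, 2, 2],
                   'date': ['2023-01-01', '2023-01-02', '2023-01-03',
                            '2023-01-01', '2023-01-02', '2023-01-03'],
                   'value': [5, 4, 7, 2, 7, 1]})
df['date'] = pd.to_datetime(df['date'])

# Rolling stats per id; the result index is (id, original row label).
stats = df.groupby('id')['value'].rolling(2).agg(['sum', 'mean'])

# Drop the id level so the index matches df's row labels again.
stats = stats.reset_index(level=0, drop=True)

# Align on the row labels; assumes df's index is unique.
result = df.join(stats)
print(result)
```

Unlike the set_index approach, this keeps the original value column in the output, at the cost of relying on a unique row index for the join.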
