Keep date in pandas groupby rolling aggregation
Question
I'm running the code below for computing rolling statistics over date on a dataset.
import pandas as pd
df = pd.DataFrame({'id': [1, 1, 1, 2, 2, 2],
                   'date': ['2023-01-01', '2023-01-02', '2023-01-03', '2023-01-01', '2023-01-02', '2023-01-03'],
                   'value': [5, 4, 7, 2, 7, 1]})
df['date'] = pd.to_datetime(df['date'])
df.groupby('id')['value'].rolling(2).agg({'sum': 'sum', 'mean': 'mean'})
The result does not keep the date. Ideally I would like to keep the date for each rolling statistic, but I only get an index number.
Answer 1
Score: 1
Set date as the index so that it is preserved during the aggregation:
out = (df.set_index('date').groupby('id')['value']
         .rolling(2).agg(['sum', 'mean']).reset_index())
print(out)
   id       date   sum  mean
0   1 2023-01-01   NaN   NaN
1   1 2023-01-02   9.0   4.5
2   1 2023-01-03  11.0   5.5
3   2 2023-01-01   NaN   NaN
4   2 2023-01-02   9.0   4.5
5   2 2023-01-03   8.0   4.0
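As a possible alternative (not part of the original answer), the sketch below leaves the index untouched and joins the rolling statistics back onto the original rows. It assumes the default behaviour where the grouped rolling result is indexed by the group key plus the original row label, and it assumes df has unique row labels. The names stats and joined are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    'id': [1, 1, 1, 2, 2, 2],
    'date': pd.to_datetime(['2023-01-01', '2023-01-02', '2023-01-03',
                            '2023-01-01', '2023-01-02', '2023-01-03']),
    'value': [5, 4, 7, 2, 7, 1],
})

# Without set_index, the result is indexed by (id, original row label),
# which is why only an index number appears next to each statistic.
stats = df.groupby('id')['value'].rolling(2).agg(['sum', 'mean'])
print(stats.index.names)

# Drop the 'id' level so the remaining index matches df's row labels,
# then join the statistics back onto the original rows (assumes those
# row labels are unique).
joined = df.join(stats.droplevel('id'))
print(joined)
```

This variant keeps id, date and value alongside sum and mean, which may be convenient when the original columns are still needed afterwards. Setting date as the index, as in the answer above, remains the more direct route when only the statistics are wanted.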