英文:
Mean of X most recent years based on time series index
问题
import numpy as np
import pandas as pd
np.random.seed(1)
rows, cols = 4950, 5
data = np.random.rand(rows, cols)
tidx = pd.date_range('2010-01-01', periods=rows, freq='D')
df = pd.DataFrame(data, columns=['a', 'b', 'c', 'd', 'e'], index=tidx)
print(df)
对于时间序列中的每一天,我想计算前五年中相同日期的平均值。例如,对于日期 2023 年 1 月 1 日:取 2022 年 1 月 1 日、2021 年 1 月 1 日、2020 年 1 月 1 日、2019 年 1 月 1 日和 2018 年 1 月 1 日的平均值。我尝试了许多变体:
df.groupby([df.index.month, df.index.day]).mean()
但我无法让它在计算每一行的均值时只使用索引年份之前的五年。我想要添加五列,其中包含“滚动”平均值,并为每一列命名,例如“A_Avg”。
<details>
<summary>英文:</summary>
I have a time series of 10 years of days, with five columns of data. Here is code to randomly generate:
import numpy as np
import pandas as pd
np.random.seed(1)
rows,cols = 4950,5
data = np.random.rand(rows,cols)
tidx = pd.date_range('2010-01-01', periods=rows, freq='D')
df = pd.DataFrame(data, columns=['a','b','c','d','e'], index=tidx)
print (df)
For each day in the time series I would like to calculate the mean for the same day in the PREVIOUS five years. For example for the date 1/1/2023: take the average of 1/1/2022, 1/1/2021, 1/1/2020, 1/1/2019 and 1/1/2018. I've tried many variations on:
df.groupby([df.index.month, df.index.day]).mean()
But I can't get it use just the five years prior to the index.year in the mean calculation for each row. I would like to add five columns with the "rolling" average, and name each column for example "A_Avg"
</details>
# 答案1
**得分**: 1
从您的DataFrame开始,我们可以首先将`Date`列转换为[`Datetime`](https://pandas.pydata.org/docs/reference/api/pandas.to_datetime.html):
```python
df['Date'] = pd.to_datetime(df['Date'], format="%d/%m/%Y")
然后我们将year
,month
和year
分开成列:
df['year'] = df['Date'].dt.year
df['month'] = df['Date'].dt.month
df['day'] = df['Date'].dt.day
接下来,我们根据day
,month
和year
重新排序行,并相应地重新索引DataFrame,如下所示:
df = df.sort_values(by=['day', 'month', 'year']).reset_index(drop=True)
最后,我们可以使用设置为5的rolling
方法来获得每列的期望结果:
df['A_Avg'] = df.groupby(['day', 'month'], as_index=False)['A'].rolling(5).mean()['A']
df['B_Avg'] = df.groupby(['day', 'month'], as_index=False)['B'].rolling(5).mean()['B']
df['C_Avg'] = df.groupby(['day', 'month'], as_index=False)['C'].rolling(5).mean()['C']
df['D_Avg'] = df.groupby(['day', 'month'], as_index=False)['D'].rolling(5).mean()['D']
df['E_Avg'] = df.groupby(['day', 'month'], as_index=False)['E'].rolling(5).mean()['E']
英文:
From your DataFrame, we can start by convert the Date
column as Datetime
:
df['Date'] = pd.to_datetime(df['Date'], format="%d/%m/%Y")
Then we separate the year
, month
and year
in columns :
df['year'] = df['Date'].dt.year
df['month'] = df['Date'].dt.month
df['day'] = df['Date'].dt.day
Next step, we reorder the row depending on day
, month
and year
and reindex the DataFrame accordingly like so :
df = df.sort_values(by=['day', 'month', 'year']).reset_index(drop=True)
To finish, we can use the rolling
method setup on 5 to get the expected result on each columns :
df['A_Avg'] = df.groupby(['day', 'month'], as_index=False)['A'].rolling(5).mean()['A']
df['B_Avg'] = df.groupby(['day', 'month'], as_index=False)['B'].rolling(5).mean()['B']
df['C_Avg'] = df.groupby(['day', 'month'], as_index=False)['C'].rolling(5).mean()['C']
df['D_Avg'] = df.groupby(['day', 'month'], as_index=False)['D'].rolling(5).mean()['D']
df['E_Avg'] = df.groupby(['day', 'month'], as_index=False)['E'].rolling(5).mean()['E']
答案2
得分: 1
这听起来你想要 rolling
。
mean_5y_prev = df.groupby([df.index.month, df.index.day]).rolling(5).mean()
然后你可以与 df
进行连接:
df.join(mean_5y_prev.droplevel([0, 1]), rsuffix='_Avg')
英文:
It sounds like you want rolling
.
mean_5y_prev = df.groupby([df.index.month, df.index.day]).rolling(5).mean()
Then you can join back to df
:
df.join(mean_5y_prev.droplevel([0, 1]), rsuffix='_Avg')
答案3
得分: 0
创建数据框。
df = pd.DataFrame({
'Date': ['1/1/2023', '1/2/2023', '1/3/2023', '1/1/2022', '1/2/2022', '1/3/2022', '1/1/2015', '1/2/2015', '1/3/2015'],
'A': [100, 105, 110, 100, 105, 110, 100, 105, 110],
'B': [200, 205, 210, 200, 205, 210, 200, 205, 210],
'C': [300, 305, 310, 300, 305, 310, 300, 305, 310],
'D': [400, 405, 410, 400, 405, 410, 400, 405, 410],
'E': [500, 505, 510, 500, 505, 510, 500, 505, 510],
})
df['Date'] = pd.to_datetime(df['Date'], format='%m/%d/%Y')
df['year'] = df['Date'].dt.year
df['month'] = df['Date'].dt.month
df['day'] = df['Date'].dt.day
获取可以用来分割数据框的年份列表。
bins = [y for y in range(df['year'].max(), df['year'].min() - 5, -5)]
bins.reverse()
# [2013, 2018, 2023]
计算分组的均值。
df['year'] = pd.cut(df['year'], bins)
df = df.groupby(['year', 'month', 'day'], as_index=False).mean(numeric_only=True)
输出:
year month day A B C D E
0 (2013, 2018] 1 1 100.0 200.0 300.0 400.0 500.0
1 (2013, 2018] 1 2 105.0 205.0 305.0 405.0 505.0
2 (2013, 2018] 1 3 110.0 210.0 310.0 410.0 510.0
3 (2018, 2023] 1 1 100.0 200.0 300.0 400.0 500.0
4 (2018, 2023] 1 2 105.0 205.0 305.0 405.0 505.0
5 (2018, 2023] 1 3 110.0 210.0 310.0 410.0 510.0
英文:
Create dataframe.
df = pd.DataFrame({
'Date': ['1/1/2023', '1/2/2023', '1/3/2023', '1/1/2022', '1/2/2022', '1/3/2022', '1/1/2015', '1/2/2015', '1/3/2015'],
'A': [100, 105, 110, 100, 105, 110, 100, 105, 110],
'B': [200, 205, 210, 200, 205, 210, 200, 205, 210],
'C': [300, 305, 310, 300, 305, 310, 300, 305, 310],
'D': [400, 405, 410, 400, 405, 410, 400, 405, 410],
'E': [500, 505, 510, 500, 505, 510, 500, 505, 510],
})
df['Date'] = pd.to_datetime(df['Date'], format='%m/%d/%Y')
df['year'] = df['Date'].dt.year
df['month'] = df['Date'].dt.month
df['day'] = df['Date'].dt.day
Get a list of years which can be used to cut the dataframe.
bins = [y for y in range(df['year'].max(), df['year'].min() - 5, -5)]
bins.reverse()
# [2013, 2018, 2023]
Calculate the mean of groups.
df['year'] = pd.cut(df['year'], bins)
df = df.groupby(['year', 'month', 'day'], as_index=False).mean(numeric_only=True)
output:
year month day A B C D E
0 (2013, 2018] 1 1 100.0 200.0 300.0 400.0 500.0
1 (2013, 2018] 1 2 105.0 205.0 305.0 405.0 505.0
2 (2013, 2018] 1 3 110.0 210.0 310.0 410.0 510.0
3 (2018, 2023] 1 1 100.0 200.0 300.0 400.0 500.0
4 (2018, 2023] 1 2 105.0 205.0 305.0 405.0 505.0
5 (2018, 2023] 1 3 110.0 210.0 310.0 410.0 510.0
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论