2023年8月4日 08:15:07go评论118阅读模式

英文:

Mean of X most recent years based on time series index

问题

import numpy as np
import pandas as pd
np.random.seed(1)
rows, cols = 4950, 5
data = np.random.rand(rows, cols)
tidx = pd.date_range('2010-01-01', periods=rows, freq='D')
df = pd.DataFrame(data, columns=['a', 'b', 'c', 'd', 'e'], index=tidx)
print(df)

对于时间序列中的每一天，我想计算前五年中相同日期的平均值。例如，对于日期 2023 年 1 月 1 日：取 2022 年 1 月 1 日、2021 年 1 月 1 日、2020 年 1 月 1 日、2019 年 1 月 1 日和 2018 年 1 月 1 日的平均值。我尝试了许多变体：

df.groupby([df.index.month, df.index.day]).mean()

但我无法让它在计算每一行的均值时只使用索引年份之前的五年。我想要添加五列，其中包含“滚动”平均值，并为每一列命名，例如“A_Avg”。


<details>
<summary>英文:</summary>
I have a time series of 10 years of days, with five columns of data.  Here is code to randomly generate:

import numpy as np
import pandas as pd
np.random.seed(1)

rows,cols = 4950,5
data = np.random.rand(rows,cols)
tidx = pd.date_range('2010-01-01', periods=rows, freq='D')
df = pd.DataFrame(data, columns=['a','b','c','d','e'], index=tidx)
print (df)


For each day in the time series I would like to calculate the mean for the same day in the PREVIOUS five years.  For example for the date 1/1/2023: take the average of 1/1/2022, 1/1/2021, 1/1/2020, 1/1/2019 and 1/1/2018.  I&#39;ve tried many variations on:

df.groupby([df.index.month, df.index.day]).mean()


But I can&#39;t get it use just the five years prior to the index.year in the mean calculation for each row.  I would like to add five columns with the &quot;rolling&quot; average, and name each column for example &quot;A_Avg&quot;
</details>
# 答案1
**得分**: 1
从您的DataFrame开始，我们可以首先将`Date`列转换为[`Datetime`](https://pandas.pydata.org/docs/reference/api/pandas.to_datetime.html)：
```python
df['Date'] = pd.to_datetime(df['Date'], format="%d/%m/%Y")

然后我们将year，month和year分开成列：

df['year'] = df['Date'].dt.year
df['month'] = df['Date'].dt.month
df['day'] = df['Date'].dt.day

接下来，我们根据day，month和year重新排序行，并相应地重新索引DataFrame，如下所示：

df = df.sort_values(by=['day', 'month', 'year']).reset_index(drop=True)

最后，我们可以使用设置为5的rolling方法来获得每列的期望结果：

df['A_Avg'] = df.groupby(['day', 'month'], as_index=False)['A'].rolling(5).mean()['A']
df['B_Avg'] = df.groupby(['day', 'month'], as_index=False)['B'].rolling(5).mean()['B']
df['C_Avg'] = df.groupby(['day', 'month'], as_index=False)['C'].rolling(5).mean()['C']
df['D_Avg'] = df.groupby(['day', 'month'], as_index=False)['D'].rolling(5).mean()['D']
df['E_Avg'] = df.groupby(['day', 'month'], as_index=False)['E'].rolling(5).mean()['E']

英文:

From your DataFrame, we can start by convert the Date column as Datetime :

df[&#39;Date&#39;] = pd.to_datetime(df[&#39;Date&#39;], format=&quot;%d/%m/%Y&quot;)

Then we separate the year, month and year in columns :

df[&#39;year&#39;] = df[&#39;Date&#39;].dt.year
df[&#39;month&#39;] = df[&#39;Date&#39;].dt.month
df[&#39;day&#39;] = df[&#39;Date&#39;].dt.day

Next step, we reorder the row depending on day, month and year and reindex the DataFrame accordingly like so :

df = df.sort_values(by=[&#39;day&#39;, &#39;month&#39;, &#39;year&#39;]).reset_index(drop=True)

To finish, we can use the rolling method setup on 5 to get the expected result on each columns :

df[&#39;A_Avg&#39;] = df.groupby([&#39;day&#39;, &#39;month&#39;], as_index=False)[&#39;A&#39;].rolling(5).mean()[&#39;A&#39;]
df[&#39;B_Avg&#39;] = df.groupby([&#39;day&#39;, &#39;month&#39;], as_index=False)[&#39;B&#39;].rolling(5).mean()[&#39;B&#39;]
df[&#39;C_Avg&#39;] = df.groupby([&#39;day&#39;, &#39;month&#39;], as_index=False)[&#39;C&#39;].rolling(5).mean()[&#39;C&#39;]
df[&#39;D_Avg&#39;] = df.groupby([&#39;day&#39;, &#39;month&#39;], as_index=False)[&#39;D&#39;].rolling(5).mean()[&#39;D&#39;]
df[&#39;E_Avg&#39;] = df.groupby([&#39;day&#39;, &#39;month&#39;], as_index=False)[&#39;E&#39;].rolling(5).mean()[&#39;E&#39;]

答案2

得分: 1

这听起来你想要 rolling。

mean_5y_prev = df.groupby([df.index.month, df.index.day]).rolling(5).mean()

然后你可以与 df 进行连接：

df.join(mean_5y_prev.droplevel([0, 1]), rsuffix='_Avg')

英文:

It sounds like you want rolling.

mean_5y_prev = df.groupby([df.index.month, df.index.day]).rolling(5).mean()

Then you can join back to df:

df.join(mean_5y_prev.droplevel([0, 1]), rsuffix=&#39;_Avg&#39;)

答案3

得分: 0

创建数据框。

df = pd.DataFrame({
    'Date': ['1/1/2023', '1/2/2023', '1/3/2023', '1/1/2022', '1/2/2022', '1/3/2022', '1/1/2015', '1/2/2015', '1/3/2015'],
    'A': [100, 105, 110, 100, 105, 110, 100, 105, 110],
    'B': [200, 205, 210, 200, 205, 210, 200, 205, 210],
    'C': [300, 305, 310, 300, 305, 310, 300, 305, 310],
    'D': [400, 405, 410, 400, 405, 410, 400, 405, 410],
    'E': [500, 505, 510, 500, 505, 510, 500, 505, 510],
})
df['Date'] = pd.to_datetime(df['Date'], format='%m/%d/%Y')
df['year'] = df['Date'].dt.year
df['month'] = df['Date'].dt.month
df['day'] = df['Date'].dt.day

获取可以用来分割数据框的年份列表。

bins = [y for y in range(df['year'].max(), df['year'].min() - 5, -5)]
bins.reverse()
# [2013, 2018, 2023]

计算分组的均值。

df['year'] = pd.cut(df['year'], bins)
df = df.groupby(['year', 'month', 'day'], as_index=False).mean(numeric_only=True)

输出：

           year  month  day      A      B      C      D      E
0  (2013, 2018]      1    1  100.0  200.0  300.0  400.0  500.0
1  (2013, 2018]      1    2  105.0  205.0  305.0  405.0  505.0
2  (2013, 2018]      1    3  110.0  210.0  310.0  410.0  510.0
3  (2018, 2023]      1    1  100.0  200.0  300.0  400.0  500.0
4  (2018, 2023]      1    2  105.0  205.0  305.0  405.0  505.0
5  (2018, 2023]      1    3  110.0  210.0  310.0  410.0  510.0

英文:

Create dataframe.

df = pd.DataFrame({
    &#39;Date&#39;: [&#39;1/1/2023&#39;, &#39;1/2/2023&#39;, &#39;1/3/2023&#39;, &#39;1/1/2022&#39;, &#39;1/2/2022&#39;, &#39;1/3/2022&#39;, &#39;1/1/2015&#39;, &#39;1/2/2015&#39;, &#39;1/3/2015&#39;],
    &#39;A&#39;: [100, 105, 110, 100, 105, 110, 100, 105, 110],
    &#39;B&#39;: [200, 205, 210, 200, 205, 210, 200, 205, 210],
    &#39;C&#39;: [300, 305, 310, 300, 305, 310, 300, 305, 310],
    &#39;D&#39;: [400, 405, 410, 400, 405, 410, 400, 405, 410],
    &#39;E&#39;: [500, 505, 510, 500, 505, 510, 500, 505, 510],
})
df[&#39;Date&#39;] = pd.to_datetime(df[&#39;Date&#39;], format=&#39;%m/%d/%Y&#39;)
df[&#39;year&#39;] = df[&#39;Date&#39;].dt.year
df[&#39;month&#39;] = df[&#39;Date&#39;].dt.month
df[&#39;day&#39;] = df[&#39;Date&#39;].dt.day

Get a list of years which can be used to cut the dataframe.

bins = [y for y in range(df[&#39;year&#39;].max(), df[&#39;year&#39;].min() - 5, -5)]
bins.reverse()
# [2013, 2018, 2023]

Calculate the mean of groups.

df[&#39;year&#39;] = pd.cut(df[&#39;year&#39;], bins)
df = df.groupby([&#39;year&#39;, &#39;month&#39;, &#39;day&#39;], as_index=False).mean(numeric_only=True)

output:

           year  month  day      A      B      C      D      E
0  (2013, 2018]      1    1  100.0  200.0  300.0  400.0  500.0
1  (2013, 2018]      1    2  105.0  205.0  305.0  405.0  505.0
2  (2013, 2018]      1    3  110.0  210.0  310.0  410.0  510.0
3  (2018, 2023]      1    1  100.0  200.0  300.0  400.0  500.0
4  (2018, 2023]      1    2  105.0  205.0  305.0  405.0  505.0
5  (2018, 2023]      1    3  110.0  210.0  310.0  410.0  510.0

通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库，让每个人都能够通过互相帮助和分享经验来进步。

X最近年度的均值，基于时间序列索引。

问题

答案2

答案3

附加字段未在Django管理面板中显示。

PySpark 3高阶函数用于提取到列中

如何从相关的Django模型中获取字段ID

数据框的成对重塑。

如何在Playwright视觉比较中屏蔽多个定位器？

在C++中，可以使用可变模板参数来检索类型的内部类型。

selenium.common.exceptions.StaleElementReferenceException: Message: stale element reference: stale element not found

Creating and opening a URL to log in to Website via Basic Auth with Robot Framework/Selenium (Python)

AG Grid 在上下文菜单中以大文本形式打开

What's the correct way to type hint an empty list as a literal in python?

如何在Highcharts Gantt中更改本地化的星期名称

如何在同一个流中使用多个过滤器和映射函数？

如何使用Map/Set来将代码优化到O(n)？

.NET MAUI Android在GitHub Actions上构建失败，错误代码为1。