Question

One option is to split the DataFrame into contiguous blocks of the index and apply rolling within each block, so that no window spans a gap. Here's an example in Python:

import pandas as pd

# Assumes a DataFrame 'df' with an integer index and a column 'var'

def rolling_mean(series, window):
    # Rolling mean over `window` rows; min_periods=1 allows partial windows at the start
    return series.rolling(window, min_periods=1).mean()

# Label contiguous blocks: a new block starts wherever the index jumps by more than 1
block_id = df.index.to_series().diff().gt(1).cumsum()

# Apply the rolling mean within each block; transform keeps the original index
result = df.groupby(block_id)['var'].transform(lambda s: rolling_mean(s, 3))

# Print the result
print(result)

The blocks are derived from the index differences, so a window never mixes values from both sides of a gap. Within a block the index step is 1, so a window of 3 rows corresponds to an index distance of 3.
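To make the block-labelling step concrete, here is a minimal, self-contained sketch on a five-row subset of the question's data. The subset and the variable names (`block_id`, `out`) are illustrative assumptions, and the window of 3 follows the `df.rolling(3)` example from the question.

```python
import pandas as pd

# Illustrative subset of the question's data: two rows before the gap, three after
df = pd.DataFrame(
    {"var": [4.78, 4.19, 9.34, 11.10, 9.15]},
    index=[107, 108, 205, 206, 207],
)

# diff() is NaN, 1, 97, 1, 1; a jump greater than 1 marks the start of a new block,
# and the cumulative sum turns those marks into block labels
block_id = df.index.to_series().diff().gt(1).cumsum()

# Rolling mean of 3 rows, computed separately inside each block
out = df.groupby(block_id)["var"].transform(lambda s: s.rolling(3).mean())

print(block_id.tolist())
print(out)
```

Only index 207 has three rows available within its own block, so it is the only row with a non-NaN result here. Note that this treats any jump larger than 1 as a hard break, which is an assumption about how gaps should be handled.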

Original (English):

Presume I have a dataframe with the following data:

Index	var
100	4.49
101	7.58
102	4.76
103	4.48
104	5.13
105	7.24
106	4.69
107	4.78
108	4.19
205	9.34
206	11.10
207	9.15
208	10.01
209	11.64
210	13.93
211	12.99
212	13.30
213	9.32
214	4.53
215	11.13

As shown, the index typically has a step of 1, but there is a gap between 108 and 205. This is just an example; in reality the DataFrame will have several contiguous sections (each with a step of 1), separated by gaps of arbitrary size.

I would now like to perform rolling window operations using df.rolling(), such as df.rolling(3).mean(). However, I notice that instead of building the window from the actual index values, the rolling operation ignores the index and simply takes consecutive rows to fill the window. For example, you can get a window covering the indices [107, 108, 205] or [108, 205, 206].

How can I do this operation, but respect the index such that the window size refers specifically to the distance between values as indicated by the index?
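The behavior described above can be reproduced with a small sketch. It uses only a five-row slice of the data around the gap (an illustrative assumption) and checks that the window ending at index 205 averages the rows labelled 107, 108, and 205.

```python
import pandas as pd

# Rows on either side of the gap between 108 and 205
df = pd.DataFrame(
    {"var": [4.69, 4.78, 4.19, 9.34, 11.10]},
    index=[106, 107, 108, 205, 206],
)

# An integer window counts rows, so the index labels play no role here
rolled = df.rolling(3).mean()

# The value at 205 is the mean of the rows labelled 107, 108 and 205
mixed = (4.78 + 4.19 + 9.34) / 3
print(rolled.loc[205, "var"], mixed)
```

The two printed numbers agree up to floating-point rounding, which confirms that the window spans the gap.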

Answer 1

Score: 2

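This is indeed how rolling works with an integer window: it counts consecutive rows, regardless of the index values.

You could temporarily reindex to get the desired behavior:

# Reindex to the full integer range so the gaps become NaN rows,
# roll over that, then keep only the original index labels.
out = (df.reindex(range(df.index.min(), df.index.max()+1))
         .rolling(3).mean()
         .reindex(df.index)
      )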

Output:

             var
Index           
100          NaN
101          NaN
102     5.610000
103     5.606667
104     4.790000
105     5.616667
106     5.686667
107     5.570000
108     4.553333
205          NaN
206          NaN
207     9.863333
208    10.086667
209    10.266667
210    11.860000
211    12.853333
212    13.406667
213    11.870000
214     9.050000
215     8.326667

Output with rolling(3, min_periods=1).mean():

             var
Index           
100     4.490000
101     6.035000
102     5.610000
103     5.606667
104     4.790000
105     5.616667
106     5.686667
107     5.570000
108     4.553333
205     9.340000
206    10.220000
207     9.863333
208    10.086667
209    10.266667
210    11.860000
211    12.853333
212    13.406667
213    11.870000
214     9.050000
215     8.326667
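If the gaps are very large, reindexing to the full range creates one row per missing index value. A possible alternative, sketched here under the assumption that one index unit can be mapped to one second, is to convert the index to a DatetimeIndex and use an offset window, which pandas evaluates by index distance rather than by row count. The names `as_time`, `out`, and `strict` are illustrative.

```python
import pandas as pd

# Illustrative subset of the question's data around the gap
df = pd.DataFrame(
    {"var": [4.78, 4.19, 9.34, 11.10, 9.15]},
    index=[107, 108, 205, 206, 207],
)

# Assumption: treat each integer index unit as one second
as_time = df.set_axis(pd.to_datetime(df.index, unit="s"), axis=0)

# A "3s" window covers the interval (t - 3s, t], i.e. index values t-2, t-1, t.
# Offset windows default to min_periods=1, so partial windows yield a value.
out = as_time.rolling("3s").mean().set_axis(df.index, axis=0)

# Requiring 3 observations gives NaN wherever the window is incomplete
strict = as_time.rolling("3s", min_periods=3).mean().set_axis(df.index, axis=0)

print(out)
print(strict)
```

On this subset, `out` at 205 is just 9.34 because no other row lies within the window, and `strict` is NaN everywhere except 207.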
