df.rolling() not respecting gaps in the index?
Question
The goal is to perform rolling window operations on a DataFrame so that a window never spans a gap in the index. One way to do this is to split the data into contiguous sections and apply rolling within each section. Here's an example in Python:
import pandas as pd

# Assuming a DataFrame 'df' with an integer index and a column 'var'

# Rolling mean over a fixed number of rows
# (min_periods=1 returns partial means at the start of a section; drop it to get NaN there)
def custom_rolling_mean(series, window_size):
    return series.rolling(window_size, min_periods=1).mean()

# Label each contiguous section: a new section starts wherever the index jumps by more than 1
section_id = df.index.to_series().diff().gt(1).cumsum()

# Calculate the rolling mean within each section, so no window spans a gap
result = df.groupby(section_id)['var'].transform(lambda x: custom_rolling_mean(x, 3))

# Print the result
print(result)
Because each section is processed separately, a window never mixes values from both sides of a gap. This assumes the index has a step of 1 inside each section, so that a 3-row window there covers an index distance of 3.
Presume I have a dataframe with the following data:
Index | var |
---|---|
100 | 4.49 |
101 | 7.58 |
102 | 4.76 |
103 | 4.48 |
104 | 5.13 |
105 | 7.24 |
106 | 4.69 |
107 | 4.78 |
108 | 4.19 |
205 | 9.34 |
206 | 11.10 |
207 | 9.15 |
208 | 10.01 |
209 | 11.64 |
210 | 13.93 |
211 | 12.99 |
212 | 13.30 |
213 | 9.32 |
214 | 4.53 |
215 | 11.13 |
As shown, the index has a typical resolution of 1, but has a gap between 108 and 205. This is just an example; in reality the dataframe will have several continuous sections (each with a resolution of 1), separated by gaps of arbitrary size.
I would now like to perform rolling window operations using df.rolling(), such as df.rolling(3).mean(). However, I notice that instead of building the window from the actual index values, the rolling operation ignores the index and simply takes consecutive rows to fill the window. For example, a window can cover the indices [107, 108, 205] or [108, 205, 206].
How can I do this operation, but respect the index such that the window size refers specifically to the distance between values as indicated by the index?
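To make the behaviour concrete, here is a minimal sketch that rebuilds only the rows around the gap from the table above and applies a plain rolling(3).mean(). The variable names naive and straddling are illustrative, not part of the original post.

```python
import pandas as pd

# Rows around the gap, copied from the table above
df = pd.DataFrame(
    {"var": [4.69, 4.78, 4.19, 9.34, 11.10, 9.15]},
    index=pd.Index([106, 107, 108, 205, 206, 207], name="Index"),
)

# A fixed-size integer window counts rows, not index distance
naive = df["var"].rolling(3).mean()

# The value reported at index 205 averages the rows at 107, 108 and 205,
# i.e. the window straddles the gap
straddling = (4.78 + 4.19 + 9.34) / 3
print(naive.loc[205], straddling)
```

The two printed numbers agree, which confirms that the window ending at 205 reaches back across the gap.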
Answer 1
Score: 2
This is indeed how rolling works. It considers consecutive values independently of the index values.
You could temporarily reindex to have your desired behavior:
out = (df.reindex(range(df.index.min(), df.index.max()+1))
         .rolling(3).mean()
         .reindex(df.index)
      )
Output:
var
Index
100 NaN
101 NaN
102 5.610000
103 5.606667
104 4.790000
105 5.616667
106 5.686667
107 5.570000
108 4.553333
205 NaN
206 NaN
207 9.863333
208 10.086667
209 10.266667
210 11.860000
211 12.853333
212 13.406667
213 11.870000
214 9.050000
215 8.326667
Output with rolling(3, min_periods=1).mean():
var
Index
100 4.490000
101 6.035000
102 5.610000
103 5.606667
104 4.790000
105 5.616667
106 5.686667
107 5.570000
108 4.553333
205 9.340000
206 10.220000
207 9.863333
208 10.086667
209 10.266667
210 11.860000
211 12.853333
212 13.406667
213 11.870000
214 9.050000
215 8.326667
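For reference, a self-contained sketch of the reindex approach on a shortened version of the question's data. It assumes an integer index with a nominal step of 1, and the helper name gap_aware_rolling_mean is hypothetical, not a pandas function. Reindexing onto the full range fills the gap with NaN rows, so a window that touches the gap holds fewer than 3 valid observations and yields NaN unless min_periods is lowered.

```python
import pandas as pd


def gap_aware_rolling_mean(df, window, min_periods=None):
    """Rolling mean whose window spans `window` index units, not `window` rows.

    Hypothetical helper; assumes an integer index with a nominal step of 1.
    """
    full = range(df.index.min(), df.index.max() + 1)
    return (
        df.reindex(full)            # missing index values become NaN rows
          .rolling(window, min_periods=min_periods)
          .mean()
          .reindex(df.index)        # keep only the original rows
    )


# Rows around the gap, copied from the question's table
df = pd.DataFrame(
    {"var": [4.69, 4.78, 4.19, 9.34, 11.10, 9.15]},
    index=pd.Index([106, 107, 108, 205, 206, 207], name="Index"),
)

strict = gap_aware_rolling_mean(df, 3)                   # NaN until 3 valid values
partial = gap_aware_rolling_mean(df, 3, min_periods=1)   # partial means allowed
print(strict)
print(partial)
```

Note that the intermediate frame holds one row for every integer between the smallest and largest index, so widely separated sections make it correspondingly large.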