df.rolling() not respecting gaps in the index?


Question

You want to perform rolling window operations on a DataFrame where the window size is measured in index values rather than in rows. One way to approximate this is to split the DataFrame into contiguous segments of the index and apply rolling within each segment. Here's an example in Python:

import pandas as pd

# Assuming you have a DataFrame 'df' with your data in column 'var'

# Rolling mean over a fixed number of rows
def custom_rolling_mean(series, window_size):
    return series.rolling(window_size, min_periods=1).mean()

# Label contiguous segments of the index: a new segment starts
# wherever the index jumps by more than 1 from the previous row
segments = df.index.to_series().diff().gt(1).cumsum()

# Calculate a 3-row rolling mean within each segment, so no window spans a gap
result = df.groupby(segments)['var'].transform(lambda x: custom_rolling_mean(x, 3))

# Print the result
print(result)

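This code splits the index into contiguous segments (runs where consecutive index values differ by 1) and computes the rolling mean inside each segment separately, so a window never mixes values from both sides of a gap. Note that any jump larger than 1 is treated as a hard boundary: unlike a window measured in index units, it will not reach back across a gap that is smaller than the window (e.g. from 110 back to 108 with a window of 3).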
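To see how the grouping key built with `diff()`, `gt(1)` and `cumsum()` above works, here is a small self-contained sketch using a few rows from the question's data on either side of the 108 → 205 gap. `diff()` gives the step from the previous index value, `gt(1)` flags the first row after a gap, and `cumsum()` turns those flags into a running segment number. The window of 3 and `min_periods=1` are illustrative choices.

```python
import pandas as pd

# A few rows from the question's data, on either side of the 108 -> 205 gap
df = pd.DataFrame(
    {"var": [4.78, 4.19, 9.34, 11.10, 9.15]},
    index=pd.Index([107, 108, 205, 206, 207], name="Index"),
)

steps = df.index.to_series().diff()   # NaN, 1, 97, 1, 1
starts = steps.gt(1)                   # True only on the first row after a gap
segments = starts.cumsum()             # 0, 0, 1, 1, 1

# Rolling mean computed separately inside each contiguous segment
result = df.groupby(segments)["var"].transform(
    lambda s: s.rolling(3, min_periods=1).mean()
)
print(pd.DataFrame({"segment": segments, "rolling_mean": result}))
```

The value at 205 uses only row 205 itself, because it is the first row of a new segment.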

Original question:

Suppose I have a DataFrame with the following data:

Index var
100 4.49
101 7.58
102 4.76
103 4.48
104 5.13
105 7.24
106 4.69
107 4.78
108 4.19
205 9.34
206 11.10
207 9.15
208 10.01
209 11.64
210 13.93
211 12.99
212 13.30
213 9.32
214 4.53
215 11.13

As shown, the index has a typical resolution of 1, but there is a gap between 108 and 205. This is just an example; in reality the DataFrame will have several continuous sections (with a resolution of 1), separated by gaps of arbitrary size.

I would now like to perform rolling window operations using df.rolling(), such as df.rolling(3).mean(). However, I notice that instead of building the window from the actual index values, the rolling operation seems to ignore the index and simply take consecutive values until the window is full. For example, a window can include the values at indices [107, 108, 205] or [108, 205, 206].

How can I do this operation, but respect the index such that the window size refers specifically to the distance between values as indicated by the index?
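A minimal reproduction, using the data from the table above: with an integer window, `rolling(3)` takes the last three rows, so the value reported at 205 averages rows 107, 108 and 205 even though they are almost 100 index units apart.

```python
import pandas as pd

# Data from the table above
index = list(range(100, 109)) + list(range(205, 216))
values = [4.49, 7.58, 4.76, 4.48, 5.13, 7.24, 4.69, 4.78, 4.19,
          9.34, 11.10, 9.15, 10.01, 11.64, 13.93, 12.99, 13.30, 9.32, 4.53, 11.13]
df = pd.DataFrame({"var": values}, index=pd.Index(index, name="Index"))

# Integer window: the last 3 rows, whatever their index values are
positional = df["var"].rolling(3).mean()

# The window ending at 205 straddles the gap
print(positional.loc[205])                      # rolling result at 205
print(df["var"].loc[[107, 108, 205]].mean())    # mean of rows 107, 108, 205 computed directly
```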

Answer 1

Score: 2


This is indeed how rolling works: with an integer window, it takes consecutive rows regardless of the index values.

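You could temporarily reindex to the complete range of index values, so that every missing position becomes a NaN row and no window can pick up values from before a gap: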

out = (df.reindex(range(df.index.min(), df.index.max()+1))
         .rolling(3).mean()
         .reindex(df.index)
      )

Output:

             var
Index           
100          NaN
101          NaN
102     5.610000
103     5.606667
104     4.790000
105     5.616667
106     5.686667
107     5.570000
108     4.553333
205          NaN
206          NaN
207     9.863333
208    10.086667
209    10.266667
210    11.860000
211    12.853333
212    13.406667
213    11.870000
214     9.050000
215     8.326667

Output with rolling(3, min_periods=1).mean():

             var
Index           
100     4.490000
101     6.035000
102     5.610000
103     5.606667
104     4.790000
105     5.616667
106     5.686667
107     5.570000
108     4.553333
205     9.340000
206    10.220000
207     9.863333
208    10.086667
209    10.266667
210    11.860000
211    12.853333
212    13.406667
213    11.870000
214     9.050000
215     8.326667
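A different technique, not part of the answer above: pandas only measures a window in index units for offset-based windows (e.g. `"3s"`), and those require a datetime-like index. A sketch, assuming it is acceptable to temporarily reinterpret each integer index step as one second (the seconds unit is an arbitrary choice for illustration): convert the index to a `TimedeltaIndex`, roll over `"3s"`, then restore the original index. Offset windows are right-closed and default to `min_periods=1`, so each window covers the current index value and the two before it, which should match the `min_periods=1` output above without materializing the missing rows.

```python
import pandas as pd

# Data from the question
index = list(range(100, 109)) + list(range(205, 216))
values = [4.49, 7.58, 4.76, 4.48, 5.13, 7.24, 4.69, 4.78, 4.19,
          9.34, 11.10, 9.15, 10.01, 11.64, 13.93, 12.99, 13.30, 9.32, 4.53, 11.13]
df = pd.DataFrame({"var": values}, index=pd.Index(index, name="Index"))

# Assumption: one index unit is mapped to one second, purely so that
# pandas accepts a distance-based ("offset") window on this index
as_time = df.set_axis(pd.to_timedelta(df.index, unit="s"))

# "3s" window, right-closed: covers index values t-2, t-1 and t;
# offset windows default to min_periods=1
rolled = as_time.rolling("3s").mean()

# Put the original integer index back
out = rolled.set_axis(df.index)
print(out)
```

The index must be sorted for offset windows; a gap shorter than the window is still bridged, because the window is defined by index distance rather than by segment.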

huangapple
  • Published on May 22, 2023, 19:46:33
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76305841.html