Rolling Mean/Average within a For Loop on a Dataframe Python

Question

I went through a bunch of posts and couldn't find a more Pythonic solution. I have a DataFrame and run a for loop over it to calculate several metrics. Many of the columns depend on each other, so at each iteration I want to calculate everything up to that point. The issue is that the only way I have found to make the rolling method work within the loop is to run it on the entire column at every iteration. I am sure there has to be a better way. Here is a sample: I have the following DataFrame, where the Value column is generated within the for loop:

Minute  Value  Rolling Mean
1       3      0
2       5      0
3       8      5.3333
4       4      5.6667
5       6      6
6       7      5.6667

This is what I am using to calculate the mean inside the for loop:

n_periods = 3
df['Rolling Mean'] = df['Value'].rolling(n_periods, min_periods = 0).mean(skipna=False)

The problem is that every iteration of the for loop recomputes the entire column, and I have thousands of rows, so it is very slow.

I would like something more like this (which doesn't work), which would run only one calculation per row throughout the loop:

for i in range(1, len(df)):
    df.at[i, 'Rolling Mean'] = df.at[i, 'Value'].rolling(n_periods, min_periods=0).mean(skipna=False)

Any thoughts?
Thank you all!

Answer 1

Score: 0

OK, so after some trial and error, I found that this line works. In case anybody else needs it:

# Loop through the indices of the DataFrame
for i in range(1, len(df)):
    # Compute the rolling mean over rows 0..i and store its last value in 'Rolling Mean' with df.at
    df.at[i, 'Rolling Mean'] = df['Value'][0:i+1].rolling(window=n_periods, min_periods=0).mean().iloc[-1]

Thank you all for the support! This community is amazing!
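As a quick sanity check, here is a minimal sketch that runs a loop of this shape on the sample data from the question and compares it with a single vectorised call. It makes a few assumptions that are not in the original post: a default integer index (so that df.at[i, ...] and the positional slice agree), min_periods=1 instead of 0, and a loop that starts at row 0 rather than row 1. The skipna argument is left out because, as far as I know, recent pandas versions do not accept it on Rolling.mean.

```python
import pandas as pd

n_periods = 3

# Sample data from the question; a default RangeIndex is assumed.
df = pd.DataFrame({"Minute": [1, 2, 3, 4, 5, 6], "Value": [3, 5, 8, 4, 6, 7]})
df["Rolling Mean"] = 0.0  # float column, filled row by row below

for i in range(len(df)):
    # Rolling mean over rows 0..i; only the last value is kept.
    prefix = df["Value"].iloc[: i + 1]
    df.at[i, "Rolling Mean"] = (
        prefix.rolling(window=n_periods, min_periods=1).mean().iloc[-1]
    )

# The same column computed once, outside any loop.
vectorised = df["Value"].rolling(window=n_periods, min_periods=1).mean()
matches = bool((df["Rolling Mean"] - vectorised).abs().max() < 1e-9)

print(df)
print("loop matches single vectorised call:", matches)
```

If Value did not depend on earlier rolling results, the single vectorised call would be enough. The loop only pays off when each new Value really needs the previous rows' output.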

Answer 2

Score: 0

It may be better to first understand what a rolling mean is and where you want the window centered. Then you can just implement the math:

for i in range(1, len(df)):
    df.at[i, 'Rolling Mean'] = df.loc[:i, 'Value'].tail(n_periods).mean()
  • This needs much less computing power than your answer because it averages only the last n_periods values instead of recomputing the rolling mean over every row from the beginning of your DataFrame.
  • The window is right-aligned, just like in your example. In general, one can pass center=True to df.rolling, but that doesn't work if you don't know the future values yet. You should be aware of that difference.
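If the values really arrive one at a time inside the loop, another option is to keep a running sum of the current window so that each row costs a constant amount of work. This is a different technique from the one in the answer above, shown only as a sketch: the deque, the window_sum variable, and the default integer index are assumptions of this example, and the short windows at the start are averaged over however many values exist so far. The last lines compute a centered window for contrast, which needs values from future rows.

```python
from collections import deque

import pandas as pd

n_periods = 3

# Sample data from the question; a default RangeIndex is assumed.
df = pd.DataFrame({"Minute": [1, 2, 3, 4, 5, 6], "Value": [3, 5, 8, 4, 6, 7]})
df["Rolling Mean"] = 0.0

window = deque()   # the most recent values, at most n_periods of them
window_sum = 0.0   # running sum of the values currently in the window

for i in range(len(df)):
    value = float(df.at[i, "Value"])
    window.append(value)
    window_sum += value
    if len(window) > n_periods:
        # Drop the oldest value so the window stays right-aligned.
        window_sum -= window.popleft()
    df.at[i, "Rolling Mean"] = window_sum / len(window)

# Reference results: right-aligned (uses only past rows) vs. centered
# (also uses the next row, so it cannot be filled in during the loop).
right_aligned = df["Value"].rolling(n_periods, min_periods=1).mean()
centered = df["Value"].rolling(n_periods, min_periods=1, center=True).mean()

print(df)
print(centered)
```

The running sum can drift slightly for very long float series; recomputing the sum from the deque every so often is one way to limit that.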

huangapple
  • Posted on April 19, 2023 at 22:53:43
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76055949.html