How can I efficiently create a new column in a pandas DataFrame based on another column's rolling mean over a period of 30 days?

Question

I'm working with stock data in a pandas DataFrame. I'm trying to create a new column based on another column while avoiding a slow for loop. For now I use a while loop to create a column holding the 30-day mean of a_volume, because I can't figure out another way:

VOLUME_SERIES = DF.loc[:, 'a_volume']
VOLUME_SERIES_IND = DF.loc[:, 'a_volume'].index.to_numpy()
MAX_INDEX = np.max(VOLUME_SERIES_IND)

period = 30
period = period - 1
START = 0
STOP = START + period

DF['mean_a_volume'] = np.nan

while STOP <= MAX_INDEX:
    mean_vol = np.mean(DF.loc[START:STOP, 'a_volume'])
    DF.loc[START, 'mean_a_volume'] = mean_vol
    START = START + 1
    STOP = START + period

I'd really like to use the .apply() method together with np.where(), somehow passing in the index of the row and skipping any row whose index is greater than MAX_INDEX - period, but I can't figure out how to get the index of a row from within a Series. I tried printing the indexes of the Series with .apply(lambda x: print(index[x])); it looked OK for the most part, but for some rows it printed two indexes, so I knew something wasn't quite right and dropped that approach. I'm really stuck on this and don't know how else to proceed. I'd appreciate any help. Thanks.

Answer 1

Score: 0

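You can use rolling. The example below uses a window of 2 rows; for the 30-row mean in the question, pass 30 instead.

import pandas as pd

column = [1, 2, 3, 4, 5, 6, 7, 8, 9]
df = pd.DataFrame({'column': column})
# Mean of the current row and the one before it; the first row is NaN
# because its window is incomplete.
df['mean'] = df['column'].rolling(2).mean()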
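One thing to watch when swapping the loop in the question for rolling: the loop writes the mean of rows START through START + 29 onto row START (a forward-looking window), while rolling(30).mean() writes the mean of the previous 30 rows onto the last row of the window. Below is a minimal sketch of how shifting the rolling result lines it up with the loop's output. It assumes a default integer index (0, 1, 2, ...), and the volume data is made up for illustration.

```python
import numpy as np
import pandas as pd

# Made-up data for illustration; 'a_volume' is the column name from the question.
rng = np.random.default_rng(0)
DF = pd.DataFrame({'a_volume': rng.integers(1_000, 10_000, size=100).astype(float)})

window = 30

# Trailing window: row i holds the mean of rows i-29 .. i (first 29 rows are NaN).
trailing = DF['a_volume'].rolling(window).mean()

# Forward-looking window, as in the question's loop: row i holds the mean of
# rows i .. i+29 (last 29 rows are NaN).
DF['mean_a_volume'] = trailing.shift(-(window - 1))

# Slow reference loop, written like the one in the question, to check the result.
expected = pd.Series(np.nan, index=DF.index)
for start in range(len(DF) - window + 1):
    expected.loc[start] = DF.loc[start:start + window - 1, 'a_volume'].mean()

print(np.allclose(DF['mean_a_volume'], expected, equal_nan=True))
```

If the trailing alignment is what you actually want (each row summarising the 30 rows up to and including it), drop the shift and use the rolling result directly.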

Answer 2

Score: 0

You can use the df.rolling method to compute the rolling mean. It is vectorized, so it will be more efficient than looping. See the pandas documentation for DataFrame.rolling.

Here is the updated code:

# Rolling mean of the a_volume column only, stored as a new column
DF['mean_a_volume'] = DF['a_volume'].rolling(window=30).mean()
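If "30 days" in the question means calendar days rather than 30 rows, rolling also accepts a time offset such as '30D', provided the index is a sorted DatetimeIndex. Here is a small sketch with made-up data; the business-day date index is an assumption, since the question does not show how dates are stored.

```python
import pandas as pd

# Hypothetical data: 60 business days of made-up volumes, indexed by date.
dates = pd.bdate_range('2023-01-02', periods=60)
DF = pd.DataFrame({'a_volume': [float(i) for i in range(60)]}, index=dates)

# Time-based window: each row gets the mean over the 30 calendar days ending
# at (and including) that row's date. The index must be sorted ascending.
DF['mean_a_volume_30d'] = DF['a_volume'].rolling('30D').mean()

print(DF.tail())
```

With an offset window, min_periods defaults to 1, so the early rows hold the mean of however many observations exist so far instead of NaN. Pass min_periods explicitly if partial windows should be left empty.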

huangapple
  • Posted on May 22, 2023, 23:46:48
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76307887.html