How can I efficiently create a new column in a pandas DataFrame based on another column's rolling mean over a period of 30 days?
Question
I'm working with stock data in a pandas DataFrame. I'm trying to create a new column based on another column while avoiding a slow for loop. I used a while loop to create a new column holding the 30-day mean (of a_volume in the code below) because I couldn't figure out another way:
VOLUME_SERIES = DF.loc[:, 'a_volume']
VOLUME_SERIES_IND = DF.loc[:, 'a_volume'].index.to_numpy()
MAX_INDEX = np.max(VOLUME_SERIES_IND)
period = 30
period = period - 1
START = 0
STOP = START + period
DF['mean_a_volume'] = np.nan
while STOP <= MAX_INDEX:
    mean_vol = np.mean(DF.loc[START:STOP, 'a_volume'])
    DF.loc[START, 'mean_a_volume'] = mean_vol
    START = START + 1
    STOP = START + period
What I really wanted was to use the .apply() method together with np.where(), somehow passing in each row's index so that nothing is computed when the index is greater than MAX_INDEX - period, but I can't figure out how to get a row's index from within a Series. I tried printing the indexes of the series with .apply(lambda x: print(index[x])). It looked fine for the most part, but for some rows it printed two indexes, so I knew something wasn't quite right and dropped that approach. I'm really stuck on this and don't know how else to proceed. I'd appreciate any help. Thanks.
Answer 1
Score: 0
You can use rolling. The example below uses a window of 2 rows; for the question, the window would be 30.
import pandas as pd
column = [1, 2, 3, 4, 5, 6, 7, 8, 9]
df = pd.DataFrame()
df['column'] = column
# Mean of each row and the row before it; the first row has no full window, so it is NaN
df['mean'] = df['column'].rolling(2).mean()
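As a small sketch of how the window size and the optional min_periods argument affect the result (the window of 3 and the column names mean_3 and mean_3_partial are only illustrative, not taken from the question):

```python
import pandas as pd

df = pd.DataFrame({'column': [1, 2, 3, 4, 5, 6, 7, 8, 9]})

# Trailing window of 3 rows: each value is the mean of the current row
# and the two rows before it. The first two rows do not have a full
# window yet, so they come out as NaN.
df['mean_3'] = df['column'].rolling(window=3).mean()

# min_periods=1 tells pandas to average whatever rows are available
# instead of returning NaN for incomplete windows.
df['mean_3_partial'] = df['column'].rolling(window=3, min_periods=1).mean()

print(df)
```

Whether the leading NaN values or the partial means are preferable depends on how the column is used later; for a 30-row window the same choice applies to the first 29 rows.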
Answer 2
Score: 0
You can use the rolling method to compute the rolling mean. It is vectorized, so it is more efficient than looping. See the pandas documentation for DataFrame.rolling.
Here is the updated code:
mean_volume = DF['a_volume'].rolling(window=30).mean()
DF['mean_a_volume'] = mean_volume
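One difference worth noting: the loop in the question stores the mean of rows START through START + 29 at row START (a forward-looking window), whereas rolling stores each mean at the last row of its window. Below is a minimal sketch of one way to reproduce the loop's alignment, assuming a default integer index (0, 1, 2, ...) as in the question. The random data is a made-up stand-in for the real a_volume column.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the asker's data: 100 rows of fake volumes.
rng = np.random.default_rng(0)
DF = pd.DataFrame({'a_volume': rng.integers(1_000, 10_000, size=100).astype(float)})

period = 30

# Trailing mean: the value at row i covers rows i-29 .. i.
trailing = DF['a_volume'].rolling(window=period).mean()

# Shift the result up by period-1 rows so that the value at row i
# covers rows i .. i+29, which is what the while loop computes.
# The last period-1 rows have no full window ahead of them and stay NaN.
DF['mean_a_volume'] = trailing.shift(-(period - 1))

print(DF.head())
print(DF.tail())
```

If the rows are indexed by date rather than by position, rolling also accepts a time offset such as rolling('30D'), which counts calendar days instead of rows; that requires a sorted datetime-like index.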