May 22, 2023 23:46:48


How can I efficiently create a new column in a pandas DataFrame based on another column's rolling mean over a period of 30 days?

Question

I'm working with stock data in a pandas DataFrame, and I'm trying to create a new column based on another column without resorting to a slow Python loop. Because I couldn't figure out another way, I used a while loop to create a new column holding the mean of a_volume over the past 30 days:

import numpy as np

# DF is the existing DataFrame: default 0..n-1 integer index, 'a_volume' column
MAX_INDEX = DF.index.max()

period = 30
period = period - 1  # .loc slices include both ends, so START:START+29 is 30 rows
START = 0
STOP = START + period

DF['mean_a_volume'] = np.nan

# Slide the 30-row window down one row at a time;
# row START receives the mean of rows START..STOP
while STOP <= MAX_INDEX:
    mean_vol = np.mean(DF.loc[START:STOP, 'a_volume'])
    DF.loc[START, 'mean_a_volume'] = mean_vol
    START = START + 1
    STOP = START + period

I'd actually like to use the .apply() method with np.where(), somehow passing in the row's index and telling it not to apply anything when the index is greater than MAX_INDEX - period, but I can't figure out how to get a row's index inside a Series. I tried getting the Series' indexes with .apply(lambda x: print(index[x])). It looked OK for the most part, but some entries showed two indexes, so I knew something wasn't right and avoided it. I'm really stuck on this and don't know how else to proceed. Any help is appreciated. Thanks.
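On the .apply() attempt: inside Series.apply the function is called with each element's value, not its index label. The small sketch below uses made-up data (the names volume and period are hypothetical) to show where the labels are actually available, via Series.items(), and how np.where can be applied to the index directly, which is the masking the question describes, without any per-row apply.

```python
import numpy as np
import pandas as pd

# Hypothetical volume data whose index labels are not 0..n-1
volume = pd.Series([100, 250, 100], index=[10, 11, 12], name="a_volume")

# Series.apply calls the function with each element's value, never its label
received = volume.apply(lambda x: x).tolist()

# Series.items() yields (label, value) pairs when the label is needed
pairs = list(volume.items())

# Vectorized masking on the index itself, no per-row apply required
period = 1  # hypothetical offset standing in for the question's period
max_index = volume.index.max()
mask = volume.index <= max_index - period
masked = np.where(mask, volume, np.nan)

print(received)  # [100, 250, 100]
print(pairs)     # [(10, 100), (11, 250), (12, 100)]
print(mask)      # [ True  True False]
```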


Answer 1

Score: 0

You can use rolling():

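import pandas as pd

column = [1, 2, 3, 4, 5, 6, 7, 8, 9]
df = pd.DataFrame({'column': column})
# Each row gets the mean of itself and the previous row (a 2-row window);
# the first row is NaN because a full window is not yet available.
# For the question's data, use df['a_volume'].rolling(30).mean() instead.
df['mean'] = df['column'].rolling(2).mean()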
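One alignment detail worth checking against the question's loop: rolling(30).mean() is a trailing window (row i holds the mean of rows i-29..i), while the loop stores the mean of rows i..i+29 in row i. A minimal sketch, assuming a default 0..n-1 integer index as the loop does, showing that shifting the rolling result up by 29 rows reproduces the loop exactly. The helpers loop_mean and rolling_mean_forward and the data are hypothetical.

```python
import numpy as np
import pandas as pd


def loop_mean(df, period=30):
    """Condensed version of the question's while loop (assumes a 0..n-1 index).

    Row i receives the mean of rows i..i+period-1; later rows stay NaN.
    """
    out = pd.Series(np.nan, index=df.index)
    start = 0
    while start + period - 1 <= df.index.max():
        out.loc[start] = df.loc[start:start + period - 1, "a_volume"].mean()
        start += 1
    return out


def rolling_mean_forward(df, period=30):
    """Vectorized equivalent: trailing rolling mean shifted up period - 1 rows."""
    return df["a_volume"].rolling(window=period).mean().shift(-(period - 1))


# Hypothetical data: 40 rows with values 1.0..40.0 and a default RangeIndex
df = pd.DataFrame({"a_volume": np.arange(1, 41, dtype=float)})

trailing = df["a_volume"].rolling(window=30).mean()  # row i: mean of rows i-29..i
forward = rolling_mean_forward(df)                   # row i: mean of rows i..i+29

print(trailing.iloc[29], forward.iloc[0])  # 15.5 15.5
print(np.allclose(loop_mean(df).to_numpy(), forward.to_numpy(), equal_nan=True))  # True
```

The shift also takes care of the "don't apply anything past MAX_INDEX - period" condition: the last 29 rows simply come out as NaN.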


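Answer 2

Score: 0

You can use the rolling method to compute the rolling mean. It is vectorized, so it is considerably faster than looping in Python. See the pandas documentation for DataFrame.rolling.

Here is the updated code: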

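# Select the single column first; calling DF.rolling(...) on the whole
# DataFrame would compute a rolling mean for every column instead.
DF['mean_a_volume'] = DF['a_volume'].rolling(window=30).mean()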
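A caveat about the "30 days" in the title: window=30 counts 30 rows, which for daily stock data means 30 trading days. If calendar days are wanted and the data has a sorted DatetimeIndex (an assumption; the question's loop implies a plain integer index), rolling also accepts an offset string such as "30D", or a date column can be passed with on=. A minimal sketch with made-up business-day data:

```python
import numpy as np
import pandas as pd

# Hypothetical daily volumes on business days only (weekends are missing),
# indexed by a sorted DatetimeIndex
dates = pd.bdate_range("2023-01-02", periods=60)
df = pd.DataFrame({"a_volume": np.arange(60, dtype=float)}, index=dates)

# Integer window: the last 30 rows, i.e. 30 trading days
by_rows = df["a_volume"].rolling(window=30).mean()

# Offset window: every row dated in (t - 30 days, t];
# min_periods defaults to 1 for offset windows
by_days = df["a_volume"].rolling(window="30D").mean()

print(by_rows.iloc[-1])  # 44.5, mean of the last 30 rows
print(by_days.iloc[-1])  # 48.5, mean of the 22 business days from 2023-02-23 on
print(by_rows.iloc[0], by_days.iloc[0])  # nan 0.0
```

Because min_periods defaults to 1 for offset windows, the first rows get a partial-window mean rather than NaN; pass min_periods explicitly if that is not wanted. If the data is stored newest-first, sorting it ascending with sort_index() before rolling is the safest choice.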


