2020年1月4日 00:59:54go评论82阅读模式

英文:

Calculate column in Pandas Dataframe using adjacent rows without iterating through each row

问题

以下是代码部分的翻译：

def create_candles(ticks, instrument, time_slice):
    candlesticks = ticks.price.resample(time_slice, base=00).ohlc().bfill()
    volume = ticks.amount.resample(time_slice, base=00).sum()
    candlesticks['volume'] = volume
    candlesticks['instrument'] = instrument
    candlesticks['ttr'] = 0
    # candlesticks['vr_7'] = 0
    candlesticks['vr_10'] = 0
    candlesticks = calculate_indicators(candlesticks, instrument, time_slice)

    return candlesticks


def calculate_indicators(candlesticks, instrument):
    candlesticks.sort_index(inplace=True)
    # candlesticks['rsi_14'] = talib.RSI(candlesticks.close, timeperiod=14)
    candlesticks['lr_50'] = talib.LINEARREG(candlesticks.close, timeperiod=50)
    # candlesticks['lr_150'] = talib.LINEARREG(candlesticks.close, timeperiod=150)
    # candlesticks['ema_55'] = talib.EMA(candlesticks.close, timeperiod=55)
    # candlesticks['ema_28'] = talib.EMA(candlesticks.close, timeperiod=28)
    # candlesticks['ema_18'] = talib.EMA(candlesticks.close, timeperiod=18)
    # candlesticks['ema_9'] = talib.EMA(candlesticks.close, timeperiod=9)
    # candlesticks['wma_21'] = talib.WMA(candlesticks.close, timeperiod=21)
    # candlesticks['wma_12'] = talib.WMA(candlesticks.close, timeperiod=12)
    # candlesticks['wma_11'] = talib.WMA(candlesticks.close, timeperiod=11)
    # candlesticks['wma_5'] = talib.WMA(candlesticks.close, timeperiod=5)
    candlesticks['cmo_9'] = talib.CMO(candlesticks.close, timeperiod=9)

    for row in candlesticks.itertuples():
        current_index = candlesticks.index.get_loc(row.Index)
        if current_index >= 1:
            previous_close = candlesticks.iloc[current_index - 1, candlesticks.columns.get_loc('close')]
            candlesticks.iloc[current_index, candlesticks.columns.get_loc('ttr')] = max(
                row.high - row.low,
                abs(row.high - previous_close),
                abs(row.low - previous_close))

        if current_index >= 10:
            candlesticks.iloc[current_index, candlesticks.columns.get_loc('vr_10')] = candlesticks.iloc[current_index, candlesticks.columns.get_loc('ttr')] / (
                max(candlesticks.high[current_index - 9: current_index].max(), candlesticks.close[current_index - 11]) -
                min(candlesticks.low[current_index - 9: current_index].min(), candlesticks.close[current_index - 11]))

    candlesticks['timestamp'] = pd.to_datetime(candlesticks.index)
    candlesticks['instrument'] = instrument
    candlesticks.fillna(0, inplace=True)
    return candlesticks

请注意，这只是代码部分的翻译。如果需要其他帮助或翻译其他部分，请提出具体的问题。

英文:

I would like to see if there is a way to calculate a column in a dataframe that uses something similar to a moving average without iterating through each row.
Current working code:

def create_candles(ticks, instrument, time_slice):
candlesticks = ticks.price.resample(time_slice, base=00).ohlc().bfill()
volume = ticks.amount.resample(time_slice, base=00).sum()
candlesticks[&#39;volume&#39;] = volume
candlesticks[&#39;instrument&#39;] = instrument
candlesticks[&#39;ttr&#39;] = 0
# candlesticks[&#39;vr_7&#39;] = 0
candlesticks[&#39;vr_10&#39;] = 0
candlesticks = calculate_indicators(candlesticks, instrument, time_slice)
return candlesticks
def calculate_indicators(candlesticks, instrument):
candlesticks.sort_index(inplace=True)
# candlesticks[&#39;rsi_14&#39;] = talib.RSI(candlesticks.close, timeperiod=14)
candlesticks[&#39;lr_50&#39;] = talib.LINEARREG(candlesticks.close, timeperiod=50)
# candlesticks[&#39;lr_150&#39;] = talib.LINEARREG(candlesticks.close, timeperiod=150)
# candlesticks[&#39;ema_55&#39;] = talib.EMA(candlesticks.close, timeperiod=55)
# candlesticks[&#39;ema_28&#39;] = talib.EMA(candlesticks.close, timeperiod=28)
# candlesticks[&#39;ema_18&#39;] = talib.EMA(candlesticks.close, timeperiod=18)
# candlesticks[&#39;ema_9&#39;] = talib.EMA(candlesticks.close, timeperiod=9)
# candlesticks[&#39;wma_21&#39;] = talib.WMA(candlesticks.close, timeperiod=21)
# candlesticks[&#39;wma_12&#39;] = talib.WMA(candlesticks.close, timeperiod=12)
# candlesticks[&#39;wma_11&#39;] = talib.WMA(candlesticks.close, timeperiod=11)
# candlesticks[&#39;wma_5&#39;] = talib.WMA(candlesticks.close, timeperiod=5)
candlesticks[&#39;cmo_9&#39;] = talib.CMO(candlesticks.close, timeperiod=9)
for row in candlesticks.itertuples():
current_index = candlesticks.index.get_loc(row.Index)
if current_index &gt;= 1:
previous_close = candlesticks.iloc[current_index - 1, candlesticks.columns.get_loc(&#39;close&#39;)]
candlesticks.iloc[current_index, candlesticks.columns.get_loc(&#39;ttr&#39;)] = max(
row.high - row.low,
abs(row.high - previous_close),
abs(row.low - previous_close))
if current_index &gt; 10:
candlesticks.iloc[current_index, candlesticks.columns.get_loc(&#39;vr_10&#39;)] = candlesticks.iloc[current_index, candlesticks.columns.get_loc(&#39;ttr&#39;)] / (
max(candlesticks.high[current_index - 9: current_index].max(), candlesticks.close[current_index - 11]) -
min(candlesticks.low[current_index - 9: current_index].min(), candlesticks.close[current_index - 11]))
candlesticks[&#39;timestamp&#39;] = pd.to_datetime(candlesticks.index)
candlesticks[&#39;instrument&#39;] = instrument
candlesticks.fillna(0, inplace=True)
return candlesticks

in the iteration, i am calculating the True Range ('TTR') and then the Volatility Ratio ('VR_10')

TTR is calculated on every row in the DF except for the first one. It uses the previous row's close column, and the current row's high and low column.

VR_10 is calculated on every row except for the first 10. it uses the high and low column of the previous 9 rows and the close of the 10th row back.

EDIT 2
I have tried many ways to add a text based data frame in this question, there just doesnt seem to be a solution with the width of my frame. there is no difference in the input and output dataframes other than the column TTR and VR_10 have all 0s in the input, and have non-zero values in the output.
an example would be this dataframe:

Is there a way I can do this without iteration?

答案1

得分: 0

在Andreas的建议下使用rolling，我得出了以下答案：
首先，我需要找出如何在多列上使用rolling。我在这里找到了答案。
我进行了修改，因为我需要向上滚动，而不是向下滚动。

def roll(df, w, **kwargs):
    df.sort_values(by='timestamp', ascending=0, inplace=True)
    v = df.values
    d0, d1 = v.shape
    s0, s1 = v.strides

    a = stride(v, (d0 - (w - 1), w, d1), (s0, s0, s1))

    rolled_df = pd.concat({
        row: pd.DataFrame(values, columns=df.columns)
        for row, values in zip(df.index, a)
    })

    return rolled_df.groupby(level=0, **kwargs)

之后，我创建了两个函数：

def calculate_vr(window):
    return window.iloc[0].ttr / (max(window.high[1:9].max(), window.iloc[10].close) - min(window.low[1:9].min(), window.iloc[10].close))


def calculate_ttr(window):
    return max(window.iloc[0].high - window.iloc[0].low, abs(window.iloc[0].high - window.iloc[1].close), abs(window.iloc[0].low - window.iloc[1].close))

然后像这样调用这些函数：

candlesticks['ttr'] = roll(candlesticks, 3).apply(calculate_ttr)
candlesticks['vr_10'] = roll(candlesticks, 11).apply(calculate_vr)

在两种方法都添加了定时器，这种方式大约比迭代慢3倍。

英文:

With the nudge from Andreas to use rolling, I came to an answer:
first, I had to find out how to use rolling with multiple columns. found that here.
I made a modification because I need to roll up, not down

def roll(df, w, **kwargs):
df.sort_values(by=&#39;timestamp&#39;, ascending=0, inplace=True)
v = df.values
d0, d1 = v.shape
s0, s1 = v.strides
a = stride(v, (d0 - (w - 1), w, d1), (s0, s0, s1))
rolled_df = pd.concat({
row: pd.DataFrame(values, columns=df.columns)
for row, values in zip(df.index, a)
})
return rolled_df.groupby(level=0, **kwargs)

after that, I created 2 functions:

def calculate_vr(window):
return window.iloc[0].ttr / (max(window.high[1:9].max(), window.iloc[10].close) - min(window.low[1:9].min(), window.iloc[10].close))
def calculate_ttr(window):
return max(window.iloc[0].high - window.iloc[0].low, abs(window.iloc[0].high - window.iloc[1].close), abs(window.iloc[0].low - window.iloc[1].close))

and called those functions like this:

    candlesticks[&#39;ttr&#39;] = roll(candlesticks, 3).apply(calculate_ttr)
candlesticks[&#39;vr_10&#39;] = roll(candlesticks, 11).apply(calculate_vr)

added timers to both ways and this way is roughly 3X slower than iteration.

通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库，让每个人都能够通过互相帮助和分享经验来进步。

使用 Pandas 数据框计算相邻行而无需遍历每一行。

问题

答案1

从DRF中的APIClient()获取用户

BeautifulSoup的previous_sibling不起作用。

使用请求混合表单和文件，带注释和可选字段。

Python Pandas 将按月聚合的数据重新采样为按日，然后再次聚合为按周。

What's the correct way to type hint an empty list as a literal in python?

如何在Highcharts Gantt中更改本地化的星期名称

如何在同一个流中使用多个过滤器和映射函数？

如何使用Map/Set来将代码优化到O(n)？

.NET MAUI Android在GitHub Actions上构建失败，错误代码为1。

如何在Playwright视觉比较中屏蔽多个定位器？

在C++中，可以使用可变模板参数来检索类型的内部类型。

selenium.common.exceptions.StaleElementReferenceException: Message: stale element reference: stale element not found

Creating and opening a URL to log in to Website via Basic Auth with Robot Framework/Selenium (Python)

AG Grid 在上下文菜单中以大文本形式打开

发表评论