January 3, 2020 15:48:18

Apply rolling function on pandas dataframe with multiple arguments

Question

I am trying to apply a rolling function, with a 3-year window, on a pandas dataframe.

import numpy as np
import pandas as pd

# Dummy data
df = pd.DataFrame({'Product': ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'],
                   'Year': [2015, 2016, 2017, 2018, 2015, 2016, 2017, 2018],
                   'IB': [2, 5, 8, 10, 7, 5, 10, 14],
                   'OB': [5, 8, 10, 12, 5, 10, 14, 20],
                   'Delta': [2, 2, 1, 3, -1, 3, 2, 4]})

# The function to be applied
def get_ln_rate(ib, ob, delta):
    n_years = len(ib)
    return sum(delta)*np.log(ob[-1]/ib[0]) / (n_years * (ob[-1] - ib[0]))
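A quick way to check the formula is to call it by hand on a single window. This sketch feeds the first window of product A (2015 to 2017) as NumPy arrays. That is an assumption: the function indexes positionally with `[0]` and `[-1]`, and on a pandas Series with an integer index `ob[-1]` is treated as a label lookup (typically raising `KeyError`), so plain arrays are the safe input here.

```python
import numpy as np

# Same formula as the question's get_ln_rate; it indexes positionally
# (ib[0], ob[-1]), so it is fed plain NumPy arrays here.
def get_ln_rate(ib, ob, delta):
    n_years = len(ib)
    return sum(delta) * np.log(ob[-1] / ib[0]) / (n_years * (ob[-1] - ib[0]))

# First 3-year window of product A (years 2015-2017)
ib = np.array([2, 5, 8])
ob = np.array([5, 8, 10])
delta = np.array([2, 2, 1])

rate = get_ln_rate(ib, ob, delta)
print(round(rate, 4))  # 0.3353, matching the expected Ln_Rate for A/2017
```

The result reproduces the 0.3353 in the expected output, so the formula itself is fine; the difficulty is only in feeding it three columns per window.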

The expected output is

  Product  Year  IB  OB  Delta  Ln_Rate
0       A  2015   2   5      2     
1       A  2016   5   8      2    
2       A  2017   8  10      1   0.3353
3       A  2018  10  12      3   0.2501
4       B  2015   7   5     -1  
5       B  2016   5  10      3
6       B  2017  10  14      2   0.1320
7       B  2018  14  20      4   0.2773

I have tried

df['Ln_Rate'] = df.groupby('Product').rolling(3).apply(lambda x: get_ln_rate(x['IB'], x['OB'], x['Delta']))

But this does not work.
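A likely reason the attempt fails: `Rolling.apply` never hands the lambda a multi-column DataFrame window. It calls the function separately for each column with a 1-D window (an ndarray with `raw=True`, a Series with `raw=False`, the default since pandas 1.0), so `x['IB']` has no column to select. The probe below, using a hypothetical two-column frame `frame` and a hypothetical callback `probe`, records the shape of every window the callback receives:

```python
import pandas as pd

# Hypothetical two-column frame, just to observe what rolling.apply passes in
frame = pd.DataFrame({'IB': [2, 5, 8, 10], 'OB': [5, 8, 10, 12]})

seen = []  # (ndim, length) of each window handed to the callback

def probe(window):
    seen.append((window.ndim, len(window)))
    return 0.0  # rolling.apply requires a scalar result per window

frame.rolling(3).apply(probe, raw=True)
print(seen)
```

Every recorded window is one-dimensional and covers a single column, which is why the answers below either aggregate each column separately or roll over row labels instead.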

I have found several similar posts:

https://stackoverflow.com/questions/45517686/applying-custom-rolling-function-to-dataframe - this one does not have a clear answer

https://stackoverflow.com/questions/40954560/pandas-rolling-apply-custom - this one does not have multiple arguments

https://stackoverflow.com/questions/30806838/apply-custom-function-on-pandas-dataframe-on-a-rolling-window - this one has rolling.apply... but it doesn't show the syntax.

None of them seems to be spot on. Any pointers towards the correct syntax would be greatly appreciated.

Answer 1

Score: 2

I solved this by reusing the rolling window.

import numpy as np

WINDOW_SIZE = 3

rw = df.groupby('Product').rolling(WINDOW_SIZE)

# raw=True passes each window as a NumPy array, so xs[0] / xs[-1] are positional.
# reset_index(level=0, drop=True) drops the 'Product' level so each result
# lines up with df's original row labels.
df = df.assign(delta_sum=rw['Delta'].sum().reset_index(level=0, drop=True),
               ib_first=rw['IB'].apply(lambda xs: xs[0], raw=True).reset_index(level=0, drop=True),
               ob_last=rw['OB'].apply(lambda xs: xs[-1], raw=True).reset_index(level=0, drop=True))

df['ln_rate'] = df['delta_sum']*np.log(df['ob_last']/df['ib_first']) / (WINDOW_SIZE * (df['ob_last'] - df['ib_first']))

Which yields:

  Product  Year  IB  OB  Delta  delta_sum  ib_first  ob_last   ln_rate
0       A  2015   2   5      2        NaN       NaN      NaN       NaN
1       A  2016   5   8      2        NaN       NaN      NaN       NaN
2       A  2017   8  10      1        5.0       2.0     10.0  0.335300
3       A  2018  10  12      3        6.0       5.0     12.0  0.250134
4       B  2015   7   5     -1        NaN       NaN      NaN       NaN
5       B  2016   5  10      3        NaN       NaN      NaN       NaN
6       B  2017  10  14      2        4.0       7.0     14.0  0.132028
7       B  2018  14  20      4        9.0       5.0     20.0  0.277259

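Resetting the index is necessary because `groupby(...).rolling(...)` returns a result indexed by (`Product`, original row label), which has to be mapped back onto the rows of the original DataFrame.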
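To see what the reset undoes, the sketch below (a trimmed-down `df` with only `Product` and `Delta`, an assumption for brevity) inspects the index of a grouped rolling sum and realigns it by dropping the `Product` level. Dropping the level aligns by row label rather than by position, so it keeps working if the rows are not sorted by `Product`; a plain `reset_index()` followed by column selection relies on the group order matching the row order.

```python
import pandas as pd

# Trimmed-down version of the question's data
df = pd.DataFrame({'Product': ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'],
                   'Delta': [2, 2, 1, 3, -1, 3, 2, 4]})

rolled = df.groupby('Product')['Delta'].rolling(3).sum()
print(list(rolled.index.names))  # ['Product', None]

# Drop the group level so the labels match df's original row index again
aligned = rolled.reset_index(level=0, drop=True)
df['delta_sum'] = aligned
print(df)
```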

Hope that helps.
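A possible simplification of the same idea, assuming each product's rows are sorted by year with no missing years: the first `IB` of a trailing 3-row window is just the `IB` two rows earlier in the same group, and the last `OB` is the current row's `OB`. Under that assumption the per-window `apply` lambdas can be replaced with a group-wise `shift`, which avoids calling Python code for every window. This is a swapped-in technique, not part of the original answer.

```python
import numpy as np
import pandas as pd

WINDOW_SIZE = 3

df = pd.DataFrame({'Product': ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'],
                   'Year': [2015, 2016, 2017, 2018, 2015, 2016, 2017, 2018],
                   'IB': [2, 5, 8, 10, 7, 5, 10, 14],
                   'OB': [5, 8, 10, 12, 5, 10, 14, 20],
                   'Delta': [2, 2, 1, 3, -1, 3, 2, 4]})

g = df.groupby('Product')
# Rolling sum of Delta per product, realigned to df's row labels (NaN until the window is full)
delta_sum = g['Delta'].rolling(WINDOW_SIZE).sum().reset_index(level=0, drop=True)
# First IB of the window = IB from WINDOW_SIZE - 1 rows earlier within the same product
ib_first = g['IB'].shift(WINDOW_SIZE - 1)
# Last OB of the window is simply the current row's OB
ob_last = df['OB']

df['ln_rate'] = delta_sum * np.log(ob_last / ib_first) / (WINDOW_SIZE * (ob_last - ib_first))
print(df.round(4))
```

If years can be missing, the shift no longer corresponds to a 3-year span, and the rolling version (or a time-based window) is the safer choice.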

Answer 2

Score: 2

Another approach came to mind: create rolling windows over the grouped row labels, and pass the corresponding partial DataFrames to your custom function. Of course, the function is not literally called with multiple arguments, but it still receives all the data it needs.

import numpy as np
import pandas as pd

df = pd.DataFrame({'Product': ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'],
                   'Year': [2015, 2016, 2017, 2018, 2015, 2016, 2017, 2018],
                   'IB': [2, 5, 8, 10, 7, 5, 10, 14],
                   'OB': [5, 8, 10, 12, 5, 10, 14, 20],
                   'Delta': [2, 2, 1, 3, -1, 3, 2, 4]})

# The function to be applied
def get_ln_rate(df):
    n_years = len(df['IB'])
    return df['Delta'].sum() * np.log(df['OB'].iloc[-1] / df['IB'].iloc[0]) / (n_years * (df['OB'].iloc[-1] - df['IB'].iloc[0]))

# Roll over each group's row labels; with raw=True every window arrives as a
# float ndarray of labels, which is cast back to int to select the partial df.
ln_rate = pd.concat([
    pd.Series(grp.index, index=grp.index)
      .rolling(3)
      .apply(lambda window: get_ln_rate(grp.loc[window.astype(int)]), raw=True)
    for _, grp in df.groupby('Product')
])
df = df.assign(Ln_Rate=ln_rate)
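If you would rather keep the question's original three-argument signature `get_ln_rate(ib, ob, delta)`, an explicit loop over positions is another option. The helper `rolling_multi_arg` below is hypothetical (not a pandas API): it slices the chosen columns for each full trailing window and passes them as separate NumPy arrays, leaving NaN where the window is not yet full.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Product': ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'],
                   'Year': [2015, 2016, 2017, 2018, 2015, 2016, 2017, 2018],
                   'IB': [2, 5, 8, 10, 7, 5, 10, 14],
                   'OB': [5, 8, 10, 12, 5, 10, 14, 20],
                   'Delta': [2, 2, 1, 3, -1, 3, 2, 4]})

# The question's three-argument function, unchanged
def get_ln_rate(ib, ob, delta):
    n_years = len(ib)
    return sum(delta) * np.log(ob[-1] / ib[0]) / (n_years * (ob[-1] - ib[0]))

def rolling_multi_arg(frame, window, func, cols):
    """Hypothetical helper: call func(*column_slices) on each full trailing window."""
    out = pd.Series(np.nan, index=frame.index)
    arrays = [frame[c].to_numpy() for c in cols]
    for end in range(window, len(frame) + 1):
        # Each argument is the same positional slice of a different column
        out.iloc[end - 1] = func(*(a[end - window:end] for a in arrays))
    return out

df['Ln_Rate'] = pd.concat([
    rolling_multi_arg(grp, 3, get_ln_rate, ['IB', 'OB', 'Delta'])
    for _, grp in df.groupby('Product')
])
print(df)
```

The loop is plain Python, so it will be slower than the vectorised answers on large frames, but it keeps the custom function's signature untouched.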
