Apply rolling function on pandas dataframe with multiple arguments

Question

I am trying to apply a rolling function, with a 3 year window, on a pandas dataframe.

import pandas as pd
import numpy as np

# Dummy data
df = pd.DataFrame({'Product': ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'],
                   'Year': [2015, 2016, 2017, 2018, 2015, 2016, 2017, 2018],
                   'IB': [2, 5, 8, 10, 7, 5, 10, 14],
                   'OB': [5, 8, 10, 12, 5, 10, 14, 20],
                   'Delta': [2, 2, 1, 3, -1, 3, 2, 4]})

# The function to be applied
def get_ln_rate(ib, ob, delta):
    n_years = len(ib)
    return sum(delta)*np.log(ob[-1]/ib[0]) / (n_years * (ob[-1] - ib[0]))

The expected output is

  Product  Year  IB  OB  Delta  Ln_Rate
0       A  2015   2   5      2     
1       A  2016   5   8      2    
2       A  2017   8  10      1   0.3353
3       A  2018  10  12      3   0.2501
4       B  2015   7   5     -1  
5       B  2016   5  10      3
6       B  2017  10  14      2   0.1320
7       B  2018  14  20      4   0.2773

I have tried

df['Ln_Rate'] = df.groupby('Product').rolling(3).apply(lambda x: get_ln_rate(x['IB'], x['OB'], x['Delta']))

But this does not work.

I have found several similar posts

https://stackoverflow.com/questions/45517686/applying-custom-rolling-function-to-dataframe - this one does not have a clear answer

https://stackoverflow.com/questions/40954560/pandas-rolling-apply-custom - this one does not have multiple arguments

https://stackoverflow.com/questions/30806838/apply-custom-function-on-pandas-dataframe-on-a-rolling-window - this one has rolling.apply... but it doesn't show the syntax.

None of them seems to be spot on. Any pointers towards the correct syntax would be greatly appreciated.
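
A note on why the attempt above fails, as far as I can tell: `rolling(...).apply` calls the function once per window and per column, so the lambda receives a one-dimensional window of a single column, not a small DataFrame that has `IB`, `OB` and `Delta` in it. The sketch below is only a probe to make that visible; `probe` and `seen` are hypothetical names introduced for illustration, and the grouping is left out to keep it short.

```python
import pandas as pd

df = pd.DataFrame({'IB': [2, 5, 8, 10],
                   'OB': [5, 8, 10, 12]})

seen = []  # records what each call to the applied function receives


def probe(window):
    # Store the type and shape of the object pandas hands to the function
    seen.append((type(window).__name__, window.shape))
    return 0.0  # rolling.apply expects a single number back


# raw=False asks pandas to pass each window as a Series
df.rolling(3).apply(probe, raw=False)

for kind, shape in seen:
    print(kind, shape)
```

Every recorded entry is a `Series` holding three values of one column, which is why indexing it with `x['IB']` cannot work.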

Answer 1

Score: 2

I solved this by reusing the rolling window.

import numpy as np

WINDOW_SIZE = 3

rw = df.groupby('Product').rolling(WINDOW_SIZE)

# raw=True passes each window as a NumPy array, so xs[0] and xs[-1] are positional
df = df.assign(delta_sum=rw['Delta'].sum().reset_index()['Delta'],
               ib_first=rw['IB'].apply(lambda xs: xs[0], raw=True).reset_index()['IB'],
               ob_last=rw['OB'].apply(lambda xs: xs[-1], raw=True).reset_index()['OB'])

df['ln_rate'] = df['delta_sum']*np.log(df['ob_last']/df['ib_first']) / (WINDOW_SIZE * (df['ob_last'] - df['ib_first']))

Which yields:

  Product  Year  IB  OB  Delta  delta_sum  ib_first  ob_last   ln_rate
0       A  2015   2   5      2        NaN       NaN      NaN       NaN
1       A  2016   5   8      2        NaN       NaN      NaN       NaN
2       A  2017   8  10      1        5.0       2.0     10.0  0.335300
3       A  2018  10  12      3        6.0       5.0     12.0  0.250134
4       B  2015   7   5     -1        NaN       NaN      NaN       NaN
5       B  2016   5  10      3        NaN       NaN      NaN       NaN
6       B  2017  10  14      2        4.0       7.0     14.0  0.132028
7       B  2018  14  20      4        9.0       5.0     20.0  0.277259

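Resetting the index is necessary because the grouped rolling result is indexed by (Product, original row label). reset_index() flattens it back to 0..n-1 so that it lines up with df; note that this positional match relies on the rows of df already being ordered by Product.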

Hope that helps.
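
For this particular formula the first `IB` and the last `OB` of a window can also be read off without `apply`: the first `IB` is the value `WINDOW_SIZE - 1` rows earlier within the same product, and the last `OB` is the current row's `OB`. Below is a minimal sketch of that variant, not the method used above; the names `grouped`, `ib_first`, `ob_last` and `result` are my own. It aligns on the original row labels instead of on position, so it does not depend on the row order.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Product': ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'],
                   'Year': [2015, 2016, 2017, 2018, 2015, 2016, 2017, 2018],
                   'IB': [2, 5, 8, 10, 7, 5, 10, 14],
                   'OB': [5, 8, 10, 12, 5, 10, 14, 20],
                   'Delta': [2, 2, 1, 3, -1, 3, 2, 4]})

WINDOW_SIZE = 3
grouped = df.groupby('Product')

# Rolling sum of Delta per product; dropping the Product level leaves
# the original row labels, so the result aligns with df by index
delta_sum = (grouped['Delta'].rolling(WINDOW_SIZE).sum()
             .reset_index(level=0, drop=True))

# First IB of each window: the value WINDOW_SIZE - 1 rows back in the same product
ib_first = grouped['IB'].shift(WINDOW_SIZE - 1)

# Last OB of each window: the current row's OB
ob_last = df['OB']

ln_rate = delta_sum * np.log(ob_last / ib_first) / (WINDOW_SIZE * (ob_last - ib_first))
result = df.assign(Ln_Rate=ln_rate)
print(result)
```

Rows without a full window end up as NaN because both the rolling sum and the shifted `IB` are NaN there.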

Answer 2

Score: 2

Another approach came to mind: create rolling windows over each group's index and pass the matching partial DataFrame to your custom function. Strictly speaking, the function is not called with multiple arguments, but it does receive all the data it needs.

import numpy as np
import pandas as pd

df = pd.DataFrame({'Product': ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'],
                   'Year': [2015, 2016, 2017, 2018, 2015, 2016, 2017, 2018],
                   'IB': [2, 5, 8, 10, 7, 5, 10, 14],
                   'OB': [5, 8, 10, 12, 5, 10, 14, 20],
                   'Delta': [2, 2, 1, 3, -1, 3, 2, 4]})

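# The function to be applied; it receives the rows of one window as a DataFrame
def get_ln_rate(window_df):
    n_years = len(window_df)
    ib_first = window_df['IB'].iloc[0]
    ob_last = window_df['OB'].iloc[-1]
    return window_df['Delta'].sum() * np.log(ob_last / ib_first) / (n_years * (ob_last - ib_first))

# Roll over each group's index labels and look up the matching rows of the group.
# raw=False passes each window as a Series, so window.index holds the row labels.
ln_rate = pd.concat([
    pd.Series(grp.index, index=grp.index).rolling(3).apply(
        lambda window: get_ln_rate(grp.loc[window.index]), raw=False)
    for _, grp in df.groupby('Product')])
df = df.assign(Ln_Rate=ln_rate)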
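
If the goal is to keep the original three-argument signature `get_ln_rate(ib, ob, delta)` from the question, one more option is to iterate over the rolling windows directly. This is a sketch under the assumption that pandas 1.1 or later is installed, since iterating over a `Rolling` object is not available before that; `rolling_ln_rate` and `result` are hypothetical names introduced here.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Product': ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'],
                   'Year': [2015, 2016, 2017, 2018, 2015, 2016, 2017, 2018],
                   'IB': [2, 5, 8, 10, 7, 5, 10, 14],
                   'OB': [5, 8, 10, 12, 5, 10, 14, 20],
                   'Delta': [2, 2, 1, 3, -1, 3, 2, 4]})

WINDOW_SIZE = 3


# The function from the question, unchanged: it expects array-like inputs
def get_ln_rate(ib, ob, delta):
    n_years = len(ib)
    return sum(delta) * np.log(ob[-1] / ib[0]) / (n_years * (ob[-1] - ib[0]))


def rolling_ln_rate(grp):
    """Apply get_ln_rate to every full window of one product's rows."""
    rates = {}
    # Iterating a Rolling object yields one sub-DataFrame per row,
    # including the shorter partial windows at the start
    for window in grp[['IB', 'OB', 'Delta']].rolling(WINDOW_SIZE):
        if len(window) < WINDOW_SIZE:
            continue  # skip partial windows; those rows stay NaN
        rates[window.index[-1]] = get_ln_rate(window['IB'].to_numpy(),
                                              window['OB'].to_numpy(),
                                              window['Delta'].to_numpy())
    return pd.Series(rates, dtype=float)


ln_rate = pd.concat([rolling_ln_rate(grp) for _, grp in df.groupby('Product')])
result = df.assign(Ln_Rate=ln_rate)
print(result)
```

Each rate is keyed by the label of the window's last row, so `assign` aligns it with the right row of `df` and leaves the rest as NaN. The explicit Python loop will be slower than the column-wise approach on large frames, but it works for functions that cannot be decomposed per column.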

huangapple
  • Posted on January 3, 2020 at 15:48:18
  • When reposting, please keep the link to this article: https://go.coder-hub.com/59574934.html