Apply rolling function on pandas dataframe with multiple arguments
Question
I am trying to apply a rolling function, with a 3 year window, on a pandas dataframe.
import numpy as np
import pandas as pd
# Dummy data
df = pd.DataFrame({'Product': ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'],
                   'Year': [2015, 2016, 2017, 2018, 2015, 2016, 2017, 2018],
                   'IB': [2, 5, 8, 10, 7, 5, 10, 14],
                   'OB': [5, 8, 10, 12, 5, 10, 14, 20],
                   'Delta': [2, 2, 1, 3, -1, 3, 2, 4]})
# The function to be applied
def get_ln_rate(ib, ob, delta):
    n_years = len(ib)
    return sum(delta) * np.log(ob[-1] / ib[0]) / (n_years * (ob[-1] - ib[0]))
The expected output is
  Product  Year  IB  OB  Delta  Ln_Rate
0       A  2015   2   5      2
1       A  2016   5   8      2
2       A  2017   8  10      1   0.3353
3       A  2018  10  12      3   0.2501
4       B  2015   7   5     -1
5       B  2016   5  10      3
6       B  2017  10  14      2   0.1320
7       B  2018  14  20      4   0.2773
I have tried
df['Ln_Rate'] = df.groupby('Product').rolling(3).apply(lambda x: get_ln_rate(x['IB'], x['OB'], x['Delta']))
But this does not work.
I have found several similar posts:
https://stackoverflow.com/questions/45517686/applying-custom-rolling-function-to-dataframe - this one does not have a clear answer
https://stackoverflow.com/questions/40954560/pandas-rolling-apply-custom - this one does not use multiple arguments
https://stackoverflow.com/questions/30806838/apply-custom-function-on-pandas-dataframe-on-a-rolling-window - this one mentions rolling.apply but does not show the syntax
None of them is quite what I need. Any pointers towards the correct syntax would be greatly appreciated.
Answer 1
Score: 2
I solved this by reusing the rolling window.
import numpy as np
WINDOW_SIZE = 3
rw = df.groupby('Product').rolling(WINDOW_SIZE)
# raw=True passes each window as a NumPy array, so xs[0] and xs[-1] are positional lookups
df = df.assign(delta_sum=rw['Delta'].sum().reset_index()['Delta'],
               ib_first=rw['IB'].apply(lambda xs: xs[0], raw=True).reset_index()['IB'],
               ob_last=rw['OB'].apply(lambda xs: xs[-1], raw=True).reset_index()['OB'])
df['ln_rate'] = df['delta_sum'] * np.log(df['ob_last'] / df['ib_first']) / (WINDOW_SIZE * (df['ob_last'] - df['ib_first']))
Which yields:
  Product  Year  IB  OB  Delta  delta_sum  ib_first  ob_last   ln_rate
0       A  2015   2   5      2        NaN       NaN      NaN       NaN
1       A  2016   5   8      2        NaN       NaN      NaN       NaN
2       A  2017   8  10      1        5.0       2.0     10.0  0.335300
3       A  2018  10  12      3        6.0       5.0     12.0  0.250134
4       B  2015   7   5     -1        NaN       NaN      NaN       NaN
5       B  2016   5  10      3        NaN       NaN      NaN       NaN
6       B  2017  10  14      2        4.0       7.0     14.0  0.132028
7       B  2018  14  20      4        9.0       5.0     20.0  0.277259
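Resetting the index is necessary to turn the grouped result, which is indexed by (Product, original row label), back into the shape of the original DataFrame. Note that this realignment only matches rows correctly when df is already sorted by Product and has a default integer index, as in the example.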
Hope that helps.
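As a side note, this particular formula only needs the rolling sum of Delta, the first IB and the last OB of each window, so it can also be written without any custom apply. The following is a minimal sketch under that assumption, not a general recipe: the names grouped, delta_sum, ib_first and ob_last are chosen here for illustration, and rows are assumed to be ordered by Year within each Product.

```python
import numpy as np
import pandas as pd

WINDOW_SIZE = 3

df = pd.DataFrame({'Product': ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'],
                   'Year': [2015, 2016, 2017, 2018, 2015, 2016, 2017, 2018],
                   'IB': [2, 5, 8, 10, 7, 5, 10, 14],
                   'OB': [5, 8, 10, 12, 5, 10, 14, 20],
                   'Delta': [2, 2, 1, 3, -1, 3, 2, 4]})

grouped = df.groupby('Product')

# Rolling sum of Delta within each product; droplevel(0) removes the
# Product level so the result is indexed like the original rows.
delta_sum = grouped['Delta'].rolling(WINDOW_SIZE).sum().droplevel(0)

# The first IB of a window is the IB value WINDOW_SIZE - 1 rows earlier
# within the same product (NaN where the window is incomplete).
ib_first = grouped['IB'].shift(WINDOW_SIZE - 1)

# The last OB of a window is simply the current row's OB.
ob_last = df['OB']

df['Ln_Rate'] = (delta_sum * np.log(ob_last / ib_first)
                 / (WINDOW_SIZE * (ob_last - ib_first)))
print(df)
```

Because everything is aligned on the original index, this variant does not depend on the rows being sorted by Product. It does not handle windows where the last OB equals the first IB, since the denominator is then zero.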
Answer 2
Score: 2
Another approach came to mind: create rolling windows over the row positions of each group and pass the corresponding partial DataFrame to your custom function. The function is then not called with multiple arguments as such, but it still receives all the data it needs.
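import numpy as np
import pandas as pd
df = pd.DataFrame({'Product': ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'],
                   'Year': [2015, 2016, 2017, 2018, 2015, 2016, 2017, 2018],
                   'IB': [2, 5, 8, 10, 7, 5, 10, 14],
                   'OB': [5, 8, 10, 12, 5, 10, 14, 20],
                   'Delta': [2, 2, 1, 3, -1, 3, 2, 4]})
# The function to be applied; window is the slice of rows for one rolling window
def get_ln_rate(window):
    n_years = len(window['IB'])
    return window['Delta'].sum() * np.log(window['OB'].iloc[-1] / window['IB'].iloc[0]) / (n_years * (window['OB'].iloc[-1] - window['IB'].iloc[0]))
# Roll over the row positions of one group and look the rows up with iloc
def rolling_ln_rate(grp):
    positions = pd.Series(np.arange(len(grp)), index=grp.index)
    # raw=True passes the positions as a float array, hence the cast to int
    return positions.rolling(3).apply(lambda pos: get_ln_rate(grp.iloc[pos.astype(int)]), raw=True)
ln_rate = df.groupby('Product', group_keys=False)[['IB', 'OB', 'Delta']].apply(rolling_ln_rate)
df = df.assign(Ln_Rate=ln_rate)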
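If the goal is to keep the three-argument signature from the question unchanged, one option is to slice the columns by hand inside each group. The sketch below assumes a helper named rolling_multi, which is hypothetical and not part of pandas; it passes NumPy slices of the requested columns to the function, so ob[-1] and ib[0] are positional lookups.

```python
import numpy as np
import pandas as pd

def get_ln_rate(ib, ob, delta):
    # Original three-argument function from the question
    n_years = len(ib)
    return sum(delta) * np.log(ob[-1] / ib[0]) / (n_years * (ob[-1] - ib[0]))

def rolling_multi(grp, func, cols, window):
    """Hypothetical helper: call func on aligned windows of several columns."""
    arrays = [grp[c].to_numpy() for c in cols]
    out = np.full(len(grp), np.nan)  # incomplete windows stay NaN
    for end in range(window, len(grp) + 1):
        out[end - 1] = func(*[a[end - window:end] for a in arrays])
    return pd.Series(out, index=grp.index)

df = pd.DataFrame({'Product': ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'],
                   'Year': [2015, 2016, 2017, 2018, 2015, 2016, 2017, 2018],
                   'IB': [2, 5, 8, 10, 7, 5, 10, 14],
                   'OB': [5, 8, 10, 12, 5, 10, 14, 20],
                   'Delta': [2, 2, 1, 3, -1, 3, 2, 4]})

# Apply the helper per product and stitch the pieces back together;
# each piece keeps the original row labels, so assignment aligns by index.
pieces = [rolling_multi(grp, get_ln_rate, ['IB', 'OB', 'Delta'], 3)
          for _, grp in df.groupby('Product')]
df['Ln_Rate'] = pd.concat(pieces)
print(df)
```

The explicit Python loop is slower than built-in rolling aggregations, but it works for any function of several columns and makes the windowing easy to follow.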