Rolling average based on two columns in Pandas

Question

I want to get a rolling sum for each month based on weeks.
Here is what my df looks like:

import pandas as pd

df = pd.DataFrame(
    # week of the month; this unnamed data column shows up as column 0
    [1, 2, 1, 2, 3, 4, 1, 2, 1, 2, 3, 4, 1, 1, 2],
    index=pd.MultiIndex.from_arrays(
        [
            [30, 20, 15, 10, 20, 20, 5, 15, 20, 10, 10, 30, 20, 15, 10],
            ['red', 'red', 'red', 'red', 'red', 'red',
             'blue', 'blue', 'blue', 'blue', 'blue', 'blue',
             'Green', 'Green', 'Green'],
            ['Feb', 'Feb', 'Mar', 'Mar', 'Mar', 'Mar',
             'Feb', 'Feb', 'Mar', 'Mar', 'Mar', 'Mar',
             'Feb', 'Mar', 'Mar'],
        ],
        names=['number', 'color', 'Mon'],
    ),
).reset_index()  # turn the index levels into ordinary columns
df
number  color  Mon  0
30      red    Feb  1
20      red    Feb  2
15      red    Mar  1
10      red    Mar  2
20      red    Mar  3
20      red    Mar  4
5       blue   Feb  1
15      blue   Feb  2
20      blue   Mar  1
10      blue   Mar  2
10      blue   Mar  3
30      blue   Mar  4
20      Green  Feb  1
15      Green  Mar  1
10      Green  Mar  2

I want to get the rolling sum for each color and each month, accumulated over that month's weeks.
For the color red and Mon Feb, the 1st week is 30 and the 2nd week is 50 (30+20). For red in Mar it should be 15, 25, 45, 65:

number  color  Mon  0  rolling
30      red    Feb  1       30
20      red    Feb  2       50
15      red    Mar  1       15
10      red    Mar  2       25
20      red    Mar  3       45
20      red    Mar  4       65
5       blue   Feb  1        5
15      blue   Feb  2       20
20      blue   Mar  1       20
10      blue   Mar  2       30
10      blue   Mar  3       40
30      blue   Mar  4       70
20      Green  Feb  1       20
15      Green  Mar  1       15
10      Green  Mar  2       25

I tried groupby with a rolling window, both with and without a lambda, but neither gave the result above:

df.groupby(by=['color','Mon']).rolling(window=2).sum()
# also below command
df.groupby(by=['color','Mon']).apply(lambda x: x.rolling(2).sum())

Any help would be greatly appreciated.
Thanks in advance!!

Answer 1

Score: 1

I think you want .cumsum(), a running total within each (color, Mon) group, rather than a fixed-size rolling window:

df['rolling'] = df.groupby(['color', 'Mon'])['number'].cumsum()
print(df)

Prints:

    number  color  Mon  0  rolling
0       30    red  Feb  1       30
1       20    red  Feb  2       50
2       15    red  Mar  1       15
3       10    red  Mar  2       25
4       20    red  Mar  3       45
5       20    red  Mar  4       65
6        5   blue  Feb  1        5
7       15   blue  Feb  2       20
8       20   blue  Mar  1       20
9       10   blue  Mar  2       30
10      10   blue  Mar  3       40
11      30   blue  Mar  4       70
12      20  Green  Feb  1       20
13      15  Green  Mar  1       15
14      10  Green  Mar  2       25
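
A minimal sketch of why the attempts in the question differ from the expected output, assuming the same data rebuilt as plain columns. The column name `week` is a stand-in for the question's unnamed column `0` and the extra column names are illustrative, not part of the original post. `rolling(window=2)` only sums the current and previous row and gives NaN for the first row of each group, whereas `cumsum()` keeps a running total; `expanding().sum()` is a window-style spelling of the same running total.

```python
import pandas as pd

# Same values as in the question; "week" is a hypothetical name for column 0.
df = pd.DataFrame({
    "number": [30, 20, 15, 10, 20, 20, 5, 15, 20, 10, 10, 30, 20, 15, 10],
    "color": ["red"] * 6 + ["blue"] * 6 + ["Green"] * 3,
    "Mon": ["Feb", "Feb", "Mar", "Mar", "Mar", "Mar"] * 2 + ["Feb", "Mar", "Mar"],
    "week": [1, 2, 1, 2, 3, 4, 1, 2, 1, 2, 3, 4, 1, 1, 2],
})
keys = ["color", "Mon"]

# Fixed window of 2 rows per group: NaN on each group's first row,
# and later rows only see the current and previous week.
df["windowed"] = (
    df.groupby(keys)["number"].rolling(window=2).sum().droplevel(keys)
)

# Running total per group, as in the answer.
df["rolling"] = df.groupby(keys)["number"].cumsum()

# The same running total written as an expanding window (returns floats).
df["expanding"] = (
    df.groupby(keys)["number"].expanding().sum().droplevel(keys)
)

# If row order is not guaranteed, sort by week first; the assignment
# aligns on the row index, so the frame itself keeps its order.
shuffled = df.sample(frac=1, random_state=0)
shuffled["by_week"] = (
    shuffled.sort_values("week").groupby(keys)["number"].cumsum()
)

print(df)
```

The sort step matters only when rows may arrive out of week order: `cumsum()` accumulates by row position within each group, so sorting first ties the totals to the week rather than to however the frame happens to be ordered.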

Posted by huangapple on 2023-02-24 06:15:33
When republishing, please keep the link to this article: https://go.coder-hub.com/75550869.html