Rolling average based on two columns in Pandas

Question

I want to get a rolling sum for each month based on weeks.
Here is what my df looks like:

    import pandas as pd

    df = pd.DataFrame(
        [1, 2, 1, 2, 3, 4, 1, 2, 1, 2, 3, 4, 1, 1, 2],
        index=pd.MultiIndex.from_arrays(
            [[30, 20, 15, 10, 20, 20, 5, 15, 20, 10, 10, 30, 20, 15, 10],
             ['red', 'red', 'red', 'red', 'red', 'red', 'blue', 'blue', 'blue', 'blue', 'blue', 'blue', 'Green', 'Green', 'Green'],
             ['Feb', 'Feb', 'Mar', 'Mar', 'Mar', 'Mar', 'Feb', 'Feb', 'Mar', 'Mar', 'Mar', 'Mar', 'Feb', 'Mar', 'Mar']],
            names=['number', 'color', 'Mon']),
    ).reset_index()
    df

    number  color  Mon  0
        30    red  Feb  1
        20    red  Feb  2
        15    red  Mar  1
        10    red  Mar  2
        20    red  Mar  3
        20    red  Mar  4
         5   blue  Feb  1
        15   blue  Feb  2
        20   blue  Mar  1
        10   blue  Mar  2
        10   blue  Mar  3
        30   blue  Mar  4
        20  Green  Feb  1
        15  Green  Mar  1
        10  Green  Mar  2

I want to get the rolling sum for each month and each color, based on that month's weeks.
For the color red and Mon Feb, the 1st week is 30 and the 2nd week is 50 (30 + 20). For Mar, red should give 15, 25, 45, 65.

    number  color  Mon  0  rolling
        30    red  Feb  1       30
        20    red  Feb  2       50
        15    red  Mar  1       15
        10    red  Mar  2       25
        20    red  Mar  3       45
        20    red  Mar  4       65
         5   blue  Feb  1        5
        15   blue  Feb  2       20
        20   blue  Mar  1       20
        10   blue  Mar  2       30
        10   blue  Mar  3       40
        30   blue  Mar  4       70
        20  Green  Feb  1       20
        15  Green  Mar  1       15
        10  Green  Mar  2       25
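
As a quick arithmetic check of the expected column, the running totals for a single group can be reproduced in plain Python. This is only a sketch: `red_mar` is a made-up name for the red/Mar values copied from the table above, and it is not part of the original post.

```python
from itertools import accumulate

# Values of "number" for color == 'red' and Mon == 'Mar', in week order
# (copied from the table; the variable name is hypothetical).
red_mar = [15, 10, 20, 20]

# accumulate() yields the running total after each week.
running = list(accumulate(red_mar))
print(running)  # [15, 25, 45, 65]
```

The total keeps growing across all weeks of the month, so the desired result is a running (cumulative) sum restarted per color and month rather than a sum over a fixed number of rows.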

I tried groupby with a rolling window, both with and without a lambda, but it didn't work out:

    df.groupby(by=['color','Mon']).rolling(window=2).sum()
    # also tried the command below
    df.groupby(by=['color','Mon']).apply(lambda x: x.rolling(2).sum())

Any help would be greatly appreciated.
Thanks in advance!!

Answer 1

Score: 1

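I think you want .cumsum(), since the expected column is a running total within each (color, Mon) group rather than a sum over a fixed-size window: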

    df['rolling'] = df.groupby(['color', 'Mon'])['number'].cumsum()
    print(df)

Prints:

        number  color  Mon  0  rolling
    0       30    red  Feb  1       30
    1       20    red  Feb  2       50
    2       15    red  Mar  1       15
    3       10    red  Mar  2       25
    4       20    red  Mar  3       45
    5       20    red  Mar  4       65
    6        5   blue  Feb  1        5
    7       15   blue  Feb  2       20
    8       20   blue  Mar  1       20
    9       10   blue  Mar  2       30
    10      10   blue  Mar  3       40
    11      30   blue  Mar  4       70
    12      20  Green  Feb  1       20
    13      15  Green  Mar  1       15
    14      10  Green  Mar  2       25
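
To see why the rolling(window=2) attempts from the question give something different, here is a minimal sketch that puts both approaches side by side on the red rows only. The column name `week` is an assumption standing in for the unnamed column 0, and `cumsum` / `window_2` are illustrative column names, not ones from the original post.

```python
import pandas as pd

# Small frame with the same layout as the question (red rows only).
df = pd.DataFrame({
    "number": [30, 20, 15, 10, 20, 20],
    "color": ["red"] * 6,
    "Mon": ["Feb", "Feb", "Mar", "Mar", "Mar", "Mar"],
    "week": [1, 2, 1, 2, 3, 4],  # assumed name for the unnamed column 0
})

grouped = df.groupby(["color", "Mon"], sort=False)["number"]

# Running total within each (color, Mon) group, as in the answer.
df["cumsum"] = grouped.cumsum()

# Fixed window of 2: the first row of each group has no full window (NaN),
# and later rows only add the current and the previous week.
df["window_2"] = grouped.transform(lambda s: s.rolling(2).sum())

print(df)
```

For red in Mar, the fixed window yields NaN, 25, 30, 40, while the cumulative sum yields 15, 25, 45, 65. Note that cumsum follows the existing row order, so if the weeks can arrive out of order, sorting first (for example with sort_values on the color, Mon and week columns) is probably needed.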

huangapple
  • Posted on February 24, 2023, 06:15:33
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75550869.html