Python Pandas 将按月聚合的数据重新采样为按日,然后再次聚合为按周。

huangapple go评论92阅读模式
英文:

python pandas resample monthly aggregated data to daily, then aggregate back to weekly

问题

以下是翻译的代码部分:

  1. 这是我的示例数据
  2. | team| sales | month |
  3. | -------- | -------------- | ------------ |
  4. | a| 100|1/1/2023 |
  5. | a| 200|2/1/2023 |
  6. | b| 600|1/1/2023 |
  7. | b| 300|2/1/2023 |
  8. Pandas中加载数据的方法如下
  9. mydata = pd.DataFrame([
  10. ['team','sales','month'],
  11. ['a', 100, '1/1/2023'],
  12. ['a', 200, '2/1/2023'],
  13. ['b', 600, '1/1/2023'],
  14. ['b', 300, '2/1/2023']
  15. ])
  16. mydata.columns = mydata.iloc[0]
  17. mydata = mydata[1:]
  18. mydata['month'] = pd.to_datetime(mydata['month'])
  19. 我对团队"a"的期望结果是按每周聚合的数据以星期一开始如下所示
  20. | team| sales | Monday Week|
  21. | -------- | -------------- | ------------ |
  22. | a| 22.58|1/2/2023 |
  23. | a| 22.58|1/9/2023 |
  24. | a| 22.58|1/16/2023 |
  25. | a| 22.58|1/23/2023 |
  26. | a| 42.17|1/30/2023 |
  27. | a| 50|2/6/2023 |
  28. | a| 50|2/13/2023 |
  29. | a| 50|2/20/2023 |
  30. | a| 14.29|2/27/2023 |
  31. 因此每周的销售额计算逻辑如下
  32. 1月的销售额为$100所以每天的平均销售额为100/31 = 3.23* 7 = 22.58美元用于1月的每周
  33. 2月的销售额为$20028因此$200/28*7 = 50美元用于2月的每周
  34. 对于从2023130日开始的那一周计算略为复杂需要将1月份的销售率应用于130日和131日的前两天然后从21日开始计算2月的销售率直到25因此计算结果为5*200/28+2*100/31= 42.17
  35. 是否可以在Pandas中执行此操作我认为可能有效的逻辑是将每月的总销售额分解为每日数据并使用Pandas将其汇总为从每月的星期一开始的每周数据但我在尝试链接日期函数时感到困惑

希望这有助于您理解代码部分的内容。如果您需要更多帮助,请随时提出问题。

英文:

Here is my example data:

team sales month
a 100 1/1/2023
a 200 2/1/2023
b 600 1/1/2023
b 300 2/1/2023

load in pandas like so:

  1. mydata = pd.DataFrame([
  2. ['team','sales','month'],
  3. ['a', 100, '1/1/2023'],
  4. ['a', 200, '2/1/2023'],
  5. ['b', 600, '1/1/2023'],
  6. ['b', 300, '2/1/2023']
  7. ])
  8. mydata.columns = mydata.iloc[0]
  9. mydata = mydata[1:]
  10. mydata['month'] = pd.to_datetime(mydata['month'])

My desired outcome for team "a" is this data aggregated by each week as starting on Monday, like this:

team sales Monday Week
a 22.58 1/2/2023
a 22.58 1/9/2023
a 22.58 1/16/2023
a 22.58 1/23/2023
a 42.17 1/30/2023
a 50 2/6/2023
a 50 2/13/2023
a 50 2/20/2023
a 14.29 2/27/2023

So the logic on the calculated sales per week is:
$100 of sales in January, so avg sales per day is 100/31 = 3.23 per day, * 7 days in a weeks = $22.58 for each week in January.
February is $200 over 28 days, so ($200/28)*7 = $50 a week in Feb.

The calculation on the week starting 1/30/2023 is a little more complicated. I need to carry the January rate the first 2 days of 1/30 and 1/31, then start summing the Feb rate for the following 5 days in Feb (until 2/5/2023). So it would be 5*(200/28)+2*(100/31) = 42.17

Is there a way to do this in Pandas? I believe the logic that may work is taking each monthly total, decomposing that into daily data with an average rate, then using pandas to aggregate back up to weekly data starting on Monday for each month, but I'm lost trying to chain together the date functions.

答案1

得分: 1

I think you have miscalculation for team A for the week of 1/30/2023. It has no sales in Feb so its sales for the week should be 3.23 * 2 = 4.46.

Here's one way to do that:

  1. def get_weekly_sales(group: pd.DataFrame) -> pd.DataFrame:
  2. tmp = (
  3. # Put month to the index and convert it to monthly period
  4. group.set_index("month")[["sales"]]
  5. .to_period("M")
  6. # Calculate the average daily sales
  7. .assign(sales=lambda x: x["sales"] / x.index.days_in_month)
  8. # Up-sample the dataframe to daily
  9. .resample("1D")
  10. .ffill()
  11. # Sum by week
  12. .groupby(pd.Grouper(freq="W"))
  13. .sum()
  14. )
  15. # Clean up the index
  16. tmp.index = tmp.index.to_timestamp().rename("week_starting")
  17. return tmp
  18. df.groupby("team").apply(get_weekly_sales)
英文:

I think you have miscalculation for team A for the week of 1/30/2023. It has no sales in Feb so its sales for the week should be 3.23 * 2 = 4.46.

Here's one way to do that:

  1. def get_weekly_sales(group: pd.DataFrame) -> pd.DataFrame:
  2. tmp = (
  3. # Put month to the index and convert it to monthly period
  4. group.set_index("month")[["sales"]]
  5. .to_period("M")
  6. # Calculate the average daily sales
  7. .assign(sales=lambda x: x["sales"] / x.index.days_in_month)
  8. # Up-sample the dataframe to daily
  9. .resample("1D")
  10. .ffill()
  11. # Sum by week
  12. .groupby(pd.Grouper(freq="W"))
  13. .sum()
  14. )
  15. # Clean up the index
  16. tmp.index = tmp.index.to_timestamp().rename("week_starting")
  17. return tmp
  18. df.groupby("team").apply(get_weekly_sales)

huangapple
  • 本文由 发表于 2023年2月10日 07:29:23
  • 转载请务必保留本文链接:https://go.coder-hub.com/75405521.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定