Python Pandas 将按月聚合的数据重新采样为按日,然后再次聚合为按周。

huangapple go评论65阅读模式
英文:

python pandas resample monthly aggregated data to daily, then aggregate back to weekly

问题

以下是翻译的代码部分:

这是我的示例数据

| team| sales | month |
| -------- | -------------- | ------------ |
| a| 100|1/1/2023 |
| a| 200|2/1/2023 |  
| b| 600|1/1/2023 |  
| b| 300|2/1/2023 |  

在Pandas中加载数据的方法如下

mydata = pd.DataFrame([
['team','sales','month'],
['a', 100, '1/1/2023'],
['a', 200, '2/1/2023'],
['b', 600, '1/1/2023'],
['b', 300, '2/1/2023']
])
mydata.columns = mydata.iloc[0]
mydata = mydata[1:]
mydata['month'] = pd.to_datetime(mydata['month'])

我对团队"a"的期望结果是按每周聚合的数据以星期一开始如下所示

| team| sales | Monday Week|
| -------- | -------------- | ------------ |
| a| 22.58|1/2/2023 |
| a| 22.58|1/9/2023 |
| a| 22.58|1/16/2023 |
| a| 22.58|1/23/2023 |
| a| 42.17|1/30/2023 |
| a| 50|2/6/2023 |  
| a| 50|2/13/2023 |  
| a| 50|2/20/2023 |  
| a| 14.29|2/27/2023 |

因此每周的销售额计算逻辑如下

1月的销售额为$100所以每天的平均销售额为100/31 = 3.23* 7= 22.58美元用于1月的每周
2月的销售额为$200共28天因此($200/28*7 = 50美元用于2月的每周

对于从2023年1月30日开始的那一周计算略为复杂需要将1月份的销售率应用于1月30日和1月31日的前两天然后从2月1日开始计算2月的销售率直到2月5日因此计算结果为5*200/28+2*100/31= 42.17

是否可以在Pandas中执行此操作我认为可能有效的逻辑是将每月的总销售额分解为每日数据并使用Pandas将其汇总为从每月的星期一开始的每周数据但我在尝试链接日期函数时感到困惑

希望这有助于您理解代码部分的内容。如果您需要更多帮助,请随时提出问题。

英文:

Here is my example data:

team sales month
a 100 1/1/2023
a 200 2/1/2023
b 600 1/1/2023
b 300 2/1/2023

load in pandas like so:

mydata = pd.DataFrame([
['team','sales','month'],
['a',	100,	'1/1/2023'],
['a',	200,	'2/1/2023'],
['b',	600,	'1/1/2023'],
['b',	300,	'2/1/2023']
])
mydata.columns = mydata.iloc[0]
mydata = mydata[1:]  
mydata['month'] = pd.to_datetime(mydata['month'])

My desired outcome for team "a" is this data aggregated by each week as starting on Monday, like this:

team sales Monday Week
a 22.58 1/2/2023
a 22.58 1/9/2023
a 22.58 1/16/2023
a 22.58 1/23/2023
a 42.17 1/30/2023
a 50 2/6/2023
a 50 2/13/2023
a 50 2/20/2023
a 14.29 2/27/2023

So the logic on the calculated sales per week is:
$100 of sales in January, so avg sales per day is 100/31 = 3.23 per day, * 7 days in a weeks = $22.58 for each week in January.
February is $200 over 28 days, so ($200/28)*7 = $50 a week in Feb.

The calculation on the week starting 1/30/2023 is a little more complicated. I need to carry the January rate the first 2 days of 1/30 and 1/31, then start summing the Feb rate for the following 5 days in Feb (until 2/5/2023). So it would be 5*(200/28)+2*(100/31) = 42.17

Is there a way to do this in Pandas? I believe the logic that may work is taking each monthly total, decomposing that into daily data with an average rate, then using pandas to aggregate back up to weekly data starting on Monday for each month, but I'm lost trying to chain together the date functions.

答案1

得分: 1

I think you have miscalculation for team A for the week of 1/30/2023. It has no sales in Feb so its sales for the week should be 3.23 * 2 = 4.46.

Here's one way to do that:

def get_weekly_sales(group: pd.DataFrame) -> pd.DataFrame:
    tmp = (
        # Put month to the index and convert it to monthly period
        group.set_index("month")[["sales"]]
        .to_period("M")
        # Calculate the average daily sales
        .assign(sales=lambda x: x["sales"] / x.index.days_in_month)
        # Up-sample the dataframe to daily
        .resample("1D")
        .ffill()
        # Sum by week
        .groupby(pd.Grouper(freq="W"))
        .sum()
    )
    # Clean up the index
    tmp.index = tmp.index.to_timestamp().rename("week_starting")

    return tmp

df.groupby("team").apply(get_weekly_sales)
英文:

I think you have miscalculation for team A for the week of 1/30/2023. It has no sales in Feb so its sales for the week should be 3.23 * 2 = 4.46.

Here's one way to do that:

def get_weekly_sales(group: pd.DataFrame) -> pd.DataFrame:
    tmp = (
        # Put month to the index and convert it to monthly period
        group.set_index("month")[["sales"]]
        .to_period("M")
        # Calculate the average daily sales
        .assign(sales=lambda x: x["sales"] / x.index.days_in_month)
        # Up-sample the dataframe to daily
        .resample("1D")
        .ffill()
        # Sum by week
        .groupby(pd.Grouper(freq="W"))
        .sum()
    )
    # Clean up the index
    tmp.index = tmp.index.to_timestamp().rename("week_starting")

    return tmp

df.groupby("team").apply(get_weekly_sales)

huangapple
  • 本文由 发表于 2023年2月10日 07:29:23
  • 转载请务必保留本文链接:https://go.coder-hub.com/75405521.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定