2023年2月10日 07:29:23go评论92阅读模式

英文:

python pandas resample monthly aggregated data to daily, then aggregate back to weekly

问题

以下是翻译的代码部分：

这是我的示例数据：
| team| sales | month |
| -------- | -------------- | ------------ |
| a| 100|1/1/2023 |
| a| 200|2/1/2023 |  
| b| 600|1/1/2023 |  
| b| 300|2/1/2023 |  
在Pandas中加载数据的方法如下：
mydata = pd.DataFrame([
['team','sales','month'],
['a', 100, '1/1/2023'],
['a', 200, '2/1/2023'],
['b', 600, '1/1/2023'],
['b', 300, '2/1/2023']
])
mydata.columns = mydata.iloc[0]
mydata = mydata[1:]
mydata['month'] = pd.to_datetime(mydata['month'])
我对团队"a"的期望结果是按每周聚合的数据，以星期一开始，如下所示：
| team| sales | Monday Week|
| -------- | -------------- | ------------ |
| a| 22.58|1/2/2023 |
| a| 22.58|1/9/2023 |
| a| 22.58|1/16/2023 |
| a| 22.58|1/23/2023 |
| a| 42.17|1/30/2023 |
| a| 50|2/6/2023 |  
| a| 50|2/13/2023 |  
| a| 50|2/20/2023 |  
| a| 14.29|2/27/2023 |
因此，每周的销售额计算逻辑如下：
1月的销售额为$100，所以每天的平均销售额为100/31 = 3.23，* 7天 = 22.58美元，用于1月的每周。
2月的销售额为$200，共28天，因此（$200/28）*7 = 50美元用于2月的每周。
对于从2023年1月30日开始的那一周，计算略为复杂。需要将1月份的销售率应用于1月30日和1月31日的前两天，然后从2月1日开始，计算2月的销售率，直到2月5日。因此，计算结果为5*（200/28）+2*（100/31）= 42.17。
是否可以在Pandas中执行此操作？我认为可能有效的逻辑是将每月的总销售额分解为每日数据，并使用Pandas将其汇总为从每月的星期一开始的每周数据，但我在尝试链接日期函数时感到困惑。

希望这有助于您理解代码部分的内容。如果您需要更多帮助，请随时提出问题。

英文:

Here is my example data:

team	sales	month
a	100	1/1/2023
a	200	2/1/2023
b	600	1/1/2023
b	300	2/1/2023

load in pandas like so:

mydata = pd.DataFrame([
[&#39;team&#39;,&#39;sales&#39;,&#39;month&#39;],
[&#39;a&#39;,	100,	&#39;1/1/2023&#39;],
[&#39;a&#39;,	200,	&#39;2/1/2023&#39;],
[&#39;b&#39;,	600,	&#39;1/1/2023&#39;],
[&#39;b&#39;,	300,	&#39;2/1/2023&#39;]
])
mydata.columns = mydata.iloc[0]
mydata = mydata[1:]  
mydata[&#39;month&#39;] = pd.to_datetime(mydata[&#39;month&#39;])

My desired outcome for team "a" is this data aggregated by each week as starting on Monday, like this:

team	sales	Monday Week
a	22.58	1/2/2023
a	22.58	1/9/2023
a	22.58	1/16/2023
a	22.58	1/23/2023
a	42.17	1/30/2023
a	50	2/6/2023
a	50	2/13/2023
a	50	2/20/2023
a	14.29	2/27/2023

So the logic on the calculated sales per week is:
$100 of sales in January, so avg sales per day is 100/31 = 3.23 per day, * 7 days in a weeks = $22.58 for each week in January.
February is $200 over 28 days, so ($200/28)*7 = $50 a week in Feb.

The calculation on the week starting 1/30/2023 is a little more complicated. I need to carry the January rate the first 2 days of 1/30 and 1/31, then start summing the Feb rate for the following 5 days in Feb (until 2/5/2023). So it would be 5*(200/28)+2*(100/31) = 42.17

Is there a way to do this in Pandas? I believe the logic that may work is taking each monthly total, decomposing that into daily data with an average rate, then using pandas to aggregate back up to weekly data starting on Monday for each month, but I'm lost trying to chain together the date functions.

答案1

得分: 1

I think you have miscalculation for team A for the week of 1/30/2023. It has no sales in Feb so its sales for the week should be 3.23 * 2 = 4.46.

Here's one way to do that:

def get_weekly_sales(group: pd.DataFrame) -> pd.DataFrame:
    tmp = (
        # Put month to the index and convert it to monthly period
        group.set_index("month")[["sales"]]
        .to_period("M")
        # Calculate the average daily sales
        .assign(sales=lambda x: x["sales"] / x.index.days_in_month)
        # Up-sample the dataframe to daily
        .resample("1D")
        .ffill()
        # Sum by week
        .groupby(pd.Grouper(freq="W"))
        .sum()
    )
    # Clean up the index
    tmp.index = tmp.index.to_timestamp().rename("week_starting")
    return tmp
df.groupby("team").apply(get_weekly_sales)

英文:

I think you have miscalculation for team A for the week of 1/30/2023. It has no sales in Feb so its sales for the week should be 3.23 * 2 = 4.46.

Here's one way to do that:

def get_weekly_sales(group: pd.DataFrame) -&gt; pd.DataFrame:
    tmp = (
        # Put month to the index and convert it to monthly period
        group.set_index(&quot;month&quot;)[[&quot;sales&quot;]]
        .to_period(&quot;M&quot;)
        # Calculate the average daily sales
        .assign(sales=lambda x: x[&quot;sales&quot;] / x.index.days_in_month)
        # Up-sample the dataframe to daily
        .resample(&quot;1D&quot;)
        .ffill()
        # Sum by week
        .groupby(pd.Grouper(freq=&quot;W&quot;))
        .sum()
    )
    # Clean up the index
    tmp.index = tmp.index.to_timestamp().rename(&quot;week_starting&quot;)
    return tmp
df.groupby(&quot;team&quot;).apply(get_weekly_sales)

通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库，让每个人都能够通过互相帮助和分享经验来进步。

Python Pandas 将按月聚合的数据重新采样为按日，然后再次聚合为按周。

问题

答案1

3D曲面图为空

移除QLabel中不必要的空格。

PySpark 3高阶函数用于提取到列中

如何使用pytz和datetime库将具有区域/城市时区格式的时间字符串转换为UTC？

如何在Playwright视觉比较中屏蔽多个定位器？

在C++中，可以使用可变模板参数来检索类型的内部类型。

selenium.common.exceptions.StaleElementReferenceException: Message: stale element reference: stale element not found

Creating and opening a URL to log in to Website via Basic Auth with Robot Framework/Selenium (Python)

AG Grid 在上下文菜单中以大文本形式打开

What's the correct way to type hint an empty list as a literal in python?

如何在Highcharts Gantt中更改本地化的星期名称

如何在同一个流中使用多个过滤器和映射函数？

如何使用Map/Set来将代码优化到O(n)？

.NET MAUI Android在GitHub Actions上构建失败，错误代码为1。