英文:
python pandas resample monthly aggregated data to daily, then aggregate back to weekly
问题
以下是翻译的代码部分:
这是我的示例数据:
| team| sales | month |
| -------- | -------------- | ------------ |
| a| 100|1/1/2023 |
| a| 200|2/1/2023 |
| b| 600|1/1/2023 |
| b| 300|2/1/2023 |
在Pandas中加载数据的方法如下:
mydata = pd.DataFrame([
['team','sales','month'],
['a', 100, '1/1/2023'],
['a', 200, '2/1/2023'],
['b', 600, '1/1/2023'],
['b', 300, '2/1/2023']
])
mydata.columns = mydata.iloc[0]
mydata = mydata[1:]
mydata['month'] = pd.to_datetime(mydata['month'])
我对团队"a"的期望结果是按每周聚合的数据,以星期一开始,如下所示:
| team| sales | Monday Week|
| -------- | -------------- | ------------ |
| a| 22.58|1/2/2023 |
| a| 22.58|1/9/2023 |
| a| 22.58|1/16/2023 |
| a| 22.58|1/23/2023 |
| a| 42.17|1/30/2023 |
| a| 50|2/6/2023 |
| a| 50|2/13/2023 |
| a| 50|2/20/2023 |
| a| 14.29|2/27/2023 |
因此,每周的销售额计算逻辑如下:
1月的销售额为$100,所以每天的平均销售额为100/31 = 3.23,* 7天 = 22.58美元,用于1月的每周。
2月的销售额为$200,共28天,因此($200/28)*7 = 50美元用于2月的每周。
对于从2023年1月30日开始的那一周,计算略为复杂。需要将1月份的销售率应用于1月30日和1月31日的前两天,然后从2月1日开始,计算2月的销售率,直到2月5日。因此,计算结果为5*(200/28)+2*(100/31)= 42.17。
是否可以在Pandas中执行此操作?我认为可能有效的逻辑是将每月的总销售额分解为每日数据,并使用Pandas将其汇总为从每月的星期一开始的每周数据,但我在尝试链接日期函数时感到困惑。
希望这有助于您理解代码部分的内容。如果您需要更多帮助,请随时提出问题。
英文:
Here is my example data:
team | sales | month |
---|---|---|
a | 100 | 1/1/2023 |
a | 200 | 2/1/2023 |
b | 600 | 1/1/2023 |
b | 300 | 2/1/2023 |
load in pandas like so:
mydata = pd.DataFrame([
['team','sales','month'],
['a', 100, '1/1/2023'],
['a', 200, '2/1/2023'],
['b', 600, '1/1/2023'],
['b', 300, '2/1/2023']
])
mydata.columns = mydata.iloc[0]
mydata = mydata[1:]
mydata['month'] = pd.to_datetime(mydata['month'])
My desired outcome for team "a" is this data aggregated by each week as starting on Monday, like this:
team | sales | Monday Week |
---|---|---|
a | 22.58 | 1/2/2023 |
a | 22.58 | 1/9/2023 |
a | 22.58 | 1/16/2023 |
a | 22.58 | 1/23/2023 |
a | 42.17 | 1/30/2023 |
a | 50 | 2/6/2023 |
a | 50 | 2/13/2023 |
a | 50 | 2/20/2023 |
a | 14.29 | 2/27/2023 |
So the logic on the calculated sales per week is:
$100 of sales in January, so avg sales per day is 100/31 = 3.23 per day, * 7 days in a weeks = $22.58 for each week in January.
February is $200 over 28 days, so ($200/28)*7 = $50 a week in Feb.
The calculation on the week starting 1/30/2023 is a little more complicated. I need to carry the January rate the first 2 days of 1/30 and 1/31, then start summing the Feb rate for the following 5 days in Feb (until 2/5/2023). So it would be 5*(200/28)+2*(100/31) = 42.17
Is there a way to do this in Pandas? I believe the logic that may work is taking each monthly total, decomposing that into daily data with an average rate, then using pandas to aggregate back up to weekly data starting on Monday for each month, but I'm lost trying to chain together the date functions.
答案1
得分: 1
I think you have miscalculation for team A for the week of 1/30/2023
. It has no sales in Feb so its sales for the week should be 3.23 * 2 = 4.46
.
Here's one way to do that:
def get_weekly_sales(group: pd.DataFrame) -> pd.DataFrame:
tmp = (
# Put month to the index and convert it to monthly period
group.set_index("month")[["sales"]]
.to_period("M")
# Calculate the average daily sales
.assign(sales=lambda x: x["sales"] / x.index.days_in_month)
# Up-sample the dataframe to daily
.resample("1D")
.ffill()
# Sum by week
.groupby(pd.Grouper(freq="W"))
.sum()
)
# Clean up the index
tmp.index = tmp.index.to_timestamp().rename("week_starting")
return tmp
df.groupby("team").apply(get_weekly_sales)
英文:
I think you have miscalculation for team A for the week of 1/30/2023
. It has no sales in Feb so its sales for the week should be 3.23 * 2 = 4.46
.
Here's one way to do that:
def get_weekly_sales(group: pd.DataFrame) -> pd.DataFrame:
tmp = (
# Put month to the index and convert it to monthly period
group.set_index("month")[["sales"]]
.to_period("M")
# Calculate the average daily sales
.assign(sales=lambda x: x["sales"] / x.index.days_in_month)
# Up-sample the dataframe to daily
.resample("1D")
.ffill()
# Sum by week
.groupby(pd.Grouper(freq="W"))
.sum()
)
# Clean up the index
tmp.index = tmp.index.to_timestamp().rename("week_starting")
return tmp
df.groupby("team").apply(get_weekly_sales)
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论