How to bucket transactions by months and then calculate the difference per month in pandas
Question
This is what the dataset looks like:
| Trans_ID | Trans_Amount | Trans_Date |
| -------- | ------------ | ---------- |
| 1 | 50 | 2023-03-31 |
| 1 | 600 | 2023-04-30 |
| 1 | 40 | 2023-05-31 |
| 2 | 500 | 2023-03-31 |
| 2 | 500 | 2023-04-30 |
| 2 | 10 | 2023-05-31 |
| 3 | 980 | 2023-03-31 |
| 3 | 1800 | 2023-04-30 |
| 3 | 35 | 2023-05-31 |
I want to bucket the transactions by month and then calculate the difference from each month to the next.
I would like the data to be grouped like this:
Bucket by month:

March
| Trans_ID | Trans_Amount | Trans_Date |
| -------- | ------------ | ---------- |
| 1 | 50 | 2023-03-31 |
| 2 | 500 | 2023-03-31 |
| 3 | 980 | 2023-03-31 |

April
| Trans_ID | Trans_Amount | Trans_Date |
| -------- | ------------ | ---------- |
| 1 | 600 | 2023-04-30 |
| 2 | 500 | 2023-04-30 |
| 3 | 1800 | 2023-04-30 |

May
| Trans_ID | Trans_Amount | Trans_Date |
| -------- | ------------ | ---------- |
| 1 | 40 | 2023-05-31 |
| 2 | 10 | 2023-05-31 |
| 3 | 35 | 2023-05-31 |
From here, I would like to calculate the difference from March to April and from April to May.
I tried using groupby as shown below, but I'm not sure whether it does what I need, and I don't know how to calculate the difference from one month to the next:
d = {x: y for x, y in df.groupby(pd.to_datetime(df.Trans_Date).dt.strftime('%y-%m'))}
Answer 1
Score: 2
Here you just need to use the dt.month date accessor with groupby:
import pandas as pd

df = pd.DataFrame(
    {
        'a': [1, 2, 3, 4, 5, 6],
        'date': ['2020-1-1', '2020-1-5', '2020-2-7', '2020-2-9', '2020-2-20', '2020-3-1'],
    }
)
# Parse the strings into datetimes so the .dt accessor is available
df['date'] = pd.to_datetime(df.date)
# Map month numbers to display labels
months = {1: 'Jan', 2: 'Feb', 3: 'Mar', 4: 'Apr'}
# Iterate over one sub-DataFrame per calendar month
for group, val in df.groupby(df.date.dt.month):
    print(f'{months.get(group)}\n{val}\n\n')
And this is the output:
Jan
   a       date
0  1 2020-01-01
1  2 2020-01-05

Feb
   a       date
2  3 2020-02-07
3  4 2020-02-09
4  5 2020-02-20

Mar
   a       date
5  6 2020-03-01
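One caveat with grouping on dt.month alone is that the same calendar month from different years lands in one bucket. A minimal sketch of an alternative, assuming the column names from the question (Trans_ID, Trans_Amount, Trans_Date) and using dt.to_period('M') as the grouping key; the `buckets` name is only illustrative:

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame(
    {
        "Trans_ID": [1, 1, 1, 2, 2, 2, 3, 3, 3],
        "Trans_Amount": [50, 600, 40, 500, 500, 10, 980, 1800, 35],
        "Trans_Date": ["2023-03-31", "2023-04-30", "2023-05-31"] * 3,
    }
)
df["Trans_Date"] = pd.to_datetime(df["Trans_Date"])

# Group on a year-month Period so that, for example, 2023-03 and 2024-03
# would end up in separate buckets.
buckets = {
    str(period): group
    for period, group in df.groupby(df["Trans_Date"].dt.to_period("M"))
}

for label, group in buckets.items():
    print(label)
    print(group, end="\n\n")
```

Each value in `buckets` is an ordinary DataFrame, so `buckets["2023-03"]` holds the March rows from the question.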
EDIT:
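This is how you calculate the difference between each month's average transaction amount and the previous month's:
# Average of column 'a' per month; selecting 'a' keeps the datetime column out of the mean
month_average = df.groupby(df.date.dt.month)['a'].mean().reset_index()
# Replace month numbers with their labels
month_average['date'] = [months.get(m) for m in month_average.date]
# Difference from the previous row, i.e. the previous month
month_average['diff_to_previous'] = month_average.a.diff()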
And the output:
  date    a  diff_to_previous
0  Jan  1.5               NaN
1  Feb  4.0               2.5
2  Mar  6.0               2.0
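The same idea can be applied to the data from the question. The sketch below is one possible approach, not the only one: it assumes the column names Trans_ID, Trans_Amount and Trans_Date, assumes that amounts should be summed when an ID has several transactions in a month, and introduces a helper column named Month purely for illustration.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame(
    {
        "Trans_ID": [1, 1, 1, 2, 2, 2, 3, 3, 3],
        "Trans_Amount": [50, 600, 40, 500, 500, 10, 980, 1800, 35],
        "Trans_Date": ["2023-03-31", "2023-04-30", "2023-05-31"] * 3,
    }
)
df["Trans_Date"] = pd.to_datetime(df["Trans_Date"])

# Hypothetical helper column: the year-month each transaction falls in.
df["Month"] = df["Trans_Date"].dt.to_period("M")

# One row per Trans_ID and one column per month (summing is an assumption).
monthly = df.pivot_table(
    index="Trans_ID", columns="Month", values="Trans_Amount", aggfunc="sum"
)

# Change from the previous month for each Trans_ID; the first month is NaN.
monthly_diff = monthly.diff(axis=1)
print(monthly_diff)

# Change in the overall monthly total from one month to the next.
totals = df.groupby("Month")["Trans_Amount"].sum()
print(totals.diff())
```

Pivoting first keeps each Trans_ID on its own row, so the difference is taken across the month columns; swapping "sum" for "mean" would mirror the averaging used in the answer above.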