如何在pandas中按月份分组交易,然后计算每月的差异。

huangapple go评论69阅读模式
英文:

How to bucket transactions by months and then calculate the difference per month in pandas

问题

以下是数据集的样式:

Trans ID| Trans Amount | Trans Date |
| -------- | --------- |
1| 50 | 2023-03-31 |
1| 600 | 2023-04-30 |
1| 40 | 2023-05-31 |
2| 500 | 2023-03-31 |
2| 500 | 2023-04-30 |
2| 10 | 2023-05-31 |
3| 980 | 2023-03-31 |
3| 1800 | 2023-04-30 |
3| 35 | 2023-05-31 |

我想按月份对交易进行分组,然后计算从一个月到另一个月的差异。

我希望数据按如下方式分组:

按月份分组:三月

Trans ID| Trans Amount | Trans_Date |
| -------- | -------- |
1| 50 | 2023-03-31 |
2| 500 | 2023-03-31 |
3| 980 | 2023-03-31 |

四月

Trans_ID| Trans_Amount | Trans_Date |
| -------- | -------- |
1| 600 | 2023-04-30 |
2| 500 | 2023-04-30 |
3| 1800 | 2023-04-30 |

五月

Trans_ID| Trans_Amount | Trans_Date |
| -------- | -------- |
1| 40 | 2023-05-31 |
2| 10 | 2023-05-31 |
3| 35 | 2023-05-31 |

然后,我想计算从三月到四月到五月的差异。

我尝试使用以下方式进行分组,但我不确定它是否能够完成我需要的工作,也不确定接下来该怎么计算从一个月到下一个月的差异:

d = {x : y for x, y in df.groupby(pd.to_date(df.Trans_Date).dt.strftime('%Y-%m'))}

请注意,我已经更正了日期格式的错误。

英文:

This is what the dataset looks like:

Trans ID| Trans Amount | Trans Date |
| -------- | --------- |
1| 50 | 2023-03-31 |
1| 600 | 2023-04-30 |
1| 40 | 2023-05-31 |
2| 500 | 2023-03-31 |
2| 500 | 2023-04-30 |
2| 10 | 2023-05-31 |
3| 980 | 2023-03-31 |
3| 1800 | 2023-04-30 |
3| 35 | 2023-05-31 |

I want to bucket the transactions by months and then calculate what the difference was from the one month to the other month.

I would like the data to be grouped as such:

Bucket By Month: March

Trans ID| Trans Amount | Trans_Date |
| -------- | -------- |
1| 50 | 2023-03-31 |
2| 500 | 2023-03-31 |
3| 980 | 2023-03-31 |

April

Trans_ID| Trans_Amount | Trans_Date |
| -------- | -------- |
1| 600 | 2023-04-30 |
2| 500 | 2023-04-30 |
3| 1800 | 2023-04-30 |

May

Trans_ID| Trans_Amount | Trans_Date |
| -------- | -------- |
1| 40 | 2023-05-31 |
2| 10 | 2023-05-31 |
3| 35 | 2023-05-31 |

From here I would like to then calculate the difference from March to April to May.

I tried to use group by as such, but I'm not sure whether it does what I need it to do and I am not sure what to do next for calculating the difference from one month to the next:

d = {x : y for x, y in df.groupby(pd.to_date(df.Trans_Date).dt.strftime('%y-%m'))}

答案1

得分: 2

这里只需要使用month日期访问器与groupby一起使用:

import pandas as pd

df = pd.DataFrame(
    {
        'a': [1, 2, 3, 4, 5, 6],
        'date': ['2020-1-1', '2020-1-5', '2020-2-7', '2020-2-9', '2020-2-20', '2020-3-1']
    }
)
df['date'] = pd.to_datetime(df.date)

months = {1: 'Jan', 2: 'Feb', 3: 'Mar', 4: 'Apr'}

for group, val in df.groupby(df.date.dt.month):
    print(f'{months.get(group)}\n{val}\n\n')

这是输出结果:

Jan
   a       date
0  1 2020-01-01
1  2 2020-01-05


Feb
   a       date
2  3 2020-02-07
3  4 2020-02-09
4  5 2020-02-20


Mar
   a       date
5  6 2020-03-01

编辑:
这是计算平均交易金额与上个月相比的差异的方法:

month_average = df.groupby(df.date.dt.month).mean().reset_index()
month_average['date'] = [months.get(m) for m in month_average.date]
month_average['diff_to_previous'] = month_average.a.diff()

输出结果如下:

  date    a  diff_to_previous
0  Jan  1.5               NaN
1  Feb  4.0               2.5
2  Mar  6.0               2.0
英文:

Here you just need to use month date accessor with groupby:

import pandas as pd

df = pd.DataFrame(
    {
        'a': [1, 2, 3, 4, 5, 6],
        'date': ['2020-1-1', '2020-1-5', '2020-2-7', '2020-2-9', '2020-2-20', '2020-3-1']
    }
)
df['date'] = pd.to_datetime(df.date)

months = {1: 'Jan', 2 : 'Feb', 3: 'Mar', 4: 'Apr'}

for group, val in df.groupby(df.date.dt.month):
    print(f'{months.get(group)}\n{val}\n\n')

And this is the output:

Jan
   a       date
0  1 2020-01-01
1  2 2020-01-05


Feb
   a       date
2  3 2020-02-07
3  4 2020-02-09
4  5 2020-02-20


Mar
   a       date
5  6 2020-03-01

EDIT:
This is how you calculate the difference between average transaction amounts compared to the previous month:

month_average = df.groupby(df.date.dt.month).mean().reset_index()
month_average['date'] = [months.get(m) for m in month_average.date]
month_average['diff_to_previous'] = month_average.a.diff()

And the output:

  date    a  diff_to_previous
0  Jan  1.5               NaN
1  Feb  4.0               2.5
2  Mar  6.0               2.0

huangapple
  • 本文由 发表于 2023年5月29日 07:10:34
  • 转载请务必保留本文链接:https://go.coder-hub.com/76353926.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定