如何在pandas中按月份分组交易,然后计算每月的差异。

huangapple go评论94阅读模式
英文:

How to bucket transactions by months and then calculate the difference per month in pandas

问题

以下是数据集的样式:

Trans ID| Trans Amount | Trans Date |
| -------- | --------- |
1| 50 | 2023-03-31 |
1| 600 | 2023-04-30 |
1| 40 | 2023-05-31 |
2| 500 | 2023-03-31 |
2| 500 | 2023-04-30 |
2| 10 | 2023-05-31 |
3| 980 | 2023-03-31 |
3| 1800 | 2023-04-30 |
3| 35 | 2023-05-31 |

我想按月份对交易进行分组,然后计算从一个月到另一个月的差异。

我希望数据按如下方式分组:

按月份分组:三月

Trans ID| Trans Amount | Trans_Date |
| -------- | -------- |
1| 50 | 2023-03-31 |
2| 500 | 2023-03-31 |
3| 980 | 2023-03-31 |

四月

Trans_ID| Trans_Amount | Trans_Date |
| -------- | -------- |
1| 600 | 2023-04-30 |
2| 500 | 2023-04-30 |
3| 1800 | 2023-04-30 |

五月

Trans_ID| Trans_Amount | Trans_Date |
| -------- | -------- |
1| 40 | 2023-05-31 |
2| 10 | 2023-05-31 |
3| 35 | 2023-05-31 |

然后,我想计算从三月到四月到五月的差异。

我尝试使用以下方式进行分组,但我不确定它是否能够完成我需要的工作,也不确定接下来该怎么计算从一个月到下一个月的差异:

  1. d = {x : y for x, y in df.groupby(pd.to_date(df.Trans_Date).dt.strftime('%Y-%m'))}

请注意,我已经更正了日期格式的错误。

英文:

This is what the dataset looks like:

Trans ID| Trans Amount | Trans Date |
| -------- | --------- |
1| 50 | 2023-03-31 |
1| 600 | 2023-04-30 |
1| 40 | 2023-05-31 |
2| 500 | 2023-03-31 |
2| 500 | 2023-04-30 |
2| 10 | 2023-05-31 |
3| 980 | 2023-03-31 |
3| 1800 | 2023-04-30 |
3| 35 | 2023-05-31 |

I want to bucket the transactions by months and then calculate what the difference was from the one month to the other month.

I would like the data to be grouped as such:

Bucket By Month: March

Trans ID| Trans Amount | Trans_Date |
| -------- | -------- |
1| 50 | 2023-03-31 |
2| 500 | 2023-03-31 |
3| 980 | 2023-03-31 |

April

Trans_ID| Trans_Amount | Trans_Date |
| -------- | -------- |
1| 600 | 2023-04-30 |
2| 500 | 2023-04-30 |
3| 1800 | 2023-04-30 |

May

Trans_ID| Trans_Amount | Trans_Date |
| -------- | -------- |
1| 40 | 2023-05-31 |
2| 10 | 2023-05-31 |
3| 35 | 2023-05-31 |

From here I would like to then calculate the difference from March to April to May.

I tried to use group by as such, but I'm not sure whether it does what I need it to do and I am not sure what to do next for calculating the difference from one month to the next:

  1. d = {x : y for x, y in df.groupby(pd.to_date(df.Trans_Date).dt.strftime('%y-%m'))}

答案1

得分: 2

这里只需要使用month日期访问器与groupby一起使用:

  1. import pandas as pd
  2. df = pd.DataFrame(
  3. {
  4. 'a': [1, 2, 3, 4, 5, 6],
  5. 'date': ['2020-1-1', '2020-1-5', '2020-2-7', '2020-2-9', '2020-2-20', '2020-3-1']
  6. }
  7. )
  8. df['date'] = pd.to_datetime(df.date)
  9. months = {1: 'Jan', 2: 'Feb', 3: 'Mar', 4: 'Apr'}
  10. for group, val in df.groupby(df.date.dt.month):
  11. print(f'{months.get(group)}\n{val}\n\n')

这是输出结果:

  1. Jan
  2. a date
  3. 0 1 2020-01-01
  4. 1 2 2020-01-05
  5. Feb
  6. a date
  7. 2 3 2020-02-07
  8. 3 4 2020-02-09
  9. 4 5 2020-02-20
  10. Mar
  11. a date
  12. 5 6 2020-03-01

编辑:
这是计算平均交易金额与上个月相比的差异的方法:

  1. month_average = df.groupby(df.date.dt.month).mean().reset_index()
  2. month_average['date'] = [months.get(m) for m in month_average.date]
  3. month_average['diff_to_previous'] = month_average.a.diff()

输出结果如下:

  1. date a diff_to_previous
  2. 0 Jan 1.5 NaN
  3. 1 Feb 4.0 2.5
  4. 2 Mar 6.0 2.0
英文:

Here you just need to use month date accessor with groupby:

  1. import pandas as pd
  2. df = pd.DataFrame(
  3. {
  4. 'a': [1, 2, 3, 4, 5, 6],
  5. 'date': ['2020-1-1', '2020-1-5', '2020-2-7', '2020-2-9', '2020-2-20', '2020-3-1']
  6. }
  7. )
  8. df['date'] = pd.to_datetime(df.date)
  9. months = {1: 'Jan', 2 : 'Feb', 3: 'Mar', 4: 'Apr'}
  10. for group, val in df.groupby(df.date.dt.month):
  11. print(f'{months.get(group)}\n{val}\n\n')

And this is the output:

  1. Jan
  2. a date
  3. 0 1 2020-01-01
  4. 1 2 2020-01-05
  5. Feb
  6. a date
  7. 2 3 2020-02-07
  8. 3 4 2020-02-09
  9. 4 5 2020-02-20
  10. Mar
  11. a date
  12. 5 6 2020-03-01

EDIT:
This is how you calculate the difference between average transaction amounts compared to the previous month:

  1. month_average = df.groupby(df.date.dt.month).mean().reset_index()
  2. month_average['date'] = [months.get(m) for m in month_average.date]
  3. month_average['diff_to_previous'] = month_average.a.diff()

And the output:

  1. date a diff_to_previous
  2. 0 Jan 1.5 NaN
  3. 1 Feb 4.0 2.5
  4. 2 Mar 6.0 2.0

huangapple
  • 本文由 发表于 2023年5月29日 07:10:34
  • 转载请务必保留本文链接:https://go.coder-hub.com/76353926.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定