2023-05-29 07:10:34

How to bucket transactions by months and then calculate the difference per month in pandas

Question

This is what the dataset looks like:

| Trans_ID | Trans_Amount | Trans_Date |
| -------- | ------------ | ---------- |
| 1 | 50 | 2023-03-31 |
| 1 | 600 | 2023-04-30 |
| 1 | 40 | 2023-05-31 |
| 2 | 500 | 2023-03-31 |
| 2 | 500 | 2023-04-30 |
| 2 | 10 | 2023-05-31 |
| 3 | 980 | 2023-03-31 |
| 3 | 1800 | 2023-04-30 |
| 3 | 35 | 2023-05-31 |

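I want to bucket the transactions by month and then calculate the difference from one month to the next.

I would like the data to be grouped like this:

Bucket by month: March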

| Trans_ID | Trans_Amount | Trans_Date |
| -------- | ------------ | ---------- |
| 1 | 50 | 2023-03-31 |
| 2 | 500 | 2023-03-31 |
| 3 | 980 | 2023-03-31 |

April

| Trans_ID | Trans_Amount | Trans_Date |
| -------- | ------------ | ---------- |
| 1 | 600 | 2023-04-30 |
| 2 | 500 | 2023-04-30 |
| 3 | 1800 | 2023-04-30 |

May

| Trans_ID | Trans_Amount | Trans_Date |
| -------- | ------------ | ---------- |
| 1 | 40 | 2023-05-31 |
| 2 | 10 | 2023-05-31 |
| 3 | 35 | 2023-05-31 |

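From here I would like to calculate the difference from March to April to May.

I tried to use groupby as shown below, but I'm not sure whether it does what I need, and I'm not sure what to do next to calculate the difference from one month to the next:

d = {x: y for x, y in df.groupby(pd.to_datetime(df.Trans_Date).dt.strftime('%Y-%m'))}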

Answer 1

Score: 2

Here you just need to use the month date accessor (dt.month) together with groupby:

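import pandas as pd

# Small example frame: a value column and a date column given as strings
df = pd.DataFrame(
    {
        'a': [1, 2, 3, 4, 5, 6],
        'date': ['2020-1-1', '2020-1-5', '2020-2-7', '2020-2-9', '2020-2-20', '2020-3-1']
    }
)
# Parse the strings so that the .dt accessor becomes available
df['date'] = pd.to_datetime(df.date)
months = {1: 'Jan', 2: 'Feb', 3: 'Mar', 4: 'Apr'}
# Group on the month number and print each bucket under its month name
for group, val in df.groupby(df.date.dt.month):
    print(f'{months.get(group)}\n{val}\n\n')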

And this is the output:

Jan
   a       date
0  1 2020-01-01
1  2 2020-01-05
Feb
   a       date
2  3 2020-02-07
3  4 2020-02-09
4  5 2020-02-20
Mar
   a       date
5  6 2020-03-01
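
One caveat with grouping on dt.month alone is that the same month from different years lands in one bucket. The following is a minimal sketch of bucketing by year and month instead, using dt.to_period('M') on the sample data from the question. The underscore column names (Trans_ID, Trans_Amount, Trans_Date) and the variable name buckets are assumptions made for illustration.

```python
import pandas as pd

# Sample data copied from the question; underscore column names are assumed.
df = pd.DataFrame({
    "Trans_ID": [1, 1, 1, 2, 2, 2, 3, 3, 3],
    "Trans_Amount": [50, 600, 40, 500, 500, 10, 980, 1800, 35],
    "Trans_Date": ["2023-03-31", "2023-04-30", "2023-05-31"] * 3,
})
df["Trans_Date"] = pd.to_datetime(df["Trans_Date"])

# Group on the monthly period (year + month) rather than the month number,
# and key each sub-frame by the period's string form, e.g. "2023-03".
buckets = {
    str(period): group
    for period, group in df.groupby(df["Trans_Date"].dt.to_period("M"))
}

for month, group in buckets.items():
    print(month)
    print(group, end="\n\n")
```

Each value in buckets is an ordinary DataFrame, so buckets["2023-04"] holds the three April rows.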

Edit:
This is how you calculate the difference in average transaction amount compared to the previous month:

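# Select column 'a' before aggregating: otherwise recent pandas versions also
# average the 'date' column, and reset_index() then fails on the duplicate name
month_average = df.groupby(df.date.dt.month)['a'].mean().reset_index()
# Replace the month number with its short name
month_average['date'] = [months.get(m) for m in month_average.date]
# Difference between each month's average and the previous month's average
month_average['diff_to_previous'] = month_average.a.diff()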

And the output:

  date    a  diff_to_previous
0  Jan  1.5               NaN
1  Feb  4.0               2.5
2  Mar  6.0               2.0
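
The same idea can be applied to the dataset from the question. The sketch below assumes the underscore column names (Trans_ID, Trans_Amount, Trans_Date) and assumes that "difference" means the change in amount from one month to the next, both per transaction ID and for the monthly totals. The names Month, wide, per_id_diff, monthly_total and monthly_diff are made up for this example.

```python
import pandas as pd

# Sample data copied from the question; underscore column names are assumed.
df = pd.DataFrame({
    "Trans_ID": [1, 1, 1, 2, 2, 2, 3, 3, 3],
    "Trans_Amount": [50, 600, 40, 500, 500, 10, 980, 1800, 35],
    "Trans_Date": ["2023-03-31", "2023-04-30", "2023-05-31"] * 3,
})
df["Trans_Date"] = pd.to_datetime(df["Trans_Date"])

# Year-month label such as "2023-03"; these strings sort chronologically.
df["Month"] = df["Trans_Date"].dt.strftime("%Y-%m")

# One row per Trans_ID and one column per month.
wide = df.pivot_table(
    index="Trans_ID", columns="Month", values="Trans_Amount", aggfunc="sum"
)

# Change versus the previous month for each ID (first month has no predecessor).
per_id_diff = wide.diff(axis=1)

# Total amount per month and its change versus the previous month.
monthly_total = df.groupby("Month")["Trans_Amount"].sum()
monthly_diff = monthly_total.diff()

print(per_id_diff)
print(monthly_diff)
```

For the sample data, ID 1 goes up by 550 from March to April and down by 560 from April to May, while the monthly totals change by 1370 and then by -2815. If the average per month is wanted instead of the total, sum() can be swapped for mean().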
