August 9, 2023, 18:17:12

How to calculate the Month to Date(MTD) and Year to Date(YTD) in a dataframe?

Question

I want to calculate the MTD and YTD for the given dataframe. I have two columns named bp and actuals.

The MTD and YTD values for bp are the cumulative average. The MTD and YTD values for actuals are also a cumulative average, but computed without counting the zero values.

For example, if the actuals value for 2023-04-01 is 10, the corresponding mtd_actuals value is 10. If the actuals value for 2023-04-02 is 20, the corresponding mtd_actuals value is 15. If the actuals value for 2023-04-03 is 0, the corresponding mtd_actuals value stays at 15.

Input dataframe:

data = {
    'date': ['2023-04-01', '2023-04-02', '2023-04-03', '2023-04-04', '2023-04-05'],
    'category': ['QUALITY PARAMETERS', 'QUALITY PARAMETERS', 'QUALITY PARAMETERS', 'QUALITY PARAMETERS', 'QUALITY PARAMETERS'],
    'parameter': ['TBR', 'TBR', 'TBR', 'TBR', 'TBR'],
    'region': ['DELHI', 'DELHI', 'DELHI', 'DELHI', 'DELHI'],
    'factory': ['ABC', 'ABC', 'ABC', 'ABC', 'ABC'],
    'actuals': [10, 20, 0, 0, 10]
}

Output dataframe:

date            category        parameter ...   actuals mtd_bp mtd_actuals ytd_bp ytd_actuals
2023-04-01  QUALITY PARAMETERS       TBR  ...      10     51   10.000000   51   10.000000 
2023-04-02  QUALITY PARAMETERS       TBR  ...      20     51   15.000000   51   15.000000
2023-04-03  QUALITY PARAMETERS       TBR  ...       0     51   15.000000   51   15.000000
2023-04-04  QUALITY PARAMETERS       TBR  ...       0     51   15.000000   51   15.000000
2023-04-05  QUALITY PARAMETERS       TBR  ...      10     51   13.333330   51   13.333330

I have used the code below to calculate mtd_actuals and ytd_actuals:

df['mtd_bp'] = df.groupby([df.date.dt.to_period('m'), 'category', 'parameter', 'region', 'factory']).actuals.expanding().mean().droplevel([0, 1, 2, 3, 4])
df['mtd_actuals'] = df.groupby([df.date.dt.to_period('m'), 'category', 'parameter', 'region', 'factory']).actuals.expanding().mean().droplevel([0, 1, 2, 3, 4])
df['ytd_bp'] = df.groupby([df.date.dt.to_period('A-MAR'), 'category', 'parameter', 'region', 'factory']).actuals.expanding().mean().droplevel([0, 1, 2, 3, 4])
df['ytd_actuals'] = df.groupby([df.date.dt.to_period('A-MAR'), 'category', 'parameter', 'region', 'factory']).actuals.expanding().mean().droplevel([0, 1, 2, 3, 4])

Here the mtd_bp and ytd_bp values come out correctly, but the mtd_actuals and ytd_actuals values are not correct.

Can anyone suggest a solution or modify the mtd_actuals and ytd_actuals code to get the correct output?

Answer 1

Score: 1

Your example was not ideal for showing the expansion by month, excluding 0, so I created a new one.

Example

import pandas as pd
data1 = {'date': ['2023-04-01', '2023-04-02', '2023-04-03', '2023-04-04', '2023-04-05', '2023-05-01', '2023-05-02', '2023-05-03'],
         'bp': [51, 51, 51, 51, 51, 51, 51, 51], 'actuals': [10, 20, 0, 0, 10, 20, 10, 0]}
df = pd.DataFrame(data1)

df

         date  bp  actuals
0  2023-04-01  51       10
1  2023-04-02  51       20
2  2023-04-03  51        0
3  2023-04-04  51        0
4  2023-04-05  51       10
5  2023-05-01  51       20
6  2023-05-02  51       10
7  2023-05-03  51        0

Step 1

Make a monthly period grouper to pass to groupby.

grouper = pd.to_datetime(df['date']).dt.to_period(freq='M')

grouper

0    2023-04
1    2023-04
2    2023-04
3    2023-04
4    2023-04
5    2023-05
6    2023-05
7    2023-05
Name: date, dtype: period[M]

Step 2

Mask the zeros (turning them into NaN), group by grouper, and take the expanding mean. The mean skips NaN, so the zeros no longer count.

df['actuals'].mask(df['actuals'].eq(0)).groupby(grouper).expanding().mean().droplevel(0)

Output:

0    10.000000
1    15.000000
2    15.000000
3    15.000000
4    13.333333
5    20.000000
6    15.000000
7    15.000000

Let's compare df and output.

         date  bp  actuals     output
0  2023-04-01  51       10  10.000000
1  2023-04-02  51       20  15.000000
2  2023-04-03  51        0  15.000000
3  2023-04-04  51        0  15.000000
4  2023-04-05  51       10  13.333333
5  2023-05-01  51       20  20.000000
6  2023-05-02  51       10  15.000000
7  2023-05-03  51        0  15.000000
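The figures in this comparison can be cross-checked by hand: within each month, divide the running sum of the non-zero actuals by the running count of non-zero actuals. The following is only an illustrative sketch in plain Python, not part of the answer's pandas solution; the helper name running_mean_nonzero is hypothetical.

```python
from itertools import groupby

# Same (date, actuals) pairs as in the example data.
rows = [
    ("2023-04-01", 10), ("2023-04-02", 20), ("2023-04-03", 0),
    ("2023-04-04", 0), ("2023-04-05", 10),
    ("2023-05-01", 20), ("2023-05-02", 10), ("2023-05-03", 0),
]

def running_mean_nonzero(values):
    """Yield the running mean of the non-zero values seen so far."""
    total, count = 0, 0
    for v in values:
        if v != 0:
            total += v
            count += 1
        # None until the first non-zero value appears (pandas would give NaN).
        yield total / count if count else None

mtd = []
# Group on the "YYYY-MM" prefix so the running mean restarts every month.
for _, month_rows in groupby(rows, key=lambda r: r[0][:7]):
    mtd.extend(running_mean_nonzero(v for _, v in month_rows))

print(mtd)
```

On 2023-04-05 the running sum is 10 + 20 + 10 = 40 over 3 non-zero days, which gives 13.333333, and the count restarts on 2023-05-01.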

I'll leave it to you to build the exact output columns you want from this.
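One possible way to assemble all four columns is sketched below. It makes several assumptions that are not in the answer: the question's category, parameter, region and factory columns are added to the two-month example, the fiscal year is taken to run from April to March (inferred from the 'A-MAR' frequency in the question) and is computed by hand rather than with a period alias, and the helper expanding_mean is a hypothetical name. It uses transform instead of droplevel so the result stays aligned with the original index.

```python
import pandas as pd

df = pd.DataFrame({
    "date": ["2023-04-01", "2023-04-02", "2023-04-03", "2023-04-04",
             "2023-04-05", "2023-05-01", "2023-05-02", "2023-05-03"],
    "category": ["QUALITY PARAMETERS"] * 8,
    "parameter": ["TBR"] * 8,
    "region": ["DELHI"] * 8,
    "factory": ["ABC"] * 8,
    "bp": [51] * 8,
    "actuals": [10, 20, 0, 0, 10, 20, 10, 0],
})
df["date"] = pd.to_datetime(df["date"])

keys = ["category", "parameter", "region", "factory"]
month = df["date"].dt.to_period("M").rename("month")
# Assumption: the fiscal year runs April-March; label it by its starting year.
fiscal_year = (
    df["date"].dt.year - (df["date"].dt.month < 4).astype(int)
).rename("fiscal_year")

def expanding_mean(values, period):
    """Expanding mean of `values` within each (period, *keys) group."""
    grouped = values.groupby([period] + [df[k] for k in keys])
    return grouped.transform(lambda s: s.expanding().mean())

# Zeros become NaN, which the mean skips.
nonzero_actuals = df["actuals"].mask(df["actuals"].eq(0))

df["mtd_bp"] = expanding_mean(df["bp"], month)
df["ytd_bp"] = expanding_mean(df["bp"], fiscal_year)
df["mtd_actuals"] = expanding_mean(nonzero_actuals, month)
df["ytd_actuals"] = expanding_mean(nonzero_actuals, fiscal_year)

print(df[["date", "actuals", "mtd_actuals", "ytd_actuals"]])
```

With this data the two actuals columns agree throughout April and diverge in May: on 2023-05-01 mtd_actuals restarts at 20, while ytd_actuals carries the April values forward and gives 15.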

Answer 2

Score: 0

Another option:

import pandas as pd
import numpy as np

data = {
    'date': ['2023-04-01', '2023-04-02', '2023-04-03', '2023-04-04', '2023-04-05'],
    'category': ['QUALITY PARAMETERS', 'QUALITY PARAMETERS', 'QUALITY PARAMETERS', 'QUALITY PARAMETERS', 'QUALITY PARAMETERS'],
    'parameter': ['TBR', 'TBR', 'TBR', 'TBR', 'TBR'],
    'region': ['DELHI', 'DELHI', 'DELHI', 'DELHI', 'DELHI'],
    'factory': ['ABC', 'ABC', 'ABC', 'ABC', 'ABC'],
    'actuals': [10, 20, 0, 0, 10]}

# Convert data into DataFrame
df = pd.DataFrame(data)
# Replace zeros with NaN so the mean ignores them; assign the result back,
# because an inplace replace on a selected column may not update df in recent pandas
df['actuals'] = df['actuals'].replace(0, np.nan)
# Compute the cumulative average and fill NaN values with the previous value
df['ytd'] = df['actuals'].expanding().mean().ffill()
print(df)

Output:

         date            category parameter region factory  actuals        ytd
0  2023-04-01  QUALITY PARAMETERS       TBR  DELHI     ABC     10.0  10.000000
1  2023-04-02  QUALITY PARAMETERS       TBR  DELHI     ABC     20.0  15.000000
2  2023-04-03  QUALITY PARAMETERS       TBR  DELHI     ABC      NaN  15.000000
3  2023-04-04  QUALITY PARAMETERS       TBR  DELHI     ABC      NaN  15.000000
4  2023-04-05  QUALITY PARAMETERS       TBR  DELHI     ABC     10.0  13.333333
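Note that this snippet computes a single running average over the whole frame, so it does not restart at a month boundary. The sketch below is an assumption-laden illustration rather than part of the answer: it reuses the two-month data from Answer 1, and the column names mtd and running_all are hypothetical. It contrasts the ungrouped result with a per-month version built from a running sum divided by a running count of non-zero values.

```python
import pandas as pd

df = pd.DataFrame({
    "date": pd.to_datetime([
        "2023-04-01", "2023-04-02", "2023-04-03", "2023-04-04",
        "2023-04-05", "2023-05-01", "2023-05-02", "2023-05-03",
    ]),
    "actuals": [10, 20, 0, 0, 10, 20, 10, 0],
})

month = df["date"].dt.to_period("M")
nonzero = df["actuals"].ne(0)

# Per-month running sum and running count of non-zero values.
running_sum = df["actuals"].groupby(month).cumsum()
running_count = nonzero.astype(int).groupby(month).cumsum()

# 0 / 0 gives NaN, which happens only while a month has no non-zero value yet.
df["mtd"] = running_sum / running_count

# Ungrouped running average for comparison: it never restarts.
df["running_all"] = df["actuals"].mask(~nonzero).expanding().mean()

print(df)
```

For 2023-05-01 the per-month column gives 20, while the ungrouped column gives 15 because it still averages in the April values.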
