英文:
How to find how the values have changed from the begining of their period?
问题
你可以尝试以下代码来实现你的目标:
# 转换日期列为日期时间类型
df['date'] = pd.to_datetime(df['date'])
# 找到每个周期的开始日期(每年的8月1日)
cycle_start_dates = df[df.index.month == 8]
# 使用reindex方法将周期开始日期应用到整个数据框
df_reindexed = df.reindex(cycle_start_dates.index, method='ffill')
# 计算每个月与其对应的周期开始日期的差异
df_difference = df - df_reindexed
# 打印结果
print(df_difference)
这段代码将计算每个月与其对应的周期开始日期的差异,以实现你想要的目标。
英文:
I have a dataframe with timeindex. There is a period (cycle) that starts every august. I want to calculate the difference between the value of each month and the value that was on the previous august (the beginning of its period).
The aim is to know how much the values have changed during each cycle.
This is an small example of the data:
import pandas as pd
import numpy as np
df = pd.DataFrame({'date': ['2022-01-01 00:00:00', '2022-02-01 00:00:00',
'2021-03-01 00:00:00', '2021-04-01 00:00:00',
'2021-05-01 00:00:00', '2021-06-01 00:00:00',
'2021-07-01 00:00:00', '2021-08-01 00:00:00',
'2021-09-01 00:00:00', '2021-10-01 00:00:00',
'2021-11-01 00:00:00', '2021-12-01 00:00:00',
'2022-01-01 00:00:00', '2022-02-01 00:00:00',
'2022-03-01 00:00:00', '2022-04-01 00:00:00',
'2022-05-01 00:00:00', '2022-06-01 00:00:00',
'2022-07-01 00:00:00', '2022-08-01 00:00:00',
'2022-09-01 00:00:00', '2022-10-01 00:00:00',
'2022-11-01 00:00:00', '2023-12-01 00:00:00',
'2023-01-01 00:00:00', '2023-02-01 00:00:00',
'2023-03-01 00:00:00', '2023-04-01 00:00:00',
'2023-05-01 00:00:00', '2023-06-01 00:00:00',
],
'value1': [0.452762281,0.372262281,0.513928948,0.447762281,
0.377095615,0.355095615,0.271428948,0.291762281,
0.476762281,0.335928948,0.280428948,0.283762281,
0.322928948,0.287262281,0.316928948,0.209262281,
0.407928948,0.254262281,0.232095615,0.264262281,
0.076095615,-0.025237719,-0.042237719,-0.094904385,
0.017428948,-0.036071052,-0.094071052,-0.071404385,
0.008095615,-0.141571052],
'value2': [9.6,8,8.4,6.2,6.2,6,3.9,8.5,8.3,5.3,5.6,5.3,
6.2,6.3,6.9,4.8,6.7,3.6,3,4.6,2.3,1.3,1,0.3,
1.6,0.4,1.5,1.4,2.2,1.2]})
df = df.set_index('date')
I have tried to get a dataframe with the same structure but all the values have the value that was on its corresponding august, and then I just need to subtract one dataframe from the other.
But I couldn't find a way to do it neither.
Thank you vary much in advance.
答案1
得分: 0
如果我理解您的意思正确的话,您可以使用.groupby()
然后计算每个组的差异:
df.index = pd.to_datetime(df.index)
tmp = (df.index.month == 8).cumsum()
out = (
df[tmp != 0]
.groupby(tmp[tmp > 0], group_keys=False)
.apply(lambda x: x[["value1", "value2"]] - x[["value1", "value2"]].iloc[0])
).add_prefix('calculated_')
x = df.join(out)
print(x)
打印结果:
value1 value2 calculated_value1 calculated_value2
date
2021-03-01 0.513929 8.4 NaN NaN
2021-04-01 0.447762 6.2 NaN NaN
2021-05-01 0.377096 6.2 NaN NaN
2021-06-01 0.355096 6.0 NaN NaN
2021-07-01 0.271429 3.9 NaN NaN
2021-08-01 0.291762 8.5 0.000000 0.0
2021-09-01 0.476762 8.3 0.185000 -0.2
2021-10-01 0.335929 5.3 0.044167 -3.2
2021-11-01 0.280429 5.6 -0.011333 -2.9
2021-12-01 0.283762 5.3 -0.008000 -3.2
2022-01-01 0.452762 9.6 0.031167 -2.3
2022-01-01 0.322929 6.2 0.031167 -2.3
2022-02-01 0.372262 8.0 -0.004500 -2.2
2022-02-01 0.287262 6.3 -0.004500 -2.2
2022-03-01 0.316929 6.9 0.025167 -1.6
2022-04-01 0.209262 4.8 -0.082500 -3.7
2022-05-01 0.407929 6.7 0.116167 -1.8
2022-06-01 0.254262 3.6 -0.037500 -4.9
2022-07-01 0.232096 3.0 -0.059667 -5.5
2022-08-01 0.264262 4.6 0.000000 0.0
2022-09-01 0.076096 2.3 -0.188167 -2.3
2022-10-01 -0.025238 1.3 -0.289500 -3.3
2022-11-01 -0.042238 1.0 -0.306500 -3.6
2023-01-01 0.017429 1.6 -0.246833 -3.0
2023-02-01 -0.036071 0.4 -0.300333 -4.2
2023-03-01 -0.094071 1.5 -0.358333 -3.1
2023-04-01 -0.071404 1.4 -0.335667 -3.2
2023-05-01 0.008096 2.2 -0.256167 -2.4
2023-06-01 -0.141571 1.2 -0.405833 -3.4
2023-12-01 -0.094904 0.3 -0.359167 -4.3
英文:
If I understand you correctly, you can use .groupby()
and then calculate the difference in each group:
df.index = pd.to_datetime(df.index)
tmp = (df.index.month == 8).cumsum()
out = (
df[tmp != 0]
.groupby(tmp[tmp > 0], group_keys=False)
.apply(lambda x: x[["value1", "value2"]] - x[["value1", "value2"]].iloc[0])
).add_prefix('calculated_')
x = df.join(out)
print(x)
Prints:
value1 value2 calculated_value1 calculated_value2
date
2021-03-01 0.513929 8.4 NaN NaN
2021-04-01 0.447762 6.2 NaN NaN
2021-05-01 0.377096 6.2 NaN NaN
2021-06-01 0.355096 6.0 NaN NaN
2021-07-01 0.271429 3.9 NaN NaN
2021-08-01 0.291762 8.5 0.000000 0.0
2021-09-01 0.476762 8.3 0.185000 -0.2
2021-10-01 0.335929 5.3 0.044167 -3.2
2021-11-01 0.280429 5.6 -0.011333 -2.9
2021-12-01 0.283762 5.3 -0.008000 -3.2
2022-01-01 0.452762 9.6 0.031167 -2.3
2022-01-01 0.322929 6.2 0.031167 -2.3
2022-02-01 0.372262 8.0 -0.004500 -2.2
2022-02-01 0.287262 6.3 -0.004500 -2.2
2022-03-01 0.316929 6.9 0.025167 -1.6
2022-04-01 0.209262 4.8 -0.082500 -3.7
2022-05-01 0.407929 6.7 0.116167 -1.8
2022-06-01 0.254262 3.6 -0.037500 -4.9
2022-07-01 0.232096 3.0 -0.059667 -5.5
2022-08-01 0.264262 4.6 0.000000 0.0
2022-09-01 0.076096 2.3 -0.188167 -2.3
2022-10-01 -0.025238 1.3 -0.289500 -3.3
2022-11-01 -0.042238 1.0 -0.306500 -3.6
2023-01-01 0.017429 1.6 -0.246833 -3.0
2023-02-01 -0.036071 0.4 -0.300333 -4.2
2023-03-01 -0.094071 1.5 -0.358333 -3.1
2023-04-01 -0.071404 1.4 -0.335667 -3.2
2023-05-01 0.008096 2.2 -0.256167 -2.4
2023-06-01 -0.141571 1.2 -0.405833 -3.4
2023-12-01 -0.094904 0.3 -0.359167 -4.3
答案2
得分: 0
以下是您要翻译的内容:
There are several ways to solve your issues. The solution depends on how your data might looks like
There are some flaws in your code too. Here is my suggestion
- You don't need to
set_index('date')
here.
Here is:
df['date'] = pd.to_datetime(df['date'])
df.sort_values(by=['date'], ignore_index=True, inplace=True)
The first line is to convert your date
column to datetime format, and the second is to sort your data in ascending order.
- The most common way is to extract subsample with august only then merge those value back to your main dataframe and perform the calculation. This might be long, but it will shows any potential error if there is in your data
The complete solution:
import pandas as pd
import numpy as np
from dateutil.relativedelta import *
from pandas.tseries.offsets import *
df = pd.DataFrame({'date': ['2022-01-01 00:00:00', '2022-02-01 00:00:00',
'2021-03-01 00:00:00', '2021-04-01 00:00:00',
'2021-05-01 00:00:00', '2021-06-01 00:00:00',
'2021-07-01 00:00:00', '2021-08-01 00:00:00',
'2021-09-01 00:00:00', '2021-10-01 00:00:00',
'2021-11-01 00:00:00', '2021-12-01 00:00:00',
'2022-01-01 00:00:00', '2022-02-01 00:00:00',
'2022-03-01 00:00:00', '2022-04-01 00:00:00',
'2022-05-01 00:00:00', '2022-06-01 00:00:00',
'2022-07-01 00:00:00', '2022-08-01 00:00:00',
'2022-09-01 00:00:00', '2022-10-01 00:00:00',
'2022-11-01 00:00:00', '2023-12-01 00:00:00',
'2023-01-01 00:00:00', '2023-02-01 00:00:00',
'2023-03-01 00:00:00', '2023-04-01 00:00:00',
'2023-05-01 00:00:00', '2023-06-01 00:00:00', ],
'value1': [0.452762281,0.372262281,0.513928948,0.447762281,
0.377095615,0.355095615,0.271428948,0.291762281,
0.476762281,0.335928948,0.280428948,0.283762281,
0.322928948,0.287262281,0.316928948,0.209262281,
0.407928948,0.254262281,0.232095615,0.264262281,
0.076095615,-0.025237719,-0.042237719,-0.094904385,
0.017428948,-0.036071052,-0.094071052,-0.071404385,
0.008095615,-0.141571052],
'value2': [9.6,8,8.4,6.2,6.2,6,3.9,8.5,8.3,5.3,5.6,5.3,
6.2,6.3,6.9,4.8,6.7,3.6,3,4.6,2.3,1.3,1,0.3,
1.6,0.4,1.5,1.4,2.2,1.2]})
# cleaning data
df['date'] = pd.to_datetime(df['date'])
df.sort_values(by=['date'], ignore_index=True, inplace=True)
# creat sub dataframe with august value only
aug = df.loc[df['date'].dt.month==8].copy()
aug.rename(columns={'value1': 'augval1', 'value2': 'augval2', 'date':'date_aug'}, inplace=True)
# create equivalent august date for merge
df['date_aug'] = np.where(df['date'].dt.month < 8, df['date'] + YearBegin(-1) + MonthBegin(7), df['date'] + YearBegin(1) + MonthBegin(7))
# merge aug dataframe to df dataframe
new = pd.merge(df, aug, on='date_aug', how='left')
# perform calculation
new['chg_val1'] = new['value1'].diff()
new['chg_aug'] = new['value1'] - new['augval1']
希望这有助于您的项目!
英文:
There are several ways to solve your issues. The solution depends on how your data might looks like
There are some flaws in your code too. Here is my suggestion
- You don't need to
set_index('date')
here.
Here is:
df['date'] = pd.to_datetime(df['date'])
df.sort_values(by=['date'], ignore_index=True, inplace=True)
The first line is to convert your date
column to datetime format, and the second is to sort your data in ascending order.
- The most common way is to extract subsample with august only then merge those value back to your main dataframe and perform the calculation. This might be long, but it will shows any potential error if there is in your data
The complete solution:
import pandas as pd
import numpy as np
from dateutil.relativedelta import *
from pandas.tseries.offsets import *
df = pd.DataFrame({'date': ['2022-01-01 00:00:00', '2022-02-01 00:00:00',
'2021-03-01 00:00:00', '2021-04-01 00:00:00',
'2021-05-01 00:00:00', '2021-06-01 00:00:00',
'2021-07-01 00:00:00', '2021-08-01 00:00:00',
'2021-09-01 00:00:00', '2021-10-01 00:00:00',
'2021-11-01 00:00:00', '2021-12-01 00:00:00',
'2022-01-01 00:00:00', '2022-02-01 00:00:00',
'2022-03-01 00:00:00', '2022-04-01 00:00:00',
'2022-05-01 00:00:00', '2022-06-01 00:00:00',
'2022-07-01 00:00:00', '2022-08-01 00:00:00',
'2022-09-01 00:00:00', '2022-10-01 00:00:00',
'2022-11-01 00:00:00', '2023-12-01 00:00:00',
'2023-01-01 00:00:00', '2023-02-01 00:00:00',
'2023-03-01 00:00:00', '2023-04-01 00:00:00',
'2023-05-01 00:00:00', '2023-06-01 00:00:00',
],
'value1': [0.452762281,0.372262281,0.513928948,0.447762281,
0.377095615,0.355095615,0.271428948,0.291762281,
0.476762281,0.335928948,0.280428948,0.283762281,
0.322928948,0.287262281,0.316928948,0.209262281,
0.407928948,0.254262281,0.232095615,0.264262281,
0.076095615,-0.025237719,-0.042237719,-0.094904385,
0.017428948,-0.036071052,-0.094071052,-0.071404385,
0.008095615,-0.141571052],
'value2': [9.6,8,8.4,6.2,6.2,6,3.9,8.5,8.3,5.3,5.6,5.3,
6.2,6.3,6.9,4.8,6.7,3.6,3,4.6,2.3,1.3,1,0.3,
1.6,0.4,1.5,1.4,2.2,1.2]})
# cleaning data
df['date'] = pd.to_datetime(df['date'])
df.sort_values(by=['date'], ignore_index=True, inplace=True)
# creat sub dataframe with august value only
aug = df.loc[df['date'].dt.month==8].copy()
aug.rename(columns={'value1': 'augval1', 'value2': 'augval2', 'date':'date_aug'}, inplace=True)
# create equivalent august date for merge
df['date_aug'] = np.where(df['date'].dt.month < 8, df['date'] + YearBegin(-1) + MonthBegin(7), df['date'] + YearBegin(1) + MonthBegin(7))
# merge aug dataframe to df dataframe
new = pd.merge(df, aug, on='date_aug', how='left')
# perform calculation
new['chg_val1'] = new['value1'].diff()
new['chg_aug'] = new['value1'] - new['augval1']
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论