Calculate the rolling average every two weeks for the same day and hour in a DataFrame

Question

I have a DataFrame like the following:

import pandas as pd

df = pd.DataFrame()
df['datetime'] = pd.date_range(start='2023-1-2', end='2023-1-29', freq='15min')
df['week'] = df['datetime'].dt.isocalendar().week.astype(int)  # ISO week number
df['day_of_week'] = df['datetime'].dt.weekday                  # Monday = 0
df['hour'] = df['datetime'].dt.hour
df['minutes'] = df['datetime'].dt.minute
df['value'] = range(len(df))
df.set_index('datetime', inplace=True)


df =
                     week  day_of_week  hour  minutes  value
datetime
2023-01-02 00:00:00     1            0     0        0      0
2023-01-02 00:15:00     1            0     0       15      1
2023-01-02 00:30:00     1            0     0       30      2
2023-01-02 00:45:00     1            0     0       45      3
2023-01-02 01:00:00     1            0     1        0      4
...                   ...          ...   ...      ...    ...
2023-01-08 23:00:00     1            6    23        0    668
2023-01-08 23:15:00     1            6    23       15    669
2023-01-08 23:30:00     1            6    23       30    670
2023-01-08 23:45:00     1            6    23       45    671
2023-01-09 00:00:00     2            0     0        0    672

I want to calculate the average of the "value" column for the same day of week, hour, and minute over every two consecutive weeks.

What I would like to get is the following:

df=
                                              value
day_of_week hour minutes datetime
0           0    0       2023-01-02 00:00:00    NaN
                         2023-01-09 00:00:00    NaN
                         2023-01-16 00:00:00    336
                         2023-01-23 00:00:00   1008
                 15      2023-01-02 00:15:00    NaN
                         2023-01-09 00:15:00    NaN
                         2023-01-16 00:15:00    337
                         2023-01-23 00:15:00   1009

So the first two weeks should have NaN values, week 3 should be the average of week 1 and week 2, week 4 the average of week 2 and week 3, and so on.
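To make the target concrete, here is a minimal sketch for a single slot (Monday 00:00), using the four weekly values that slot takes in the sample frame. The names `slot` and `expected` are only illustrative, and the positional window of 2 assumes there are no missing weeks.

```python
import pandas as pd

# Values of one (day_of_week, hour, minutes) slot across four consecutive weeks:
# Monday 00:00 on 2023-01-02, -09, -16 and -23 in the sample frame.
slot = pd.Series(
    [0, 672, 1344, 2016],
    index=pd.to_datetime(["2023-01-02", "2023-01-09", "2023-01-16", "2023-01-23"]),
)

# Mean over two consecutive weekly rows, then moved down one row so that
# each week only sees the two weeks before it.
expected = slot.rolling(window=2).mean().shift(1)
print(expected)
```

Week 3 gets (0 + 672) / 2 = 336 and week 4 gets (672 + 1344) / 2 = 1008, while the first two weeks stay NaN.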

I tried the following code but it does not seem to do what I expect:

df = pd.DataFrame(df.groupby(['day_of_week','hour','minutes'])['value'].rolling(window='14D', min_periods=1).mean())

Instead, what I get is:

                                              value
day_of_week hour minutes datetime
0           0    0       2023-01-02 00:00:00      0
                         2023-01-09 00:00:00    336
                         2023-01-16 00:00:00   1008
                         2023-01-23 00:00:00   1680
                 15      2023-01-02 00:15:00      1
                         2023-01-09 00:15:00    337
                         2023-01-16 00:15:00   1009
                         2023-01-23 00:15:00   1681
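A likely reason for these numbers is that a time-based window such as '14D' is closed on the right by default: it includes the current row and excludes the row exactly 14 days earlier. The sketch below reproduces this for the Monday 00:00 slot alone. The `closed="left"` variant is not part of the original post and is shown only to illustrate where the window boundaries fall.

```python
import pandas as pd

# One slot (Monday 00:00) with its four weekly values from the sample frame.
slot = pd.Series(
    [0, 672, 1344, 2016],
    index=pd.to_datetime(["2023-01-02", "2023-01-09", "2023-01-16", "2023-01-23"]),
)

# Default window is (t - 14 days, t]: the current week and the week before it.
attempt = slot.rolling(window="14D", min_periods=1).mean()
print(attempt.tolist())

# Illustration only: with closed="left" the window becomes [t - 14 days, t),
# i.e. the two weeks before the current row, without the current row itself.
left_closed = slot.rolling(window="14D", min_periods=2, closed="left").mean()
print(left_closed.tolist())
```

The first result averages each week with the one before it (and with nothing for the first week, since `min_periods=1` allows a single observation), which matches the unwanted output above.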

Answer 1

Score: 1

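I think you want to shift within each group, so you need another groupby. With `min_periods=2`, the '14D' rolling mean at each row is the average of that week and the week before it; shifting by one row within each group then moves that average to the following week: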

(df.groupby(['day_of_week','hour','minutes'])['value']
   .rolling(window='14D', min_periods=2).mean()         # `min_periods` is different
   .groupby(['day_of_week','hour','minutes']).shift()   # shift within each group
   .to_frame()
)

Output:

                                               value
day_of_week hour minutes datetime                   
0           0    0       2023-01-02 00:00:00     NaN
                         2023-01-09 00:00:00     NaN
                         2023-01-16 00:00:00   336.0
                         2023-01-23 00:00:00  1008.0
                 15      2023-01-02 00:15:00     NaN
...                                              ...
6           23   30      2023-01-15 23:30:00     NaN
                         2023-01-22 23:30:00  1006.0
                 45      2023-01-08 23:45:00     NaN
                         2023-01-15 23:45:00     NaN
                         2023-01-22 23:45:00  1007.0
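For anyone who wants to reproduce this end to end, the following is a self-contained sketch that rebuilds the sample frame and applies the same chain. The variable names `idx`, `keys`, `result` and `monday_midnight` are only illustrative, and the `week` column is left out because the calculation does not use it.

```python
import pandas as pd

# Rebuild the sample frame from the question.
idx = pd.date_range(start="2023-01-02", end="2023-01-29", freq="15min")
df = pd.DataFrame({"value": range(len(idx))}, index=idx)
df.index.name = "datetime"
df["day_of_week"] = df.index.weekday
df["hour"] = df.index.hour
df["minutes"] = df.index.minute

keys = ["day_of_week", "hour", "minutes"]

result = (
    df.groupby(keys)["value"]
      .rolling(window="14D", min_periods=2).mean()  # current week + previous week
      .groupby(keys).shift()                        # move each mean to the next week
      .to_frame()
)

# Inspect one slot: Monday 00:00 across the four weeks.
monday_midnight = result.xs((0, 0, 0), level=keys)["value"]
print(monday_midnight)
```

The selected slot should show NaN for the first two weeks, then 336.0 and 1008.0, in line with the output above.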

huangapple
  • Published on 2023-02-10 02:58:58
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75403238.html