Get rolling average of a Pandas DataFrame with hourly values, while taking into account cyclical nature of days
Question
Let's say I have a DataFrame with a MultiIndex, constructed as follows:
import numpy as np
import pandas as pd
ids = ['a', 'b', 'c']
hours = np.arange(24)
data = np.random.random((len(ids),len(hours)))
df = pd.concat([pd.DataFrame(index = [[id]*len(hours), hours], data = {'value':data[ind]}) for ind, id in enumerate(ids)])
df.index.names = ['ID', 'hour']
Which looks like this:
            value
ID hour
a  0     0.020479
   1     0.059987
   2     0.053100
   3     0.406198
   4     0.452231
...           ...
c  19    0.150493
   20    0.617098
   21    0.377062
   22    0.196807
   23    0.954401
What I want to do is get a new 24-hour timeseries for each station, but calculated with a 5-hour rolling average.
I know I can do something like df.rolling(5, center = True, on = 'hour'), but the problem with this is that it doesn't take into account the fact that the hours are cyclical, i.e., the rolling average for hour 0 should be the average of hours 22, 23, 0, 1, and 2.
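To make the edge problem concrete, here is a minimal sketch of what a plain centered rolling mean does at the ends of the day. The frame `demo` and its values 0..23 are hypothetical, chosen only so the averages are easy to check by hand; the call uses a per-ID `groupby(...).rolling(...)` rather than the `on='hour'` form quoted above.

```python
import numpy as np
import pandas as pd

# Hypothetical single-ID frame: the value equals the hour (0..23).
mi = pd.MultiIndex.from_product([['a'], np.arange(24)], names=['ID', 'hour'])
demo = pd.DataFrame({'value': np.arange(24, dtype=float)}, index=mi)

# Plain centered 5-hour rolling mean per ID: the window never wraps past hour 0 or 23.
plain = (
    demo.groupby('ID')['value']
        .rolling(5, center=True)
        .mean()
        .droplevel(0)  # groupby prepends an extra ID level; drop it again
)

print(plain.head(3))  # hours 0 and 1 are NaN because the window is incomplete
print(plain.tail(3))  # hours 22 and 23 are NaN for the same reason
```

With the default `min_periods`, the first two and last two hours of each ID come out as NaN, which is exactly the gap a cyclic average should fill.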
What is a good way to do this?
Thanks!
Answer 1
Score: 2
If you want to take into account the cycle, use np.pad and np.convolve:
import pandas as pd
import numpy as np
# A more comprehensive example
mi = pd.MultiIndex.from_product([['a'], np.arange(1, 25)], names=['ID', 'hour'])
df = pd.DataFrame({'value': np.arange(1, 25)}, index=mi)
def cycling_ma(x):
    # Wrap two values onto each end, then average every full 5-value window
    return np.convolve(np.pad(x, 2, mode='wrap'), np.ones(5)/5, mode='valid')
df['ma'] = df.groupby('ID')['value'].transform(cycling_ma)
Output:
>>> df
         value    ma
ID hour
a  1         1  10.6  # (23 + 24 + 1 + 2 + 3) / 5 (23 and 24 are padded from the end)
   2         2   6.8
   3         3   3.0
   4         4   4.0
   5         5   5.0
   6         6   6.0
   7         7   7.0
   8         8   8.0
   9         9   9.0
   10       10  10.0
   11       11  11.0
   12       12  12.0
   13       13  13.0
   14       14  14.0
   15       15  15.0
   16       16  16.0
   17       17  17.0
   18       18  18.0
   19       19  19.0
   20       20  20.0
   21       21  21.0
   22       22  22.0
   23       23  18.2
   24       24  14.4  # (22 + 23 + 24 + 1 + 2) / 5 (1 and 2 are padded from the beginning)
Reference: How to calculate rolling / moving average using python + NumPy / SciPy?
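As a rough sketch, the same pad-and-convolve idea can be applied to a frame shaped like the one in the question (three IDs, hours 0 to 23). The helper name `cyclic_ma`, its `window` parameter, the frame name `df_q`, and the seeded random generator are assumptions introduced here for illustration. The sketch also assumes that each ID has all 24 hours and that rows are sorted by hour within each ID.

```python
import numpy as np
import pandas as pd

def cyclic_ma(values, window=5):
    """Centered moving average that wraps around both ends (odd window assumed)."""
    if window % 2 == 0:
        raise ValueError("window must be odd for a symmetric centered average")
    half = window // 2
    # Copy the last `half` values in front and the first `half` values behind
    padded = np.pad(np.asarray(values, dtype=float), half, mode='wrap')
    # 'valid' keeps only full windows, so the output length equals the input length
    return np.convolve(padded, np.ones(window) / window, mode='valid')

# Frame shaped like the one in the question (illustrative random values)
ids = ['a', 'b', 'c']
hours = np.arange(24)
mi = pd.MultiIndex.from_product([ids, hours], names=['ID', 'hour'])
rng = np.random.default_rng(0)
df_q = pd.DataFrame({'value': rng.random(len(mi))}, index=mi)

# One cyclic average per ID, aligned with the original rows
df_q['ma'] = df_q.groupby('ID')['value'].transform(cyclic_ma)
print(df_q.head())
```

Because the padding is done inside each group, values from one ID never leak into the average of another.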
Answer 2
Score: 0
Here is a way using np.roll():
df.join(
    df.groupby('ID', group_keys=False).apply(
        lambda x: pd.DataFrame(
            [np.roll(x['value'], 2 - i)[:5].mean() for i in range(x.shape[0])],
            index=x.index,
            columns=['ma'])))
Output:
         value    ma
ID hour
a  1         1  10.6
   2         2   6.8
   3         3   3.0
   4         4   4.0
   5         5   5.0
   6         6   6.0
   7         7   7.0
   8         8   8.0
   9         9   9.0
   10       10  10.0
   11       11  11.0
   12       12  12.0
   13       13  13.0
   14       14  14.0
   15       15  15.0
   16       16  16.0
   17       17  17.0
   18       18  18.0
   19       19  19.0
   20       20  20.0
   21       21  21.0
   22       22  22.0
   23       23  18.2
   24       24  14.4
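A possible variation on the np.roll() idea, sketched under the same assumptions (complete, hour-sorted data per ID), avoids rolling the array once per row: it averages a handful of shifted copies instead. The helper name `cyclic_ma_roll` and the frame name `df_r` are made up for this example.

```python
import numpy as np
import pandas as pd

def cyclic_ma_roll(values, window=5):
    """Cyclic centered mean built from `window` shifted copies (odd window assumed)."""
    arr = np.asarray(values, dtype=float)
    half = window // 2
    # np.roll(arr, k)[i] == arr[i - k], so k = -half..half covers i-half..i+half
    shifted = [np.roll(arr, k) for k in range(-half, half + 1)]
    return np.mean(shifted, axis=0)

# Same toy data as in the answers: hours 1..24 with value equal to the hour
mi = pd.MultiIndex.from_product([['a'], np.arange(1, 25)], names=['ID', 'hour'])
df_r = pd.DataFrame({'value': np.arange(1, 25)}, index=mi)

df_r['ma'] = df_r.groupby('ID')['value'].transform(cyclic_ma_roll)
print(df_r)
```

On this toy data the result should match the output listed above, while needing only five array shifts per ID instead of one per hour.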