June 1, 2023 01:29:45

Get rolling average of a Pandas DataFrame with hourly values, while taking into account cyclical nature of days

Question

Let's say I have a DataFrame with a MultiIndex, constructed as follows:

import numpy as np
import pandas as pd

ids = ['a', 'b', 'c']
hours = np.arange(24)
data = np.random.random((len(ids), len(hours)))

# One block of 24 hourly rows per ID; passing two arrays as `index` builds a MultiIndex
df = pd.concat([pd.DataFrame(index=[[id_] * len(hours), hours], data={'value': data[ind]})
                for ind, id_ in enumerate(ids)])
df.index.names = ['ID', 'hour']

Which looks like this:

            value
ID hour          
a  0     0.020479
   1     0.059987
   2     0.053100
   3     0.406198
   4     0.452231
          ...
c  19    0.150493
   20    0.617098
   21    0.377062
   22    0.196807
   23    0.954401
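For reference, the same frame can also be built in one step with pd.MultiIndex.from_product instead of concatenating one small frame per ID. This is a sketch, assuming the same ids, hours and random data as above; it relies on from_product listing all hours of the first ID before moving on to the next, which matches the row-major order of data.ravel().

```python
import numpy as np
import pandas as pd

ids = ['a', 'b', 'c']
hours = np.arange(24)
data = np.random.random((len(ids), len(hours)))

# Sketch of an equivalent construction: from_product enumerates
# (a, 0), (a, 1), ..., (a, 23), (b, 0), ... which is the same
# row-major order as data.ravel()
index = pd.MultiIndex.from_product([ids, hours], names=['ID', 'hour'])
df = pd.DataFrame({'value': data.ravel()}, index=index)
```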

What I want is a new 24-hour time series for each station (ID), computed as a centered 5-hour rolling average.

I know I can do something like df.rolling(5, center=True, on='hour'), but that doesn't take into account the fact that the hours are cyclical: the rolling average for hour 0 should be the average of hours 22, 23, 0, 1, and 2 (and likewise hour 23 should average hours 21, 22, 23, 0, and 1).
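To make the problem concrete, here is a small sketch, assuming a single illustrative ID whose value equals its hour. A plain centered rolling(5) applied per ID (via groupby, so windows never mix IDs) cannot wrap around midnight, so the first two and last two hours have incomplete windows and come out as NaN.

```python
import numpy as np
import pandas as pd

# Illustrative frame (assumption): one ID, hours 0..23, value == hour
index = pd.MultiIndex.from_product([['a'], np.arange(24)], names=['ID', 'hour'])
df = pd.DataFrame({'value': np.arange(24, dtype=float)}, index=index)

# Plain centered rolling mean, computed separately for each ID
naive = df.groupby('ID')['value'].transform(
    lambda s: s.rolling(5, center=True).mean()
)

# Hours whose 5-hour window would need to wrap around midnight are NaN
missing_hours = naive[naive.isna()].index.get_level_values('hour').tolist()
print(missing_hours)  # [0, 1, 22, 23]
```

With the default min_periods (equal to the window size), rolling() only fills rows whose whole window lies inside the group, which is exactly why the edges need the wrap-around handling asked about here.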

What is a good way to do this?

Thanks!

Answer 1

Score: 2

If you want to take the cycle into account, wrap-pad each group with np.pad(..., mode='wrap') and then compute the moving average with np.convolve:

import pandas as pd
import numpy as np

# A more comprehensive example: hours 1..24, value == hour
mi = pd.MultiIndex.from_product([['a'], np.arange(1, 25)], names=['ID', 'hour'])
df = pd.DataFrame({'value': np.arange(1, 25)}, index=mi)

def cycling_ma(x):
    # Prepend the last 2 values and append the first 2 (mode='wrap'), then take
    # the mean of every 5-value window; 'valid' yields exactly len(x) results
    return np.convolve(np.pad(x, 2, mode='wrap'), np.ones(5)/5, mode='valid')

df['ma'] = df.groupby('ID')['value'].transform(cycling_ma)

Output:

>>> df
         value    ma
ID hour
a  1         1  10.6  # (23 + 24 + 1 + 2 + 3) / 5 (23 and 24 -> padded from the end)
   2         2   6.8
   3         3   3.0
   4         4   4.0
   5         5   5.0
   6         6   6.0
   7         7   7.0
   8         8   8.0
   9         9   9.0
   10       10  10.0
   11       11  11.0
   12       12  12.0
   13       13  13.0
   14       14  14.0
   15       15  15.0
   16       16  16.0
   17       17  17.0
   18       18  18.0
   19       19  19.0
   20       20  20.0
   21       21  21.0
   22       22  22.0
   23       23  18.2
   24       24  14.4  # (22 + 23 + 24 + 1 + 2) / 5 (1 and 2 -> padded from the beginning)

Reference: How to calculate rolling / moving average using python + NumPy / SciPy?
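cycling_ma relies on each ID's rows already being sorted by hour and covering the full cycle. Below is a sketch that makes the window size a parameter and sorts the index before grouping; the helper name cyclic_rolling_mean and the shuffled two-ID sample data are hypothetical, for illustration only, and the window is assumed to be odd so it can be centered.

```python
import numpy as np
import pandas as pd


def cyclic_rolling_mean(values, window=5):
    """Hypothetical helper: centered moving average that wraps around.

    Assumes `values` holds exactly one full cycle (e.g. 24 hours) in order
    and that `window` is odd, so each point sits in the middle of its window.
    """
    if window % 2 == 0:
        raise ValueError("window must be odd for a centered average")
    half = window // 2
    arr = np.asarray(values, dtype=float)
    # mode='wrap' puts the last `half` values in front and the first `half` at the end
    padded = np.pad(arr, half, mode='wrap')
    # 'valid' keeps only full windows: len(arr) + 2*half - window + 1 == len(arr)
    return np.convolve(padded, np.ones(window) / window, mode='valid')


# Hypothetical data: IDs 'a' and 'b', rows deliberately shuffled
rng = np.random.default_rng(0)
index = pd.MultiIndex.from_product([['a', 'b'], np.arange(24)], names=['ID', 'hour'])
df = pd.DataFrame({'value': rng.random(48)}, index=index).sample(frac=1, random_state=0)

# Restore (ID, hour) order so each group is in cyclic order before transforming
df = df.sort_index()
df['ma'] = df.groupby('ID')['value'].transform(cyclic_rolling_mean, window=5)
```

Extra keyword arguments given to transform (here window=5) are forwarded to the function, so other odd window sizes need no extra wrapper.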

Answer 2

Score: 0

Here is a way using np.roll() (the output below is for the hours 1 through 24 example frame from the previous answer):

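# For each row i, rotate the group's values so that hour i lands in position 2,
# then average positions 0..4, i.e. hours i-2..i+2 with wrap-around
df.join(
    df.groupby('ID', group_keys=False).apply(
        lambda x: pd.DataFrame([np.roll(x['value'], 2 - i)[:5].mean() for i in range(x.shape[0])],
                               index=x.index,
                               columns=['ma'])))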

Output:

         value    ma
ID hour             
a  1         1  10.6
   2         2   6.8
   3         3   3.0
   4         4   4.0
   5         5   5.0
   6         6   6.0
   7         7   7.0
   8         8   8.0
   9         9   9.0
   10       10  10.0
   11       11  11.0
   12       12  12.0
   13       13  13.0
   14       14  14.0
   15       15  15.0
   16       16  16.0
   17       17  17.0
   18       18  18.0
   19       19  19.0
   20       20  20.0
   21       21  21.0
   22       22  22.0
   23       23  18.2
   24       24  14.4
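The list comprehension above calls np.roll once per row (24 times per ID). A sketch of the same rotation idea instead stacks one rotated copy per window offset (5 copies for a 5-hour window) and averages them column-wise. roll_mean and its half parameter are hypothetical names, and like the original it assumes each ID's rows are sorted by hour and cover the full cycle.

```python
import numpy as np
import pandas as pd


def roll_mean(values, half=2):
    """Hypothetical helper: centered cyclic mean from shifted copies.

    np.roll(v, k)[i] == v[(i - k) % n], so the 2*half + 1 shifted copies put
    v[i - half], ..., v[i + half] (wrapping around) in column i.
    """
    v = np.asarray(values, dtype=float)
    shifted = np.stack([np.roll(v, k) for k in range(-half, half + 1)])
    return shifted.mean(axis=0)


# Same example frame as in the first answer: hours 1..24, value == hour
index = pd.MultiIndex.from_product([['a'], np.arange(1, 25)], names=['ID', 'hour'])
df = pd.DataFrame({'value': np.arange(1, 25)}, index=index)
df['ma'] = df.groupby('ID')['value'].transform(roll_mean)
```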
