2023年5月13日 18:09:41go评论85阅读模式

英文:

Exclude center value from centered rolling average in Pandas

问题

我想在Pandas中计算居中滚动平均值，其中排除了中心值。

以下代码完美运行，但包括中心值：

df.Value.rolling(window=11, center=True, min_periods=1).mean()

window=11 和 min_periods=1 很重要，因为我想为df中的每个值计算平均值。

例如，

对于第二个值，它应该考虑当前值以下的5个值和当前值上面的1个值。（当前代码包括第二个值）
对于最后一个值，它应该考虑上面的5个值，但排除最后一个值。（当前代码考虑了最后6个值）

以下是预期结果的表格：

值	预期值
13313	27180.6
12792	28897
20254	28770.14286
34915	27468.5
31410	29037
36532	30028.5
36958	34071.1
35471	36600.66667
33018	38950.625
38080	38804
44074	39037.33333
54165	37520.2

并且Excel中的计算方法如下：

Example Table

英文:

I want to calculate the centered rolling average in Pandas, where the center value is excluded.

The code below works perfectly, but it includes the center value:

df.Value.rolling(window=11, center=True, min_periods=1).mean()

The window=11 and min_periods=1 are important, since I want to calculate the average for every value in df.

For example,

for the second value, it should consider 5 values below and 1 that is above the current value. (Current code includes the second value)
for the last value, it should consider 5 values above, but exclude the last one. (Current code considers the last 6 values)

Here is a table with the expected results:

Values	Expected
13313	27180.6
12792	28897
20254	28770.14286
34915	27468.5
31410	29037
36532	30028.5
36958	34071.1
35471	36600.66667
33018	38950.625
38080	38804
44074	39037.33333
54165	37520.2

And calculation method from Excel:

Example Table

答案1

得分: 1

#### 更新的答案
您可以修改我最初回答中的方法，使其具有动态划分：

win = 11
roll = df['Values'].rolling(window=win, center=True, min_periods=1)

df['out'] = roll.sum().sub(df['Values']).div(roll.count()-1)

输出：

Values     Expected           out

0 13313 27180.60000 27180.600000
1 12792 28897.00000 28897.000000
2 20254 28770.14286 28770.142857
3 34915 27468.50000 27468.500000
4 31410 29037.00000 29037.000000
5 36532 30028.50000 30028.500000
6 36958 34071.10000 34071.100000
7 35471 36600.66667 36600.666667
8 33018 38950.62500 38950.625000
9 38080 38804.00000 38804.000000
10 44074 39037.33333 39037.333333
11 54165 37520.20000 37520.200000


#### 最初的答案
您有不同的方法，无论操作是否真的是均值：

np.random.seed(0)
df = pd.DataFrame({'Value': np.random.randint(0, 10, 10)})

win = 3

df['mean'] = df['Value'].rolling(window=win, center=True, min_periods=1).mean()

df['mean_without_center'] = (df['Value'].rolling(window=win, center=True, min_periods=1)
.sum().sub(df['Value']).div(win-1)
)

half = win//2
mask = [False]*half+[True]+[False]*half

df['mean_without_center_apply'] = (df['Value'].rolling(window=win, center=True, min_periods=1)
.apply(lambda s: s.mask(mask[:len(s)]).mean())
)

输出：

Value mean mean_without_center mean_without_center_apply
0 5 2.500000 0.0 5.0
1 0 2.666667 4.0 4.0
2 3 2.000000 1.5 1.5
3 3 4.333333 5.0 5.0
4 7 6.333333 6.0 6.0
5 9 6.333333 5.0 5.0
6 3 5.666667 7.0 7.0
7 5 3.333333 2.5 2.5
8 2 3.666667 4.5 4.5
9 4 3.000000 1.0 2.0

或者使用[tag:numpy]的[`sliding_window_view`](https://numpy.org/devdocs/reference/generated/numpy.lib.stride_tricks.sliding_window_view.html)：

from numpy.lib.stride_tricks import sliding_window_view as swv

win = 3
half = win//2

df['swv_mean'] = np.nanmean(swv(np.pad(df['Value'].astype(float), (win-1, 0),
constant_values=np.nan),
win)
[:, np.r_[:half, half+1:win]],
axis=1
)

输出：

Value swv_mean
0 5 5.0
1 0 0.0
2 3 4.0
3 3 1.5
4 7 5.0
5 9 6.0
6 3 5.0
7 5 7.0
8 2 2.5
9 4 4.5

英文:

updated answer

You can modify my first approach of the initial answer to have a dynamic division:

win = 11
roll = df[&#39;Values&#39;].rolling(window=win, center=True, min_periods=1)
df[&#39;out&#39;] = roll.sum().sub(df[&#39;Values&#39;]).div(roll.count()-1)

Output:

    Values     Expected           out
0    13313  27180.60000  27180.600000
1    12792  28897.00000  28897.000000
2    20254  28770.14286  28770.142857
3    34915  27468.50000  27468.500000
4    31410  29037.00000  29037.000000
5    36532  30028.50000  30028.500000
6    36958  34071.10000  34071.100000
7    35471  36600.66667  36600.666667
8    33018  38950.62500  38950.625000
9    38080  38804.00000  38804.000000
10   44074  39037.33333  39037.333333
11   54165  37520.20000  37520.200000

initial answer

You have different ways, whether or not the operation is really a mean:

np.random.seed(0)
df = pd.DataFrame({&#39;Value&#39;: np.random.randint(0, 10, 10)})
win = 3
df[&#39;mean&#39;] = df[&#39;Value&#39;].rolling(window=win, center=True, min_periods=1).mean()
df[&#39;mean_without_center&#39;] = (df[&#39;Value&#39;].rolling(window=win, center=True, min_periods=1)
                             .sum().sub(df[&#39;Value&#39;]).div(win-1)
                            )
half = win//2
mask = [False]*half+[True]+[False]*half
df[&#39;mean_without_center_apply&#39;] = (df[&#39;Value&#39;].rolling(window=win, center=True, min_periods=1)
                             .apply(lambda s: s.mask(mask[:len(s)]).mean())
                            )

Output:

   Value      mean  mean_without_center  mean_without_center_apply
0      5  2.500000                  0.0                        5.0
1      0  2.666667                  4.0                        4.0
2      3  2.000000                  1.5                        1.5
3      3  4.333333                  5.0                        5.0
4      7  6.333333                  6.0                        6.0
5      9  6.333333                  5.0                        5.0
6      3  5.666667                  7.0                        7.0
7      5  3.333333                  2.5                        2.5
8      2  3.666667                  4.5                        4.5
9      4  3.000000                  1.0                        2.0

Or with [tag:numpy]'s sliding_window_view:

from numpy.lib.stride_tricks import sliding_window_view as swv
win = 3
half = win//2
df[&#39;swv_mean&#39;] = np.nanmean(swv(np.pad(df[&#39;Value&#39;].astype(float), (win-1, 0),
                                       constant_values=np.nan),
                                win)
                            [:, np.r_[:half, half+1:win]],
                            axis=1
                           )

Output:

   Value  swv_mean
0      5       5.0
1      0       0.0
2      3       4.0
3      3       1.5
4      7       5.0
5      9       6.0
6      3       5.0
7      5       7.0
8      2       2.5
9      4       4.5

通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库，让每个人都能够通过互相帮助和分享经验来进步。

从Pandas中的居中滚动平均中排除中心值。

问题

答案1

updated answer

initial answer

Is there a way to execute something in the console of a website without opening the DevTools?

multiprocessing为每个线程创建基本变量的克隆。

获取Python变量的值和类型，类似Jupyter的行为。

Why if I'm placing a lookbehind constraint on the capturing group, does it ensure compliance but also capture what is prior to the given constraint?

如何在Playwright视觉比较中屏蔽多个定位器？

在C++中，可以使用可变模板参数来检索类型的内部类型。

selenium.common.exceptions.StaleElementReferenceException: Message: stale element reference: stale element not found

Creating and opening a URL to log in to Website via Basic Auth with Robot Framework/Selenium (Python)

AG Grid 在上下文菜单中以大文本形式打开

What's the correct way to type hint an empty list as a literal in python?

如何在Highcharts Gantt中更改本地化的星期名称

如何在同一个流中使用多个过滤器和映射函数？

如何使用Map/Set来将代码优化到O(n)？

.NET MAUI Android在GitHub Actions上构建失败，错误代码为1。

如何在Playwright视觉比较中屏蔽多个定位器？

在C++中，可以使用可变模板参数来检索类型的内部类型。

selenium.common.exceptions.StaleElementReferenceException: Message: stale element reference: stale element not found

Creating and opening a URL to log in to Website via Basic Auth with Robot Framework/Selenium (Python)

AG Grid 在上下文菜单中以大文本形式打开

What's the correct way to type hint an empty list as a literal in python?

如何在Highcharts Gantt中更改本地化的星期名称

如何在同一个流中使用多个过滤器和映射函数？

如何使用Map/Set来将代码优化到O(n)？

.NET MAUI Android在GitHub Actions上构建失败，错误代码为1。

发表评论