从Pandas中的居中滚动平均中排除中心值。

huangapple go评论66阅读模式
英文:

Exclude center value from centered rolling average in Pandas

问题

我想在Pandas中计算居中滚动平均值,其中排除了中心值。

以下代码完美运行,但包括中心值:

df.Value.rolling(window=11, center=True, min_periods=1).mean()

window=11min_periods=1 很重要,因为我想为df中的每个值计算平均值。

例如,

  • 对于第二个值,它应该考虑当前值以下的5个值和当前值上面的1个值。(当前代码包括第二个值)
  • 对于最后一个值,它应该考虑上面的5个值,但排除最后一个值。(当前代码考虑了最后6个值)

以下是预期结果的表格:

预期值
13313 27180.6
12792 28897
20254 28770.14286
34915 27468.5
31410 29037
36532 30028.5
36958 34071.1
35471 36600.66667
33018 38950.625
38080 38804
44074 39037.33333
54165 37520.2

并且Excel中的计算方法如下:

Example Table

英文:

I want to calculate the centered rolling average in Pandas, where the center value is excluded.

The code below works perfectly, but it includes the center value:

df.Value.rolling(window=11, center=True, min_periods=1).mean()

The window=11 and min_periods=1 are important, since I want to calculate the average for every value in df.

For example,

  • for the second value, it should consider 5 values below and 1 that is above the current value. (Current code includes the second value)
  • for the last value, it should consider 5 values above, but exclude the last one. (Current code considers the last 6 values)

Here is a table with the expected results:

Values Expected
13313 27180.6
12792 28897
20254 28770.14286
34915 27468.5
31410 29037
36532 30028.5
36958 34071.1
35471 36600.66667
33018 38950.625
38080 38804
44074 39037.33333
54165 37520.2

And calculation method from Excel:

Example Table

答案1

得分: 1

#### 更新的答案
您可以修改我最初回答中的方法,使其具有动态划分:

win = 11
roll = df['Values'].rolling(window=win, center=True, min_periods=1)

df['out'] = roll.sum().sub(df['Values']).div(roll.count()-1)

输出:
Values     Expected           out

0 13313 27180.60000 27180.600000
1 12792 28897.00000 28897.000000
2 20254 28770.14286 28770.142857
3 34915 27468.50000 27468.500000
4 31410 29037.00000 29037.000000
5 36532 30028.50000 30028.500000
6 36958 34071.10000 34071.100000
7 35471 36600.66667 36600.666667
8 33018 38950.62500 38950.625000
9 38080 38804.00000 38804.000000
10 44074 39037.33333 39037.333333
11 54165 37520.20000 37520.200000



#### 最初的答案
您有不同的方法,无论操作是否真的是均值:

np.random.seed(0)
df = pd.DataFrame({'Value': np.random.randint(0, 10, 10)})

win = 3

df['mean'] = df['Value'].rolling(window=win, center=True, min_periods=1).mean()

df['mean_without_center'] = (df['Value'].rolling(window=win, center=True, min_periods=1)
.sum().sub(df['Value']).div(win-1)
)

half = win//2
mask = [False]*half+[True]+[False]*half

df['mean_without_center_apply'] = (df['Value'].rolling(window=win, center=True, min_periods=1)
.apply(lambda s: s.mask(mask[:len(s)]).mean())
)

输出:

Value mean mean_without_center mean_without_center_apply
0 5 2.500000 0.0 5.0
1 0 2.666667 4.0 4.0
2 3 2.000000 1.5 1.5
3 3 4.333333 5.0 5.0
4 7 6.333333 6.0 6.0
5 9 6.333333 5.0 5.0
6 3 5.666667 7.0 7.0
7 5 3.333333 2.5 2.5
8 2 3.666667 4.5 4.5
9 4 3.000000 1.0 2.0

或者使用[tag:numpy]的[`sliding_window_view`](https://numpy.org/devdocs/reference/generated/numpy.lib.stride_tricks.sliding_window_view.html):

from numpy.lib.stride_tricks import sliding_window_view as swv

win = 3
half = win//2

df['swv_mean'] = np.nanmean(swv(np.pad(df['Value'].astype(float), (win-1, 0),
constant_values=np.nan),
win)
[:, np.r_[:half, half+1:win]],
axis=1
)

输出:

Value swv_mean
0 5 5.0
1 0 0.0
2 3 4.0
3 3 1.5
4 7 5.0
5 9 6.0
6 3 5.0
7 5 7.0
8 2 2.5
9 4 4.5

英文:

updated answer

You can modify my first approach of the initial answer to have a dynamic division:

win = 11
roll = df['Values'].rolling(window=win, center=True, min_periods=1)

df['out'] = roll.sum().sub(df['Values']).div(roll.count()-1)

Output:

    Values     Expected           out
0    13313  27180.60000  27180.600000
1    12792  28897.00000  28897.000000
2    20254  28770.14286  28770.142857
3    34915  27468.50000  27468.500000
4    31410  29037.00000  29037.000000
5    36532  30028.50000  30028.500000
6    36958  34071.10000  34071.100000
7    35471  36600.66667  36600.666667
8    33018  38950.62500  38950.625000
9    38080  38804.00000  38804.000000
10   44074  39037.33333  39037.333333
11   54165  37520.20000  37520.200000

initial answer

You have different ways, whether or not the operation is really a mean:

np.random.seed(0)
df = pd.DataFrame({'Value': np.random.randint(0, 10, 10)})

win = 3

df['mean'] = df['Value'].rolling(window=win, center=True, min_periods=1).mean()

df['mean_without_center'] = (df['Value'].rolling(window=win, center=True, min_periods=1)
                             .sum().sub(df['Value']).div(win-1)
                            )

half = win//2
mask = [False]*half+[True]+[False]*half

df['mean_without_center_apply'] = (df['Value'].rolling(window=win, center=True, min_periods=1)
                             .apply(lambda s: s.mask(mask[:len(s)]).mean())
                            )

Output:

   Value      mean  mean_without_center  mean_without_center_apply
0      5  2.500000                  0.0                        5.0
1      0  2.666667                  4.0                        4.0
2      3  2.000000                  1.5                        1.5
3      3  4.333333                  5.0                        5.0
4      7  6.333333                  6.0                        6.0
5      9  6.333333                  5.0                        5.0
6      3  5.666667                  7.0                        7.0
7      5  3.333333                  2.5                        2.5
8      2  3.666667                  4.5                        4.5
9      4  3.000000                  1.0                        2.0

Or with [tag:numpy]'s sliding_window_view:

from numpy.lib.stride_tricks import sliding_window_view as swv

win = 3
half = win//2

df['swv_mean'] = np.nanmean(swv(np.pad(df['Value'].astype(float), (win-1, 0),
                                       constant_values=np.nan),
                                win)
                            [:, np.r_[:half, half+1:win]],
                            axis=1
                           )

Output:

   Value  swv_mean
0      5       5.0
1      0       0.0
2      3       4.0
3      3       1.5
4      7       5.0
5      9       6.0
6      3       5.0
7      5       7.0
8      2       2.5
9      4       4.5

huangapple
  • 本文由 发表于 2023年5月13日 18:09:41
  • 转载请务必保留本文链接:https://go.coder-hub.com/76242175.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定