英文:
Exclude center value from centered rolling average in Pandas
问题
我想在Pandas中计算居中滚动平均值,其中排除了中心值。
以下代码完美运行,但包括中心值:
df.Value.rolling(window=11, center=True, min_periods=1).mean()
window=11
和 min_periods=1
很重要,因为我想为df中的每个值计算平均值。
例如,
- 对于第二个值,它应该考虑当前值以下的5个值和当前值上面的1个值。(当前代码包括第二个值)
- 对于最后一个值,它应该考虑上面的5个值,但排除最后一个值。(当前代码考虑了最后6个值)
以下是预期结果的表格:
值 | 预期值 |
---|---|
13313 | 27180.6 |
12792 | 28897 |
20254 | 28770.14286 |
34915 | 27468.5 |
31410 | 29037 |
36532 | 30028.5 |
36958 | 34071.1 |
35471 | 36600.66667 |
33018 | 38950.625 |
38080 | 38804 |
44074 | 39037.33333 |
54165 | 37520.2 |
并且Excel中的计算方法如下:
英文:
I want to calculate the centered rolling average in Pandas, where the center value is excluded.
The code below works perfectly, but it includes the center value:
df.Value.rolling(window=11, center=True, min_periods=1).mean()
The window=11
and min_periods=1
are important, since I want to calculate the average for every value in df.
For example,
- for the second value, it should consider 5 values below and 1 that is above the current value. (Current code includes the second value)
- for the last value, it should consider 5 values above, but exclude the last one. (Current code considers the last 6 values)
Here is a table with the expected results:
Values | Expected |
---|---|
13313 | 27180.6 |
12792 | 28897 |
20254 | 28770.14286 |
34915 | 27468.5 |
31410 | 29037 |
36532 | 30028.5 |
36958 | 34071.1 |
35471 | 36600.66667 |
33018 | 38950.625 |
38080 | 38804 |
44074 | 39037.33333 |
54165 | 37520.2 |
And calculation method from Excel:
答案1
得分: 1
#### 更新的答案
您可以修改我最初回答中的方法,使其具有动态划分:
win = 11
roll = df['Values'].rolling(window=win, center=True, min_periods=1)
df['out'] = roll.sum().sub(df['Values']).div(roll.count()-1)
输出:
Values Expected out
0 13313 27180.60000 27180.600000
1 12792 28897.00000 28897.000000
2 20254 28770.14286 28770.142857
3 34915 27468.50000 27468.500000
4 31410 29037.00000 29037.000000
5 36532 30028.50000 30028.500000
6 36958 34071.10000 34071.100000
7 35471 36600.66667 36600.666667
8 33018 38950.62500 38950.625000
9 38080 38804.00000 38804.000000
10 44074 39037.33333 39037.333333
11 54165 37520.20000 37520.200000
#### 最初的答案
您有不同的方法,无论操作是否真的是均值:
np.random.seed(0)
df = pd.DataFrame({'Value': np.random.randint(0, 10, 10)})
win = 3
df['mean'] = df['Value'].rolling(window=win, center=True, min_periods=1).mean()
df['mean_without_center'] = (df['Value'].rolling(window=win, center=True, min_periods=1)
.sum().sub(df['Value']).div(win-1)
)
half = win//2
mask = [False]*half+[True]+[False]*half
df['mean_without_center_apply'] = (df['Value'].rolling(window=win, center=True, min_periods=1)
.apply(lambda s: s.mask(mask[:len(s)]).mean())
)
输出:
Value mean mean_without_center mean_without_center_apply
0 5 2.500000 0.0 5.0
1 0 2.666667 4.0 4.0
2 3 2.000000 1.5 1.5
3 3 4.333333 5.0 5.0
4 7 6.333333 6.0 6.0
5 9 6.333333 5.0 5.0
6 3 5.666667 7.0 7.0
7 5 3.333333 2.5 2.5
8 2 3.666667 4.5 4.5
9 4 3.000000 1.0 2.0
或者使用[tag:numpy]的[`sliding_window_view`](https://numpy.org/devdocs/reference/generated/numpy.lib.stride_tricks.sliding_window_view.html):
from numpy.lib.stride_tricks import sliding_window_view as swv
win = 3
half = win//2
df['swv_mean'] = np.nanmean(swv(np.pad(df['Value'].astype(float), (win-1, 0),
constant_values=np.nan),
win)
[:, np.r_[:half, half+1:win]],
axis=1
)
输出:
Value swv_mean
0 5 5.0
1 0 0.0
2 3 4.0
3 3 1.5
4 7 5.0
5 9 6.0
6 3 5.0
7 5 7.0
8 2 2.5
9 4 4.5
英文:
updated answer
You can modify my first approach of the initial answer to have a dynamic division:
win = 11
roll = df['Values'].rolling(window=win, center=True, min_periods=1)
df['out'] = roll.sum().sub(df['Values']).div(roll.count()-1)
Output:
Values Expected out
0 13313 27180.60000 27180.600000
1 12792 28897.00000 28897.000000
2 20254 28770.14286 28770.142857
3 34915 27468.50000 27468.500000
4 31410 29037.00000 29037.000000
5 36532 30028.50000 30028.500000
6 36958 34071.10000 34071.100000
7 35471 36600.66667 36600.666667
8 33018 38950.62500 38950.625000
9 38080 38804.00000 38804.000000
10 44074 39037.33333 39037.333333
11 54165 37520.20000 37520.200000
initial answer
You have different ways, whether or not the operation is really a mean:
np.random.seed(0)
df = pd.DataFrame({'Value': np.random.randint(0, 10, 10)})
win = 3
df['mean'] = df['Value'].rolling(window=win, center=True, min_periods=1).mean()
df['mean_without_center'] = (df['Value'].rolling(window=win, center=True, min_periods=1)
.sum().sub(df['Value']).div(win-1)
)
half = win//2
mask = [False]*half+[True]+[False]*half
df['mean_without_center_apply'] = (df['Value'].rolling(window=win, center=True, min_periods=1)
.apply(lambda s: s.mask(mask[:len(s)]).mean())
)
Output:
Value mean mean_without_center mean_without_center_apply
0 5 2.500000 0.0 5.0
1 0 2.666667 4.0 4.0
2 3 2.000000 1.5 1.5
3 3 4.333333 5.0 5.0
4 7 6.333333 6.0 6.0
5 9 6.333333 5.0 5.0
6 3 5.666667 7.0 7.0
7 5 3.333333 2.5 2.5
8 2 3.666667 4.5 4.5
9 4 3.000000 1.0 2.0
Or with [tag:numpy]'s sliding_window_view
:
from numpy.lib.stride_tricks import sliding_window_view as swv
win = 3
half = win//2
df['swv_mean'] = np.nanmean(swv(np.pad(df['Value'].astype(float), (win-1, 0),
constant_values=np.nan),
win)
[:, np.r_[:half, half+1:win]],
axis=1
)
Output:
Value swv_mean
0 5 5.0
1 0 0.0
2 3 4.0
3 3 1.5
4 7 5.0
5 9 6.0
6 3 5.0
7 5 7.0
8 2 2.5
9 4 4.5
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论