compare operation in a pandas rolling window
Question
I want to make a rolling window and compare the elements in this window with the most recent one. In effect, I want to subtract the last value from all the others and look at the sign of each difference. For example, given the DataFrame:
import pandas as pd

df = pd.DataFrame([
    [2, 3, 5, 7],
    [8, 3, 6, 1],
    [1, 5, 9, 13],
    [7, 3, 2, 7],
    [12, 4, 1, 0]
])
I would like a rolling window of length 4, applied down each column. In this case, the first window of the first column is [2, 8, 1, 7]. The last element (7) is greater than 2 and 1 but smaller than 8, so the output of the operation is -1 + 1 - 1 = -1 (-1 if the last element is greater, +1 if it is smaller; if equal, it doesn't really matter, but let's say +1). The next rolling window works the same way: it is [8, 1, 7, 12], and 12 is greater than all the other values in the window, so the operation returns -3.
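The rule described above can be sketched for a single window as follows. The helper name window_score is hypothetical and used only for illustration; ties are counted as +1, as suggested in the question.

```python
import numpy as np

def window_score(window):
    """Score one window against its last value.

    Each earlier value contributes +1 if it is >= the last value
    (the last value is smaller or equal) and -1 otherwise.
    """
    values = np.asarray(window)
    earlier, last = values[:-1], values[-1]
    return int(np.where(earlier >= last, 1, -1).sum())

print(window_score([2, 8, 1, 7]))   # -1
print(window_score([8, 1, 7, 12]))  # -3
```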
The ideal final output would be:
[NaN, NaN, NaN, NaN]
[NaN, NaN, NaN, NaN]
[NaN, NaN, NaN, NaN]
[-1, 3, 3, 1]
[-3, -1, 3, 3]
I tried df.rolling().apply() and also df.shift, but couldn't get anywhere.
Answer 1
Score: 2
You could use rolling.apply with a custom lambda, where g.iloc[:-1] - g.iat[-1] >= 0 compares all previous elements with the last element in the window:
import numpy as np

df.rolling(window=4).apply(lambda g: np.where(g.iloc[:-1] - g.iat[-1] >= 0, 1, -1).sum())
      0    1    2    3
0   NaN  NaN  NaN  NaN
1   NaN  NaN  NaN  NaN
2   NaN  NaN  NaN  NaN
3  -1.0  3.0  3.0  1.0
4  -3.0 -1.0  3.0  3.0
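As a possible variation on the same idea (not part of the original answer), passing raw=True to rolling.apply hands each window to the function as a plain NumPy array instead of a Series, which usually avoids some per-window overhead. The function name score is an assumption made for this sketch.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([
    [2, 3, 5, 7],
    [8, 3, 6, 1],
    [1, 5, 9, 13],
    [7, 3, 2, 7],
    [12, 4, 1, 0],
])

def score(w):
    # w is a 1-D float ndarray because raw=True is used below.
    # Compare every earlier value with the last one: +1 if >=, else -1.
    return np.where(w[:-1] >= w[-1], 1, -1).sum()

result = df.rolling(window=4).apply(score, raw=True)
print(result)
```

The first three rows stay NaN because the window is not yet full there, matching the output shown above.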
Answer 2
Score: 2
You could use NumPy's sliding_window_view:
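import numpy as np
import pandas as pd
from numpy.lib.stride_tricks import sliding_window_view as swv

N = 4
a = df.to_numpy()
# Start from an all-NaN frame; only rows with a full window get filled in.
out = pd.DataFrame(index=df.index, columns=df.columns)
# swv(a, (N-1, 1))[:-1] holds the N-1 values preceding each row from N-1 onward;
# a[N-1:] holds the last value of each window, broadcast against them.
out.iloc[N-1:, :] = \
    np.where(swv(a, (N-1, 1))[:-1] >= a[N-1:][..., None, None],
             1, -1).sum(axis=(-1, -2))
print(out)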
Output:
     0    1    2    3
0  NaN  NaN  NaN  NaN
1  NaN  NaN  NaN  NaN
2  NaN  NaN  NaN  NaN
3   -1    3    3    1
4   -3   -1    3    3
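A slightly different way to arrange the same computation, offered as a sketch rather than as the answer's own method, is to take full windows of length N along axis 0 and compare each window's earlier values with its last one. This assumes NumPy 1.20 or later, where sliding_window_view is available; the names windows, scores, and full are chosen for illustration.

```python
import numpy as np
import pandas as pd
from numpy.lib.stride_tricks import sliding_window_view

df = pd.DataFrame([
    [2, 3, 5, 7],
    [8, 3, 6, 1],
    [1, 5, 9, 13],
    [7, 3, 2, 7],
    [12, 4, 1, 0],
])

N = 4
a = df.to_numpy()

# Shape (rows - N + 1, cols, N): one length-N window per row position and column.
windows = sliding_window_view(a, N, axis=0)

# Compare the first N-1 values of each window with its last value.
scores = np.where(windows[..., :-1] >= windows[..., -1:], 1, -1).sum(axis=-1)

# Rows without a full window stay NaN.
full = np.full(a.shape, np.nan)
full[N - 1:] = scores
out = pd.DataFrame(full, index=df.index, columns=df.columns)
print(out)
```

Unlike the object-dtype frame above, this version yields a float result, so the scores print as -1.0, 3.0, and so on.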