Compare operation in a pandas rolling window

Question


I want to make a rolling window and compare the elements in this window with the most recent one. In effect, I want to subtract the last value from all the others. For example, given the DataFrame

df = pd.DataFrame([
    [2, 3, 5, 7,],
    [8, 3, 6, 1],
    [1, 5, 9, 13],
    [7, 3, 2, 7],
    [12, 4, 1, 0]
])

I would like a rolling window of length 4, so in this case the first window (for the first column) is [2, 8, 1, 7]. The last element, 7, is greater than 2 and 1 but smaller than 8, so the output of the operation is -1 + 1 - 1 = -1 (-1 for each earlier value that the last element is greater than, +1 for each one it is smaller than; if they are equal it doesn't really matter, but let's give +1). The next window works the same way: 12 is greater than all the other values in the window, so the operation returns -3.
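The rule for a single window can be sketched in plain Python. This is only an illustration of the scoring described above; the helper name window_score is made up for this example and is not part of pandas.

```python
def window_score(window):
    """Score one window: compare every earlier value with the last one.

    +1 if the earlier value is greater than or equal to the last value
    (i.e. the last value is smaller or equal), -1 otherwise.
    """
    *earlier, last = window
    return sum(1 if x >= last else -1 for x in earlier)


# First two windows of the first column in the example DataFrame
print(window_score([2, 8, 1, 7]))   # -1
print(window_score([8, 1, 7, 12]))  # -3
```

Treating ties as +1 matches the choice made in the question; swapping >= for > would score ties as -1 instead.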

The desired output is:

[NaN, NaN, NaN, NaN]
[NaN, NaN, NaN, NaN]
[NaN, NaN, NaN, NaN]
[ -1,   3,   3,   1]
[ -3,  -1,   3,   3]

I tried df.rolling().apply() and also df.shift(), but couldn't get anywhere.

Answer 1

Score: 2


One option is rolling.apply with a custom lambda, where g.iloc[:-1] - g.iat[-1] >= 0 compares each earlier element in the window with the last one:

import numpy as np

# each window g is a Series; score +1 where an earlier value >= the last value, else -1
df.rolling(window=4).apply(lambda g: np.where(g.iloc[:-1] - g.iat[-1] >= 0, 1, -1).sum())

     0    1    2    3
0  NaN  NaN  NaN  NaN
1  NaN  NaN  NaN  NaN
2  NaN  NaN  NaN  NaN
3 -1.0  3.0  3.0  1.0
4 -3.0 -1.0  3.0  3.0
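A possible variation, not part of the original answer: passing raw=True makes rolling.apply hand each window to the function as a NumPy array instead of a Series, which usually avoids some per-window overhead. The function name score is just for illustration, and the DataFrame is repeated so the snippet runs on its own.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([
    [2, 3, 5, 7],
    [8, 3, 6, 1],
    [1, 5, 9, 13],
    [7, 3, 2, 7],
    [12, 4, 1, 0],
])


def score(w):
    # With raw=True, w is a 1-D ndarray holding one window of one column.
    # +1 where an earlier value >= the last value, -1 otherwise.
    return np.where(w[:-1] >= w[-1], 1, -1).sum()


result = df.rolling(window=4).apply(score, raw=True)
print(result)
```

The result should match the table above; rows without a full window stay NaN.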

Answer 2

Score: 2


You could use NumPy's sliding_window_view:

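import numpy as np
import pandas as pd
from numpy.lib.stride_tricks import sliding_window_view as swv

N = 4  # window length

a = df.to_numpy()

# empty (all-NaN, object dtype) frame with the same shape as df
out = pd.DataFrame(index=df.index, columns=df.columns)

# swv(a, (N-1, 1))[:-1] holds the N-1 values preceding the last row of each window;
# a[N-1:] is the last row of each window, broadcast against those values
out.iloc[N-1:, :] = np.where(
    swv(a, (N-1, 1))[:-1] >= a[N-1:][..., None, None], 1, -1
).sum(axis=(-1, -2))

print(out)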

Output:

     0    1    2    3
0  NaN  NaN  NaN  NaN
1  NaN  NaN  NaN  NaN
2  NaN  NaN  NaN  NaN
3   -1    3    3    1
4   -3   -1    3    3
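The same idea can be wrapped in a reusable function. This is a sketch rather than the answerer's code: it takes full length-n windows along axis=0 and compares within each window, and it returns a float DataFrame instead of an object one. The name rolling_compare_last is hypothetical, and sliding_window_view needs NumPy 1.20 or later.

```python
import numpy as np
import pandas as pd
from numpy.lib.stride_tricks import sliding_window_view


def rolling_compare_last(df, n):
    """For each column and each full window of length n, sum +1/-1 scores
    comparing the earlier values with the window's last value."""
    a = df.to_numpy()
    res = np.full(a.shape, np.nan)  # rows without a full window stay NaN
    if len(a) >= n:
        # shape: (rows - n + 1, cols, n); the window axis is appended last
        win = sliding_window_view(a, n, axis=0)
        res[n - 1:] = np.where(win[..., :-1] >= win[..., -1:], 1, -1).sum(axis=-1)
    return pd.DataFrame(res, index=df.index, columns=df.columns)


df = pd.DataFrame([
    [2, 3, 5, 7],
    [8, 3, 6, 1],
    [1, 5, 9, 13],
    [7, 3, 2, 7],
    [12, 4, 1, 0],
])
out = rolling_compare_last(df, 4)
print(out)
```

Because the whole computation is vectorized, no Python function is called per window, which tends to matter on large frames.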

huangapple
  • Posted on May 21, 2023, 09:11:35
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76297916.html