I have a NumPy and/or pandas array that repeats. How do I find out where and when it repeats?

Question


This is a pandas DataFrame, but I don't mind whether the solution uses pandas or NumPy; I'm just looking for a way to see where the pattern repeats. Here is what I have:

Out[713]: 
     sf  sx  sz  ss
0    12  15   5   6
1    15   1  13   3
2    13  10   6   1
3     9  14   8  15
4     2   2   6   6
5     8   8   2   2
6    15   8   2   5
7     4   6   9  11
8    14  13  10   9
9     2  12   5  11
10    1   6  15   8
11    3   4   9  14
12   12  12  14  14
13   15  15   5   5
14   13  10  10  13
15    9  11  13  15
16    2   1  10   9
17    8   6   3  13
18   15   8  14   9
19    4   3  13  10
20   14  14   2   2
21    2   2   5   5
22    1   6   1   6
23    3   1  13  15
24   12  15   0   3
25   15   1   9   7
26   13  10   2   5
27    9  14  14   9
28    2   2   2   2
29    8   8   2   2
30   15   8  10  13
31    4   6  15  13
32   14  13   5   6
33    2  12   5  11
34    1   6  13  10
35    3   4   5   2
36   12  12  13  13
37   15  15   6   6
38   13  10   8  15
39    9  11   6   4
40    2   1   2   1
41    8   6   2  12
42   15   8   9  14
43    4   3  10  13
44   14  14   5   5
45    2   2  15  15
46    1   6   9  14
47    3   1  14  12
48   12  15   5   6
49   15   1  10   4
50   13  10  13  10


Use pd.read_clipboard() if you want to copy and paste the data.
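If the clipboard is not available (for example in a script or on a remote machine), the same table can be parsed from a string. This is only a sketch: the variable names text and df are illustrative, and just the first four rows of the table are included to keep it short.

```python
import io

import pandas as pd

# First rows of the table from the question, pasted as plain text.
text = """\
     sf  sx  sz  ss
0    12  15   5   6
1    15   1  13   3
2    13  10   6   1
3     9  14   8  15
"""

# Whitespace-separated columns; because each data row has one more field
# than the header, pandas uses the first field as the index.
df = pd.read_csv(io.StringIO(text), sep=r"\s+")
print(df)
```

Pasting the full table into text gives the same DataFrame that pd.read_clipboard() would produce.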

You can see that sf repeats every slice of 12 rows, sx repeats every slice of 24 rows, and sz repeats every slice of 35 rows. How do I figure out where these slices repeat without checking them manually? ss also repeats, but I can't figure out where. What strategies can I use to find where ss repeats?

Thanks in advance. I couldn't find an answer, so I wanted to ask anyone with knowledge of this situation.

ss is actually just ss = sf ^ sx ^ sz, if that helps.

Thanks
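A note on that relation: since ss is the XOR of sf, sx and sz, and those repeat every 12, 24 and 35 rows, ss has to repeat after any common multiple of the three periods. The least common multiple is 840, far more than the 51 rows shown, which would explain why no repeat of ss is visible in the sample. The sketch below assumes the three columns keep cycling exactly as in the table; the cycle arrays are copied from it, and lcm, upper_bound and ss_long are illustrative names.

```python
from functools import reduce
from math import gcd

import numpy as np


def lcm(a, b):
    """Least common multiple of two positive integers."""
    return a * b // gcd(a, b)


# Periods reported in the question.
upper_bound = reduce(lcm, [12, 24, 35])
print(upper_bound)  # 840

# One full cycle of each column, copied from the table.
sf_cycle = np.array([12, 15, 13, 9, 2, 8, 15, 4, 14, 2, 1, 3])
sx_cycle = np.array([15, 1, 10, 14, 2, 8, 8, 6, 13, 12, 6, 4,
                     12, 15, 10, 11, 1, 6, 8, 3, 14, 2, 6, 1])
sz_cycle = np.array([5, 13, 6, 8, 6, 2, 2, 9, 10, 5, 15, 9, 14, 5, 10, 13, 10, 3,
                     14, 13, 2, 5, 1, 13, 0, 9, 2, 14, 2, 2, 10, 15, 5, 5, 13])

# Assumption: the columns continue periodically, so extend them to two
# blocks of upper_bound rows and rebuild ss from its definition.
idx = np.arange(2 * upper_bound)
ss_long = sf_cycle[idx % 12] ^ sx_cycle[idx % 24] ^ sz_cycle[idx % 35]

# The second block of 840 rows is identical to the first.
repeats = np.array_equal(ss_long[:upper_bound], ss_long[upper_bound:])
print(repeats)  # True
```

The true period of ss divides 840 and could in principle be smaller, but either way confirming it needs more rows than the sample contains.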

Answer 1

Score: 3


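You can compute the autocorrelation with Series.autocorr and a small custom function:

import numpy as np

def get_lag(s):
    # Return the first lag at which the series correlates (almost) perfectly
    # with a shifted copy of itself, or 0 if no such lag is found.
    # Note: at lags close to len(s) only a few pairs overlap, so a
    # correlation of 1 can also occur by chance there.
    return next((lag for lag in range(1, len(s))
                 if np.isclose(s.autocorr(lag=lag), 1)), 0)

# df is the DataFrame from the question; apply runs get_lag on each column
df.apply(get_lag)

NB. I test for a correlation of almost exactly 1 and stop at the first match. If you're fine with an approximate match, you can adapt the logic to use a threshold instead.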
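The threshold variant mentioned above could look roughly like this. It is a sketch, not part of the original answer: the name get_lag_threshold and the parameters threshold and min_overlap are assumptions, and min_overlap is added only to skip lags where too few pairs overlap for the correlation to mean much.

```python
import pandas as pd


def get_lag_threshold(s, threshold=0.99, min_overlap=3):
    """Return the first lag whose autocorrelation is >= threshold, else 0.

    Lags that leave fewer than min_overlap overlapping pairs are skipped.
    """
    for lag in range(1, len(s) - min_overlap + 1):
        r = s.autocorr(lag=lag)
        # autocorr returns NaN when the correlation is undefined
        if pd.notna(r) and r >= threshold:
            return lag
    return 0


# The 51 sf values from the question: a cycle of 12 repeated, then 3 more rows.
sf = pd.Series([12, 15, 13, 9, 2, 8, 15, 4, 14, 2, 1, 3] * 4 + [12, 15, 13])
print(get_lag_threshold(sf))  # 12
```

Lowering threshold tolerates noisier repeats, at the cost of accepting lags that only look similar.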

Output (0 means no repeating lag was found):

sf    12
sx    24
sz    35
ss     0
dtype: int64
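Because the columns hold integers, an exact comparison is a possible alternative to correlation, and it also gives the rows where each copy of the pattern starts. This is a different technique from the autocorrelation above, sketched under assumptions: exact_period, repeat_starts and min_overlap are hypothetical names, and min_overlap guards only weakly against chance matches near the end of the data.

```python
import numpy as np
import pandas as pd


def exact_period(values, min_overlap=2):
    """Smallest shift p for which values[p:] equals values[:-p] exactly, else 0."""
    a = np.asarray(values)
    for p in range(1, len(a) - min_overlap + 1):
        if np.array_equal(a[p:], a[:-p]):
            return p
    return 0


def repeat_starts(values, period):
    """Row positions at which a new copy of the pattern begins."""
    return list(range(0, len(values), period)) if period else []


# The 51 sf values from the question.
sf = pd.Series([12, 15, 13, 9, 2, 8, 15, 4, 14, 2, 1, 3] * 4 + [12, 15, 13])

period = exact_period(sf)
starts = repeat_starts(sf, period)
print(period, starts)  # 12 [0, 12, 24, 36, 48]
```

Since exact_period accepts anything np.asarray can convert, df.apply(exact_period) should work column by column in the same way as get_lag.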

huangapple
  • Posted on April 19, 2023, 15:28:47
  • When reposting, please keep this link: https://go.coder-hub.com/76051799.html