March 3, 2023, 18:24:14

Calculation of average of all possible slices of 1d-array

Question

I have a (big) 1D array of measurement data. I want to calculate the average of every possible slice defined by a start index and a stop index, like cutting at both ends of my data and averaging every possible cut.
The result should be stored in a square 2D array (actually a triangle, as the start index cannot be larger than the stop index).

Using loops works, but takes a long time.

Is there a way to speed this up?

I have this code:

import numpy as np

N = 5
data = np.arange(N)  # example data
av = np.full((N, N), np.nan)  # entries with start > stop stay NaN
for i in range(N):  # i: start index
    for j in range(i, N):  # j: stop index (inclusive)
        av[j, i] = np.mean(data[i:j+1])

This works, but takes a long time. For a similar calculation (differences of elements instead of averages of slices), I found this very fast solution:

dist = np.subtract.outer(data, data)

But I could not figure out how to do this with averages of slices.

Answer 1

Score: 0

One option with a summation, then division by the number of items:

a = np.arange(len(data))

av = (np.tril(np.repeat(data[:, None], len(data), axis=1)).cumsum(axis=0)
      / np.tril(a[:, None] - a + 1))

Output:

array([[0. , nan, nan, nan, nan],
       [0.5, 1. , nan, nan, nan],
       [1. , 1.5, 2. , nan, nan],
       [1.5, 2. , 2.5, 3. , nan],
       [2. , 2.5, 3. , 3.5, 4. ]])
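
The NaN entries in the upper triangle come from 0/0 divisions, for which NumPy also emits a RuntimeWarning. As a minimal sketch, the expression above can be wrapped in a function that silences that warning locally with np.errstate. The function name slice_means is hypothetical and not part of the original answer.

```python
import numpy as np

def slice_means(data):
    """Return av with av[j, i] = mean(data[i:j+1]) for i <= j, NaN elsewhere."""
    data = np.asarray(data)
    n = len(data)
    a = np.arange(n)
    # Column i holds zeros above the diagonal and data[i:] below it,
    # so the cumulative sum down each column gives the slice sums.
    sums = np.tril(np.repeat(data[:, None], n, axis=1)).cumsum(axis=0)
    # Number of items in each slice (j - i + 1), zero above the diagonal.
    counts = np.tril(a[:, None] - a + 1)
    # The upper triangle is 0/0, which yields NaN; ignore only that warning here.
    with np.errstate(invalid="ignore"):
        return sums / counts

av = slice_means(np.arange(5))
print(av)
```

Keeping the np.errstate block this narrow leaves warnings enabled for the rest of the program.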

Intermediates:

# repeat data to square shape and keep lower triangle
np.tril(np.repeat(data[:,None], len(data), axis=1))
array([[0, 0, 0, 0, 0],
       [1, 1, 0, 0, 0],
       [2, 2, 2, 0, 0],
       [3, 3, 3, 3, 0],
       [4, 4, 4, 4, 4]])

# get the cumulative sum down each column
[…].cumsum(axis=0)
array([[ 0,  0,  0,  0,  0],
       [ 1,  1,  0,  0,  0],
       [ 3,  3,  2,  0,  0],
       [ 6,  6,  5,  3,  0],
       [10, 10,  9,  7,  4]])

# compute the divisor (number of items in each slice)
a = np.arange(len(data))
array([0, 1, 2, 3, 4])

np.tril((a[:,None]-a+1))
array([[1, 0, 0, 0, 0],
       [2, 1, 0, 0, 0],
       [3, 2, 1, 0, 0],
       [4, 3, 2, 1, 0],
       [5, 4, 3, 2, 1]])
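
The same slice sums can also be obtained from prefix sums, which lets the np.subtract.outer idea from the question be reused. This is a different technique from the answer above, shown only as a sketch. The function name slice_means_prefix is hypothetical, and converting the input to float is an assumption made here for simplicity.

```python
import numpy as np

def slice_means_prefix(data):
    """Return av with av[j, i] = mean(data[i:j+1]) for i <= j, NaN elsewhere."""
    data = np.asarray(data, dtype=float)
    n = len(data)
    # prefix[k] = sum(data[:k]), so sum(data[i:j+1]) = prefix[j+1] - prefix[i].
    prefix = np.concatenate(([0.0], np.cumsum(data)))
    sums = np.subtract.outer(prefix[1:], prefix[:-1])  # sums[j, i]
    idx = np.arange(n)
    counts = np.subtract.outer(idx, idx) + 1  # counts[j, i] = j - i + 1
    av = np.full((n, n), np.nan)
    valid = counts > 0  # start index i <= stop index j
    av[valid] = sums[valid] / counts[valid]
    return av

print(slice_means_prefix(np.arange(5)))
```

Because every slice sum is a difference of two running totals, long arrays with large values may lose some floating-point precision compared with summing each slice directly.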

Performance

On a 1000-integer input (data = np.arange(1000)):

# nested for loop
6.28 s ± 142 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

# vectorized numpy
12.3 ms ± 427 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
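
The figures above look like IPython %timeit output. A rough way to repeat the comparison in a plain script is sketched below with the standard timeit module. The helper names loop_means and vectorized_means are hypothetical, the input is smaller than the 1000 elements used above so the loop finishes quickly, and the loop here skips pairs with start > stop, unlike the loop in the question. Absolute timings will depend on the machine.

```python
import timeit
import numpy as np

def loop_means(data):
    # Nested-loop reference: av[j, i] = mean(data[i:j+1]) for i <= j.
    n = len(data)
    av = np.full((n, n), np.nan)
    for i in range(n):
        for j in range(i, n):
            av[j, i] = np.mean(data[i:j + 1])
    return av

def vectorized_means(data):
    # Vectorized expression from the answer, with the 0/0 warning silenced.
    n = len(data)
    a = np.arange(n)
    with np.errstate(invalid="ignore"):
        return (np.tril(np.repeat(data[:, None], n, axis=1)).cumsum(axis=0)
                / np.tril(a[:, None] - a + 1))

data = np.arange(200)
# Check that both approaches agree before timing them.
same = np.allclose(loop_means(data), vectorized_means(data), equal_nan=True)
t_loop = timeit.timeit(lambda: loop_means(data), number=1)
t_vec = timeit.timeit(lambda: vectorized_means(data), number=10) / 10
print(f"results match: {same}")
print(f"loop: {t_loop:.4f} s, vectorized: {t_vec:.6f} s")
```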
