Calculation of average of all possible slices of 1d-array
Question
I have a (big) 1d array of measurement data. I want to calculate the average for every possible slice defined by a start index and a stop index, like cutting at both ends of my data and averaging every possible cut.
The result should be stored in a square 2D array (actually a triangle, as the start index must be smaller than the stop index).
Using loops works, but takes a long time.
Is there a way to speed this up?
I have this code:
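import numpy as np

N = 5
data = np.arange(N)  # example data
av = np.zeros((N, N))
for i in range(av.shape[0]):      # start index (column)
    for j in range(av.shape[1]):  # stop index (row)
        # mean of data[i..j]; empty slices (j < i) yield nan
        av[j, i] = np.mean(data[i:j+1])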
This works, but takes a long time. For a similar calculation (differences of elements instead of averages of slices), I found this very fast solution:
dist = np.subtract.outer(data, data)
But I did not figure out how this could be done with averages of slices.
Answer 1
Score: 0
One option with a summation, then division by the number of items:
a = np.arange(len(data))
av = (np.tril(np.repeat(data[:, None], len(data), axis=1)).cumsum(axis=0)
      / np.tril(a[:, None] - a + 1)
      )
Output:
array([[0. , nan, nan, nan, nan],
[0.5, 1. , nan, nan, nan],
[1. , 1.5, 2. , nan, nan],
[1.5, 2. , 2.5, 3. , nan],
[2. , 2.5, 3. , 3.5, 4. ]])
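The upper triangle is nan because both the numerator and the divisor are zero there, so NumPy evaluates 0/0 and emits a RuntimeWarning. The following is a minimal sketch, assuming one wants to wrap the expression in a helper and silence that expected warning. The names `slice_means_tril` and `slice_means_loop` are hypothetical and not part of the answer; the loop version is included only to check that both approaches agree.

```python
import warnings
import numpy as np

def slice_means_tril(data):
    """Hypothetical helper: av[j, i] = mean(data[i:j+1]) for i <= j, nan elsewhere."""
    data = np.asarray(data)
    n = len(data)
    a = np.arange(n)
    # column i holds the running sum of data[i:], starting at row i
    sums = np.tril(np.repeat(data[:, None], n, axis=1)).cumsum(axis=0)
    # number of items in each slice; zero above the diagonal
    counts = np.tril(a[:, None] - a + 1)
    # 0/0 above the diagonal is expected, so suppress the warning locally
    with np.errstate(invalid="ignore", divide="ignore"):
        return sums / counts

def slice_means_loop(data):
    """Reference implementation: the nested loop from the question."""
    data = np.asarray(data)
    n = len(data)
    av = np.zeros((n, n))
    with warnings.catch_warnings():
        # np.mean of an empty slice warns and returns nan
        warnings.simplefilter("ignore", RuntimeWarning)
        for i in range(n):
            for j in range(n):
                av[j, i] = np.mean(data[i:j+1])
    return av

data = np.arange(5)
# equal_nan=True treats the nan entries of both results as matching
print(np.allclose(slice_means_tril(data), slice_means_loop(data), equal_nan=True))
```

Scoping `np.errstate` to the division keeps other, unexpected floating-point warnings visible in the rest of the program.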
Intermediates:
# repeat data to square shape and keep lower triangle
np.tril(np.repeat(data[:,None], len(data), axis=1))
array([[0, 0, 0, 0, 0],
[1, 1, 0, 0, 0],
[2, 2, 2, 0, 0],
[3, 3, 3, 3, 0],
[4, 4, 4, 4, 4]])
# get the cumulated sum
[…].cumsum(axis=0)
array([[ 0, 0, 0, 0, 0],
[ 1, 1, 0, 0, 0],
[ 3, 3, 2, 0, 0],
[ 6, 6, 5, 3, 0],
[10, 10, 9, 7, 4]])
# compute the divisor (number of items in each slice)
a = np.arange(len(data))
array([0, 1, 2, 3, 4])
np.tril((a[:,None]-a+1))
array([[1, 0, 0, 0, 0],
[2, 1, 0, 0, 0],
[3, 2, 1, 0, 0],
[4, 3, 2, 1, 0],
[5, 4, 3, 2, 1]])
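The intermediates above build every slice sum from a repeated N×N copy of the data. As an alternative that is not part of the answer, the same sums can be obtained from a single 1-D prefix sum, since sum(data[i:j+1]) = c[j+1] - c[i] where c[k] is the sum of the first k elements. This is a sketch under that assumption; the function name `slice_means_prefix` is hypothetical.

```python
import numpy as np

def slice_means_prefix(data):
    """Hypothetical helper: av[j, i] = mean(data[i:j+1]) for i <= j, nan elsewhere."""
    data = np.asarray(data, dtype=float)
    n = len(data)
    # c[k] = sum of data[:k], with c[0] = 0
    c = np.concatenate(([0.0], np.cumsum(data)))
    # sums[j, i] = c[j+1] - c[i] = sum of data[i:j+1]
    sums = c[1:, None] - c[None, :-1]
    a = np.arange(n)
    counts = a[:, None] - a + 1          # counts[j, i] = j - i + 1
    av = np.full((n, n), np.nan)         # upper triangle stays nan
    # divide only where the slice is non-empty, so no 0/0 warning is raised
    np.divide(sums, counts, out=av, where=counts > 0)
    return av

print(slice_means_prefix(np.arange(5)))
```

One caveat with this variant: subtracting two large prefix sums can lose floating-point precision on long arrays with large values, so comparing it against the answer's version on representative data is advisable.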
Performance
On a 1000-integer input (data = np.arange(1000)):
# nested for loop
6.28 s ± 142 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
# vectorized numpy
12.3 ms ± 427 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
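The figures above are in the format printed by IPython's `%timeit`. A comparable measurement can be sketched with the standard-library `timeit` module, as below. The function names `loop_version` and `vectorized_version` are hypothetical, the input is shortened to 100 elements so the loop finishes quickly, and absolute timings will differ from machine to machine.

```python
import timeit
import warnings
import numpy as np

data = np.arange(100)  # smaller than the 1000-element input, to keep the loop quick

def loop_version(data):
    n = len(data)
    av = np.zeros((n, n))
    with warnings.catch_warnings():
        # empty slices (j < i) make np.mean warn and return nan
        warnings.simplefilter("ignore", RuntimeWarning)
        for i in range(n):
            for j in range(n):
                av[j, i] = np.mean(data[i:j+1])
    return av

def vectorized_version(data):
    n = len(data)
    a = np.arange(n)
    with np.errstate(invalid="ignore", divide="ignore"):
        return (np.tril(np.repeat(data[:, None], n, axis=1)).cumsum(axis=0)
                / np.tril(a[:, None] - a + 1))

# one run of the slow loop; best of three batches of ten for the fast version
t_loop = timeit.timeit(lambda: loop_version(data), number=1)
t_vec = min(timeit.repeat(lambda: vectorized_version(data), number=10, repeat=3)) / 10

print(f"nested loop: {t_loop:.3f} s")
print(f"vectorized:  {t_vec * 1e3:.3f} ms")
```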