Numpy speed for computing metrics (like np.mean) over multiple axes vs. single axis
Question
I am currently working with video data, so I perform statistical operations on multiple frames at once. During a debugging session I observed that computing a numpy statistic (the mean, in this case) over multiple axes in a single call takes longer than computing it over each axis separately, one after the other. I created a simple example to illustrate my observations.
from timeit import default_timer as timer
import numpy as np

rnd_frames = np.random.randn(100, 128, 128, 3)
n_reps = 1000

# -----------------------------------
# mean computation over multiple axes
# -----------------------------------

# all axes at once
ts = timer()
for _ in range(n_reps):
    mean_1 = np.mean(rnd_frames, axis=(1, 2))
print('Mean all at once: ', (timer() - ts) / n_reps)

# one after the other
ts = timer()
for _ in range(n_reps):
    mean_2 = np.mean(rnd_frames, axis=1)
    mean_2 = np.mean(mean_2, axis=1)  # axis 1 here is axis 2 of the original array
print('Mean one after the other: ', (timer() - ts) / n_reps)

print('Difference in means: ', np.sum(np.abs(mean_1 - mean_2)))
The difference between the two results is very small and comes from float64 rounding.
Does anyone have an explanation for this? The time difference is quite significant: computing one axis after the other is 10x faster.
Is this some kind of bug?
Answer 1
Score: 2
Reducing over 2 axes takes the same time as reducing over 1 axis of the equivalent reshaped array:
In [7]: timeit mean_1 = np.mean(rnd_frames, axis=(1, 2))
54.2 ms ± 202 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
In [11]: timeit mean_3 = np.mean(rnd_frames.reshape(100,-1,3), axis=1)
54.5 ms ± 142 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
In [12]: rnd_frames.reshape(100,-1,3).shape
Out[12]: (100, 16384, 3)
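As a small check of that equivalence, here is a sketch on a smaller stand-in array (the shape and the names `frames` and `flat` are my own, not from the question). It assumes a C-contiguous input, in which case the reshape is a view rather than a copy, so the two calls reduce the same memory in the same order:

```python
import numpy as np

rng = np.random.default_rng(0)
frames = rng.standard_normal((10, 16, 16, 3))  # smaller stand-in for rnd_frames

# Merge axes 1 and 2 into one axis of length 16 * 16 = 256.
flat = frames.reshape(10, -1, 3)

mean_two_axes = np.mean(frames, axis=(1, 2))
mean_reshaped = np.mean(flat, axis=1)

print(flat.shape)                                  # (10, 256, 3)
print(np.shares_memory(frames, flat))              # True: no data was copied
print(np.allclose(mean_two_axes, mean_reshaped))   # True
```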
As you note, the roughly 54 ms for the single-call reduction is considerably slower than the sequential calculation:
In [13]: %%timeit
...: mean_2 = np.mean(rnd_frames, axis=1)
...: mean_2 = np.mean(mean_2, axis=1)
7.63 ms ± 49.5 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
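One caveat if you adopt the sequential form as a workaround: it matches the single call for the mean because every intermediate group has the same size, but that does not carry over to every statistic. A minimal sketch with a made-up 3x3 array (not from the question) shows the median behaving differently:

```python
import numpy as np

a = np.array([[1.0, 2.0, 100.0],
              [3.0, 4.0, 5.0],
              [6.0, 7.0, 8.0]])

# Mean: the mean of equally sized column means equals the overall mean.
mean_joint = np.mean(a, axis=(0, 1))
mean_seq = np.mean(np.mean(a, axis=0), axis=0)
print(np.isclose(mean_joint, mean_seq))   # True

# Median: the median of the column medians is not the overall median.
median_joint = np.median(a, axis=(0, 1))
median_seq = np.median(np.median(a, axis=0), axis=0)
print(median_joint, median_seq)           # 5.0 4.0
```

So the speed-up from reducing one axis at a time is only a drop-in replacement for reductions such as `np.mean` or `np.sum` over equally sized groups.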
Without getting deep into the woods of compiled code, it's hard to say why there is this difference. While "avoiding loops" is a common performance strategy in numpy, that applies mostly to "many loops on a simple task". A few loops on a complex task can be faster. I'm not sure that applies here, but I'm not surprised that there are differences like this.
We could also explore whether putting those 2 reduced axes at the end (innermost) or at the beginning of the dimensions shows this difference or not.
edit
If I move the 2 axes to either the beginning or the end, the time difference is much smaller. There's something about having that small size-3 dimension at the end (innermost) that makes your example unusually slow.
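The answer does not show how the axes were moved, so the following is only a sketch of one way to run that comparison. It assumes contiguous copies made with `np.moveaxis` and `np.ascontiguousarray`, so that the reduced axes really are outermost or innermost in memory; the `layouts` dictionary and its labels are hypothetical. Timings depend on the machine and NumPy version, so none are quoted here.

```python
from timeit import timeit
import numpy as np

rng = np.random.default_rng(0)
frames = rng.standard_normal((100, 128, 128, 3))

# Hypothetical layouts: label -> (array, the two adjacent axes to reduce).
layouts = {
    "middle (100, 128, 128, 3)": (frames, (1, 2)),
    "first  (128, 128, 100, 3)": (
        np.ascontiguousarray(np.moveaxis(frames, (1, 2), (0, 1))), (0, 1)),
    "last   (100, 3, 128, 128)": (
        np.ascontiguousarray(np.moveaxis(frames, (1, 2), (2, 3))), (2, 3)),
}

n_reps = 5
results = {}
for label, (arr, axes) in layouts.items():
    first = axes[0]
    # Single call over both axes.
    t_joint = timeit(lambda: arr.mean(axis=axes), number=n_reps) / n_reps
    # One axis at a time; after the first reduction the second axis
    # slides down into the same position, so the axis index repeats.
    t_seq = timeit(lambda: arr.mean(axis=first).mean(axis=first),
                   number=n_reps) / n_reps
    results[label] = arr.mean(axis=axes)
    print(f"{label}: joint {t_joint * 1e3:.2f} ms, sequential {t_seq * 1e3:.2f} ms")
```

Every layout produces the same (100, 3) array of means, so any timing gap between them comes from memory layout and iteration order rather than from the amount of arithmetic.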