2023-01-09 03:23:15

Numpy speed for computing metrics (like np.mean) over multiple axes vs. single axis

Question

Currently, I am working with video data, so I am performing statistical operations on multiple frames at once. During a debugging session I observed that computing a numpy statistic (the mean, in this case) over multiple axes takes longer when it is computed directly over the desired axes than when it is computed over each axis separately, one after the other. I created a simple example to illustrate my observations.

from timeit import default_timer as timer
import numpy as np

rnd_frames = np.random.randn(100, 128, 128, 3)
n_reps = 1000

# -----------------------------------
# mean computation over multiple axes
# -----------------------------------
# all axes at once
ts = timer()
for i in range(n_reps):
    mean_1 = np.mean(rnd_frames, axis=(1, 2))
print('Mean all at once: ', (timer()-ts)/n_reps)
# one after the other
ts = timer()
for i in range(n_reps):
    mean_2 = np.mean(rnd_frames, axis=1)
    mean_2 = np.mean(mean_2, axis=1)
print('Mean one after the other: ', (timer()-ts)/n_reps)

print('Difference in means: ', np.sum(np.abs(mean_1-mean_2)))

The difference is very small and results from float64 precision.
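
One way to check that the two results agree up to rounding is sketched below. It is only an illustration: the seeded generator and the name `frames` are assumptions added here, and the exact difference printed will vary between machines and NumPy versions.

```python
import numpy as np

# Seeded generator so the check is repeatable (an assumption, not in the original post).
rng = np.random.default_rng(0)
frames = rng.standard_normal((100, 128, 128, 3))

# Same two computations as in the timing example above.
mean_1 = np.mean(frames, axis=(1, 2))
mean_2 = np.mean(np.mean(frames, axis=1), axis=1)

# Largest element-wise deviation, shown next to the float64 machine epsilon for scale.
max_abs_diff = np.max(np.abs(mean_1 - mean_2))
print("max abs difference:     ", max_abs_diff)
print("float64 machine epsilon:", np.finfo(np.float64).eps)
print("allclose:               ", np.allclose(mean_1, mean_2))
```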

Does anyone have an explanation for this? The time difference is quite significant: computing one axis after the other is 10x faster.

I wonder if this is some kind of bug. Can anyone explain this?

Answer 1

Score: 2

The time for 2 axes is the same as for 1 axis on the equivalent reshape:

In [7]: timeit mean_1 = np.mean(rnd_frames, axis=(1, 2))
54.2 ms ± 202 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

In [11]: timeit mean_3 = np.mean(rnd_frames.reshape(100,-1,3), axis=1)
54.5 ms ± 142 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

In [12]: rnd_frames.reshape(100,-1,3).shape
Out[12]: (100, 16384, 3)

As you note, this is quite a bit longer than the sequential calculation:

In [13]: %%timeit
    ...: mean_2 = np.mean(rnd_frames, axis=1)
    ...: mean_2 = np.mean(mean_2, axis=1)
7.63 ms ± 49.5 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
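
For anyone who wants to reproduce the three timings outside IPython, here is a rough sketch using the standard `timeit` module. The helper names (`mean_both_axes`, `mean_reshaped`, `mean_sequential`) and the seeded generator are made up for this illustration, and the numbers will differ by machine and NumPy version.

```python
import timeit
import numpy as np

rng = np.random.default_rng(0)
rnd_frames = rng.standard_normal((100, 128, 128, 3))

def mean_both_axes(a):
    # Reduce over both frame axes in a single call.
    return np.mean(a, axis=(1, 2))

def mean_reshaped(a):
    # Merge the two frame axes into one, then reduce over that single axis.
    return np.mean(a.reshape(a.shape[0], -1, a.shape[-1]), axis=1)

def mean_sequential(a):
    # Reduce one axis at a time; after the first call the old axis 2 is axis 1.
    return np.mean(np.mean(a, axis=1), axis=1)

candidates = {
    "both axes": mean_both_axes,
    "reshaped": mean_reshaped,
    "sequential": mean_sequential,
}

reference = mean_both_axes(rnd_frames)
for name, fn in candidates.items():
    # All variants should agree up to floating-point rounding.
    assert np.allclose(fn(rnd_frames), reference)
    # Best of 3 repeats, 3 calls each, reported per call.
    best = min(timeit.repeat(lambda: fn(rnd_frames), number=3, repeat=3)) / 3
    print(f"{name:>10}: {best * 1e3:.2f} ms per call")
```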

Without getting deep into the woods of compiled code, it's hard to say why there is this difference. While "avoiding loops" is a common performance strategy in numpy, that applies mostly to "many loops on a simple task". A few loops on a complex task can be faster. I'm not sure that applies here, but I'm not surprised that there are differences like this.

We could also explore whether putting those 2 reduction axes at the end (innermost) or at the beginning of the dimensions shows this difference or not.

edit

If I move the 2 axes to either the beginning or the end, the time difference is much smaller. There's something about having that small size-3 dimension at the end (innermost) that's making your example unusually slow.
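
A sketch of how that layout experiment might be set up is below. The three shapes are a guess at what "beginning" and "end" mean here, not necessarily the exact arrays used for the observation above, and `compare` is a hypothetical helper. No timings are quoted because they depend on the machine and the NumPy version.

```python
import timeit
import numpy as np

rng = np.random.default_rng(0)

def compare(shape, axes):
    """Time a multi-axis mean against the same reduction done one axis at a time."""
    a = rng.standard_normal(shape)

    def at_once():
        return np.mean(a, axis=axes)

    def sequential():
        out = a
        # After i lower axes have been removed, original axis ax sits at ax - i.
        for i, ax in enumerate(sorted(axes)):
            out = np.mean(out, axis=ax - i)
        return out

    assert np.allclose(at_once(), sequential())
    t_once = min(timeit.repeat(at_once, number=3, repeat=3)) / 3
    t_seq = min(timeit.repeat(sequential, number=3, repeat=3)) / 3
    return t_once, t_seq

# Assumed layouts: the same amount of data, with the two 128-sized axes moved around.
layouts = {
    "middle (original)": ((100, 128, 128, 3), (1, 2)),
    "beginning": ((128, 128, 100, 3), (0, 1)),
    "end": ((100, 3, 128, 128), (2, 3)),
}

for name, (shape, axes) in layouts.items():
    t_once, t_seq = compare(shape, axes)
    print(f"{name:>18}: at once {t_once * 1e3:.2f} ms, sequential {t_seq * 1e3:.2f} ms")
```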
