2023-02-16 16:32:23

Why is Numba parallel slower than a normal Python loop?

Question

The following is a plain Python loop (I copied the example from the official docs: https://numba.readthedocs.io/en/stable/user/parallel.html)

import numpy as np

def two_d_array_reduction_prod(n):
    shp = (13, 17)
    result1 = 2 * np.ones(shp, np.int_)
    tmp = 2 * np.ones_like(result1)
    # Multiply the whole array by tmp, n times, in the interpreter
    for i in range(n):
        result1 *= tmp
    return result1

I called the function like this:

two_d_array_reduction_prod(50000)

It takes around 0.7482060070033185.

Numba parallel code:

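import numba as nb
import numpy as np

@nb.njit(parallel=True)
def two_d_array_reduction_prod(n):
    shp = (13, 17)
    result1 = 2 * np.ones(shp, np.int_)
    tmp = 2 * np.ones_like(result1)
    # prange marks the loop as parallel; result1 *= tmp is an array reduction
    for i in nb.prange(n):
        result1 *= tmp
    return result1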

I called the function like this:

two_d_array_reduction_prod(50000)

It takes around 3.9858204890042543.

My environment:

Amazon Linux 2, x86_64 processor
8 CPUs
32 GB memory

Answer 1

Score: 1

I can't reproduce this. Using parallel=True gives a slight performance improvement, but for me every Numba variant is significantly faster than pure Python.

Using:

from numba import njit, prange
import numpy as np

def two_d_array_reduction_prod(n):
    shp = (13, 17)
    result1 = 2 * np.ones(shp, np.int_)
    tmp = 2 * np.ones_like(result1)
    for i in prange(n):  # or: for i in range(n):
        result1 *= tmp
    return result1

# Compile the same function with Numba (njit is imported directly, so no nb. prefix)
two_d_array_reduction_prod_numba = njit(parallel=False)(two_d_array_reduction_prod)

Even with parallel=False, using either prange or range, I get more than a 3x improvement. All of these timings were taken after a warm-up call, so the Numba function was already compiled before it was timed.
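For reference, here is a minimal sketch of what timing with a warm-up can look like, using only the standard library. `time_after_warmup` and `make_fake_jit` are hypothetical helpers made up for this illustration and are not part of Numba. The fake function only imitates a one-time compilation cost on its first call, so the numbers it prints say nothing about real Numba performance.

```python
import time


def time_after_warmup(fn, *args, repeats=5):
    """Return (first_call_seconds, best_later_seconds) for fn(*args)."""
    start = time.perf_counter()
    fn(*args)  # first call: for a jitted function this would include compilation
    first_call = time.perf_counter() - start

    later = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn(*args)
        later.append(time.perf_counter() - start)
    return first_call, min(later)


def make_fake_jit(compile_seconds=0.05):
    """Hypothetical stand-in for a jitted function: slow on the first call only."""
    state = {"compiled": False}

    def fake(n):
        if not state["compiled"]:
            time.sleep(compile_seconds)  # imitates the one-time compilation cost
            state["compiled"] = True
        return sum(range(n))

    return fake


first, steady = time_after_warmup(make_fake_jit(), 1000)
print(f"first call: {first:.4f} s, steady state: {steady:.6f} s")
```

With Numba installed, the same helper could be called as time_after_warmup(two_d_array_reduction_prod_numba, 50000). If the first-call time is much larger than the steady-state time, a single timed call is probably measuring compilation rather than the loop itself.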
