Why is Numba parallel slower than a normal Python loop?

Question

The following is a plain Python loop (I copied the example from the official docs: https://numba.readthedocs.io/en/stable/user/parallel.html):

import numpy as np

def two_d_array_reduction_prod(n):
    shp = (13, 17)
    result1 = 2 * np.ones(shp, np.int_)
    tmp = 2 * np.ones_like(result1)

    # Multiply result1 element-wise by tmp, n times
    for i in range(n):
        result1 *= tmp

    return result1

I called the function like this:

two_d_array_reduction_prod(50000)

It takes around 0.7482060070033185 (presumably seconds).

Numba parallel code:

import numba as nb
import numpy as np

@nb.njit(parallel=True)
def two_d_array_reduction_prod(n):
    shp = (13, 17)
    result1 = 2 * np.ones(shp, np.int_)
    tmp = 2 * np.ones_like(result1)

    # prange marks the loop as parallel; result1 is an array reduction
    for i in nb.prange(n):
        result1 *= tmp

    return result1

I called the function like this:

two_d_array_reduction_prod(50000)

It takes 3.9858204890042543 (presumably seconds).

My environment:

  1. Amazon Linux 2, x86_64 processor
  2. 8 CPUs
  3. 32G memory

Answer 1

Score: 1

I can't replicate this. Using parallel=True gives a slight performance improvement, but any method is significantly faster compared to pure Python for me.

Using:

from numba import njit, prange
import numpy as np

def two_d_array_reduction_prod(n):
    shp = (13, 17)
    result1 = 2 * np.ones(shp, np.int_)
    tmp = 2 * np.ones_like(result1)

    for i in prange(n): # or for i in range(n):
        result1 *= tmp

    return result1

# Wrap the plain function with njit; the name `nb` was never imported, so call njit directly
two_d_array_reduction_prod_numba = njit(parallel=False)(two_d_array_reduction_prod)

Even with parallel=False, whether the loop uses prange or range, I get over a 3x improvement over pure Python. All of these timings were taken after a warm-up call, so the Numba function was already compiled before it was timed.
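The warm-up step can be sketched as a small timing helper. This is only an illustration, not the exact benchmark used for the numbers above: the names time_after_warmup and sum_of_squares are hypothetical, and the pure-Python workload stands in for the Numba function so the snippet runs with the standard library alone.

```python
import time


def time_after_warmup(func, *args, repeats=5):
    """Return the best wall-clock time in seconds of func(*args) over `repeats` runs.

    One untimed call is made first, so one-off costs such as JIT
    compilation are not included in the measurement.
    """
    func(*args)  # warm-up call: result and duration are discarded
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        func(*args)
        elapsed = time.perf_counter() - start
        best = min(best, elapsed)
    return best


def sum_of_squares(n):
    # Hypothetical stand-in workload; replace with the function under test.
    return sum(i * i for i in range(n))


if __name__ == "__main__":
    print(time_after_warmup(sum_of_squares, 10_000))
```

To time the code from this answer, the same helper could be called as time_after_warmup(two_d_array_reduction_prod_numba, 50000). Timing only the very first call of a jitted function would also measure compilation, which may account for a figure like the one in the question.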
