Why is Numba parallel slower than a normal Python loop?

Question
The following is a normal Python loop (I copied the example from the official docs - https://numba.readthedocs.io/en/stable/user/parallel.html):

```python
import numpy as np

def two_d_array_reduction_prod(n):
    shp = (13, 17)
    result1 = 2 * np.ones(shp, np.int_)
    tmp = 2 * np.ones_like(result1)
    for i in range(n):
        result1 *= tmp
    return result1
```

I called the function like:

```python
two_d_array_reduction_prod(50000)
```

It takes around 0.7482060070033185 seconds.
Numba parallel code:

```python
import numba as nb
import numpy as np

@nb.njit(parallel=True)
def two_d_array_reduction_prod(n):
    shp = (13, 17)
    result1 = 2 * np.ones(shp, np.int_)
    tmp = 2 * np.ones_like(result1)
    for i in nb.prange(n):
        result1 *= tmp
    return result1
```

I called the function like:

```python
two_d_array_reduction_prod(50000)
```

It takes 3.9858204890042543 seconds.
My environment:

- Amazon Linux 2, x86_64 processor
- 8 CPUs
- 32 GB memory
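For context, a comparison like the one above can be timed with a simple `time.perf_counter` harness along these lines (a sketch; the original post does not say exactly how the numbers were measured):

```python
import time
import numpy as np

def two_d_array_reduction_prod(n):
    shp = (13, 17)
    result1 = 2 * np.ones(shp, np.int_)
    tmp = 2 * np.ones_like(result1)
    for i in range(n):
        # Note: the int64 entries overflow and wrap for large n;
        # only the loop cost matters for this benchmark.
        result1 *= tmp
    return result1

start = time.perf_counter()
res = two_d_array_reduction_prod(50000)
elapsed = time.perf_counter() - start
print(f"pure Python loop: {elapsed:.4f} s, result shape {res.shape}")
```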
Answer 1

Score: 1
I can't replicate this. Using `parallel=True` gives a slight performance improvement, but any method is significantly faster compared to pure Python for me.

Using:

```python
from numba import njit, prange
import numpy as np

def two_d_array_reduction_prod(n):
    shp = (13, 17)
    result1 = 2 * np.ones(shp, np.int_)
    tmp = 2 * np.ones_like(result1)
    for i in prange(n):  # or: for i in range(n):
        result1 *= tmp
    return result1

two_d_array_reduction_prod_numba = njit(parallel=False)(two_d_array_reduction_prod)
```

Even with `parallel=False` and `prange`, or with `parallel=False` and `range`, I get over a 3x improvement. All these timings are done after a warm-up, pre-compiling the Numba function first.
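The warm-up mentioned above matters because the first call to an `@njit` function includes JIT compilation time, which can dwarf the loop itself. A minimal sketch of the pattern (the `ImportError` fallback is only so the snippet also runs where Numba is not installed):

```python
import time
import numpy as np

try:
    import numba as nb
    njit = nb.njit
except ImportError:
    # Fallback: a no-op decorator so the sketch still runs without Numba.
    def njit(**kwargs):
        return lambda f: f

@njit(parallel=False)
def two_d_array_reduction_prod(n):
    shp = (13, 17)
    result1 = 2 * np.ones(shp, np.int_)
    tmp = 2 * np.ones_like(result1)
    for i in range(n):
        result1 *= tmp
    return result1

# Warm-up call: triggers compilation so it is not counted in the timing.
two_d_array_reduction_prod(1)

start = time.perf_counter()
result = two_d_array_reduction_prod(50000)
print(f"compiled loop: {time.perf_counter() - start:.4f} s")
```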
Comments