英文:
Why is lists faster than numpy?
问题
在这个示例中,为什么列表比NumPy快?通常情况下,NumPy被认为比使用列表更快。但在这里,列表似乎快了2倍以上?
$ python main.py
using_list: 1.8105032444000244 秒
using_numpy: 4.414405345916748 秒
# main.py
import numpy as np
import time
def using_list(N):
x = [1] * N
y = [0] * N
for i in range(1, N):
y[i] = x[i] + 0.5 * (y[i-1] - x[i])
def using_numpy(N):
x = np.ones(N)
y = np.zeros(N)
for i in range(1, N):
y[i] = x[i] + 0.5 * (y[i-1] - x[i])
def main():
LOOPS = 100
N = 100000
for func in [using_list, using_numpy]:
t = time.perf_counter()
for i in range(LOOPS):
func(N)
print(f"{func.__name__}: {time.perf_counter() - t} 秒")
if __name__=="__main__":
main()
使用Python 3.10。
供参考,我发现使用numba
的@njit
会极大地提高我的NumPy代码性能。但我想知道为什么我的普通NumPy版本比列表慢得这么多?有没有办法让我的普通NumPy版本在不使用numba
的情况下变得更快?
using_numba: 0.6930002999997669 秒
from numba import njit
@njit
def using_numba(N):
x = np.ones(N)
y = np.zeros(N)
for i in range(1, N):
y[i] = x[i] + 0.5 * (y[i-1] - x[i])
英文:
Why in this example are lists faster than numpy? numpy is usually considered faster than using lists? But here lists seem to be over 2x faster?
$ python main.py
using_list: 1.8105032444000244 seconds
using_numpy: 4.414405345916748 seconds
# main.py
import numpy as np
import time
def using_list(N):
x = [1] * N
y = [0] * N
for i in range(1, N):
y[i] = x[i] + 0.5 * (y[i-1] - x[i])
def using_numpy(N):
x = np.ones(N)
y = np.zeros(N)
for i in range(1, N):
y[i] = x[i] + 0.5 * (y[i-1] - x[i])
def main():
LOOPS = 100
N = 100000
for func in [using_list, using_numpy]:
t = time.perf_counter()
for i in range(LOOPS):
func(N)
print(f"{func.__name__}: {time.perf_counter() - t} seconds")
if __name__=="__main__":
main()
Using Python 3.10.
For reference, I have found that using @njit
from numba
will dramatically improve my numpy code. But I was wondering why my plain numpy version is so much slower than lists? And is there a way to make my plain numpy version faster without using numba
.
using_numba: 0.6930002999997669 seconds
from numba import njit
@njit
def using_numba(N):
x = np.ones(N)
y = np.zeros(N)
for i in range(1, N):
y[i] = x[i] + 0.5 * (y[i-1] - x[i])
答案1
得分: 4
使用迭代或索引逐个访问NumPy数组元素效率低下(请参考为什么在原生Python列表上的for循环比在NumPy数组上的for循环更快)。考虑使用NumPy的内置函数或方法来加速循环,这可以显著提高对大型数组的性能:
def mechanic_using_numpy(N):
y = np.empty(N, float)
y[:1] = 1.0
y[1:] = 0.5
y.cumprod(out=y)
np.subtract(1, y, out=y)
return y
>>> mechanic_using_numpy(10)
array([0. , 0.5 , 0.75 , 0.875 , 0.9375 ,
0.96875 , 0.984375 , 0.9921875 , 0.99609375, 0.99804688])
>>> np.array(using_list(10))
array([0. , 0.5 , 0.75 , 0.875 , 0.9375 ,
0.96875 , 0.984375 , 0.9921875 , 0.99609375, 0.99804688])
在N = 100000
的基准测试中:
In [_]: %timeit using_numpy(100000)
44.6 ms ± 227 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
In [_]: %timeit using_list(100000)
14.3 ms ± 38.5 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
In [_]: %timeit mechanic_using_numpy(100000)
286 µs ± 875 ns per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
英文:
Iterating or using indices to access numpy arrays element by element is inefficient (please refer to Why is for loop on native python list faster than for loop on numpy array). Consider using numpy's built-in functions or methods to accelerate the loop, which can significantly improve the performance on large arrays:
def mechanic_using_numpy(N):
y = np.empty(N, float)
y[:1] = 1.0
y[1:] = 0.5
y.cumprod(out=y)
np.subtract(1, y, out=y)
return y
>>> mechanic_using_numpy(10)
array([0. , 0.5 , 0.75 , 0.875 , 0.9375 ,
0.96875 , 0.984375 , 0.9921875 , 0.99609375, 0.99804688])
>>> np.array(using_list(10))
array([0. , 0.5 , 0.75 , 0.875 , 0.9375 ,
0.96875 , 0.984375 , 0.9921875 , 0.99609375, 0.99804688])
Benchmark with N = 100000
:
In [_]: %timeit using_numpy(100000)
44.6 ms ± 227 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
In [_]: %timeit using_list(100000)
14.3 ms ± 38.5 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
In [_]: %timeit mechanic_using_numpy(100000)
286 µs ± 875 ns per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
答案2
得分: 4
为了从numpy中获得好处,您需要以不同的方式进行计算。将其视为对所有值的全局操作,而不是迭代计算:
例如,这个函数给出相同的结果,但比使用列表快多达25倍:
def using_np(N):
return 1 - 0.5**(np.arange(N))
如果需要更快的速度,您可以结合更简单的操作(例如加法和乘法,而不是指数运算):
def using_np2(N):
y = np.ones(N)
y[0] = 0
y[1:] -= np.cumprod(y[1:]/2)
return y
这个函数比使用列表快40多倍。
英文:
In order to get the benefits from numpy, you need to approach the calculation differently. Think of it as a global operation on all values rather than an iterative calculation:
For example, this function gives the same result but runs up to 25 times faster than using a list:
def using_np(N):
return 1 - 0.5**(np.arange(N))
If you need more speed, you can combine simpler operations (e.g. addition & multiplication instead of exponentiation):
def using_np2(N):
y = np.ones(N)
y[0] = 0
y[1:] -= np.cumprod(y[1:]/2)
return y
This one is more than 40 times faster than using a list
答案3
得分: 2
Numpy 在能够隐式并行化操作时速度很快。如果没有 y[i-1] -> y[i]
的数据依赖关系,你可以用下面的方式替换一个循环:
for i in range(N):
y[i] = 0.5*x[i]
用
y = 0.5*x
遍历数组元素的速度不比遍历列表元素快,实际上由于需要在 Numpy 值类型和 Python 本机类型之间进行转换,速度稍慢。
例如,float
值的列表只是引用的列表。y[i]
是对列表外部存在的 float
值的引用。然而,dtype
为 float64
的数组只是表示一组浮点值的单个二进制数据块。像 y[i]
这样的操作必须提取正确的位然后使用它们来 创建 一个要返回的 float
值。一个更复杂的表达式如下:
y[i] = x[i] + 0.5 * (y[i-1] - x[i])
需要从 x
和 y
中提取 三个 这样的值,执行与列表方法相同的算术运算,然后将结果转换回字节序列以插入回 y
。
使用 y = 0.5*x
,不需要将字节序列转换为 float
值,也不需要使用 float.__mul__
,也不需要将结果转换回字节序列。所有算术运算都在 Numpy 优化的 C 代码内部完成。
英文:
Numpy is fast when it can implicitly parallelize operations. If you didn't have the y[i-1] -> y[i]
data dependency, you could replace a loop like
for i in range(N):
y[i] = 0.5*x[i]
with
y = 0.5*x
Iterating over the elements of an array is no faster than iterating over the elements of a list, and is in fact slightly slower due to the need to convert back and forth between Numpy value types and Python native types.
For example, a list of float
values is just a list of references. y[i]
is a reference to a float
value that exists outside the list. An array with dtype
of float64
, however, is just a single binary blob that represents a collection of floating-point values. An operation like y[i]
has to extract the proper bits then use them to create a float
value to return. A more complicated expression like
y[i] = x[i] + 0.5 * (y[i-1] - x[i])
needs to extract three such values from x
and y
, do the same arithmetic you would do with the list approach, then convert the result back to a byte sequence to insert back into y
.
With y = 0.5*x
, there are no conversions of byte sequences to float
values, use of float.__mul__
, or conversions back to bytes sequences. All arithmetic is done internally in Numpy's optimized C code.
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论