英文:
Overhead when using vector-valued functions in Julia
问题
I have noticed that, in Julia, the execution speed of my code slows down dramatically when using vector-valued functions. Consider the following MWE:
using BenchmarkTools
function f_scalar(x::Float64)::Float64
return -x
end
function f_vector(x::Float64)::Array{Float64,1}
return [-x,0.0]
end
function trajectory_scalar(x0::Float64,t::Float64,dt::Float64)
x = x0
nsteps = convert(Int,t/dt)
xvec = zeros(nsteps)
for k in 1:nsteps
x = x + f_scalar(x)*dt
xvec[k] = x
end
return xvec
end
function trajectory_vector(x0::Float64,t::Float64,dt::Float64)
x = x0
nsteps = convert(Int,t/dt)
xvec = zeros(nsteps)
for k in 1:nsteps
x = x + f_vector(x)[1]*dt
xvec[k] = x
end
return xvec
end
@btime trajectory_scalar(2.0,10.0,0.01) #1.140 \mu s (1 allocation: 7.94 KiB)
@btime trajectory_vector(2.0,10.0,0.01) #12.800 \mu s (1001 allocations: 86.06 KiB)
The code involving the vector-valued function is an order of magnitude slower. I guess this is due to greatly increased amount of memory allocations; it seems that a new vector is allocated every time the function is called.
If so, is there any way of avoiding this overhead and still use vector-valued functions? In the example above, the vector-valued function is obviously not needed, but, in practice, I want to perform simulations of vector-valued stochastic differential equations, so it would be very convenient to actually write the drift vector as a vector and not having to call a different function for every component.
EDIT: Using August's comment, the vectorized version indeed becomes as fast as the non-vectorized one when using StaticArrays:
function f_vector(x::Float64)::SVector{2,Float64}
return SVector(-x,0.0)
end
@btime trajectory_vector(2.0,10.0,0.01) # 1.200 \mu s (1 allocation: 7.94 KiB)
英文:
I have noticed that, in Julia, the execution speed of my code slows down dramatically when using vector-valued functions. Consider the following MWE:
using BenchmarkTools
function f_scalar(x::Float64)::Float64
return -x
end
function f_vector(x::Float64)::Array{Float64,1}
return [-x,0.0]
end
function trajectory_scalar(x0::Float64,t::Float64,dt::Float64)
x = x0
nsteps = convert(Int,t/dt)
xvec = zeros(nsteps)
for k in 1:nsteps
x = x + f_scalar(x)*dt
xvec[k] = x
end
return xvec
end
function trajectory_vector(x0::Float64,t::Float64,dt::Float64)
x = x0
nsteps = convert(Int,t/dt)
xvec = zeros(nsteps)
for k in 1:nsteps
x = x + f_vector(x)[1]*dt
xvec[k] = x
end
return xvec
end
@btime trajectory_scalar(2.0,10.0,0.01) #1.140 \mu s (1 allocation: 7.94 KiB)
@btime trajectory_vector(2.0,10.0,0.01) #12.800 \mu s (1001 allocations: 86.06 KiB)
The code involving the vector-valued function is an order of magnitude slower. I guess this is due to greatly increased amount of memory allocations; it seems that a new vector is allocated every time the function is called.
If so, is there any way of avoiding this overhead and still use vector-valued functions? In the example above, the vector-valued function is obviously not needed, but, in practice, I want to perform simulations of vector-valued stochastic differential equations, so it would be very convenient to actually write the drift vector as a vector and not having to call a different function for every component.
EDIT: Using August's comment, the vectorized version indeed becomes as fast as the non-vectorized one when using StaticArrays:
function f_vector(x::Float64)::SVector{2,Float64}
return SVector(-x,0.0)
end
@btime trajectory_vector(2.0,10.0,0.01) # 1.200 \mu s (1 allocation: 7.94 KiB)
答案1
得分: 1
使用StaticArray
(来自StaticArrays.jl)来处理小型固定大小的向量,就像这样的向量(即在编译时已知大小的向量)。
由于大小在编译时已知,编译器可以将StaticArrays放在堆栈或寄存器中,而不是进行堆分配(对于具有运行时大小的Vector
来说,堆分配是必需的)。
请注意,您的代码无法运行,因为d
未定义。希望它不是全局变量!未类型化/非常量全局变量速度较慢,在性能关键的代码中应避免使用。如果我使用const d = 100
然后它可以运行,如果我将f_vector
更改为:
using StaticArrays
function f_vector(x::Float64)
return SA[-x, 0.0]
end
那么trajectory_vector
以基本相同的速度运行如同trajectory_scalar
。
请注意,所有的参数和返回值类型声明对性能无任何帮助。这不是Julia的工作方式;它只会使您的代码不够灵活。
英文:
Use a StaticArray
(from the StaticArrays.jl) for small fixed-size vectors like this (i.e. vectors whose size is known at compile-time).
The fact that the size is known at compile time allows the compile to put StaticArrays on the stack or in registers rather than doing a heap allocation (required for something like Vector
with a runtime size).
Note that your code is not runnable because d
is not defined. I hope it's not a global! Untyped/nonconstant global variables are slow and should be avoided in performance-critical code. If I use const d = 100
then it runs, and if I change f_vector
to:
using StaticArrays
function f_vector(x::Float64)
return SA[-x,0.0]
end
then trajectory_vector
runs at the basically the same speed as trajectory_scalar
.
Note that all of your argument and return type declarations do nothing for performance. That's now how Julia works; it just makes your code less flexible.
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论