英文:
Fast mapreduce using multiple ops
问题
对于集合 x
和 y
,我想调用函数 f
并使用来自集合 red
的运算符来减少结果。
x = (1, 2, 3, 4, 5)
y = (2, 4.0, 6, 8.0, 10)
red = (/, *, +, -)
f(x, y::Int) = float(x + y)
f(x, y::AbstractFloat) = x - y
也就是说,我想要以下结果:
f(1, 2) / f(2, 4.0) * f(3, 6) + f(4, 8.0) - f(5, 10) # -32.5
以下代码有效,但会分配内存,比上面的原始代码慢大约100倍。
function mapop(f, ops, itrs...)
i::Int = 0
op(x, y) = (i += 1; ops[i](x, y))
mapreduce(f, op, itrs...)
end
mapop(f, red, x, y)
# > -32.5
@code_warntype mapop(f, red, x, y)
显示了类型不稳定性。无论如何,mapreduce
似乎会分配内存,@btime mapreduce($*, $+, $1:3, $5:7)
。
问题
有没有一种快速的,最好是不分配内存的(替代/方法),使用多个函数进行 mapreduce
?
以下的循环替代方法避免了30次分配,并比 mapop
运行得更快,但我无法完全消除所有的分配。
function mapop2(x, y, f, ops)
val = ntuple(i -> Float64(f(x[i], y[i)), length(x))
res = first(val)
for j in eachindex(ops)
res = ops[j](res, val[j+1])
end
res
}
英文:
For collections x
and y
I want to call a function f
, and reduce the results using operators from a collection red
.
x = (1, 2, 3, 4, 5)
y = (2, 4.0, 6, 8.0, 10)
red = (/, *, +, -)
f(x, y::Int) = float(x + y)
f(x, y::AbstractFloat) = x - y
That is, I want the result of
f(1, 2) / f(2, 4.0) * f(3, 6) + f(4, 8.0) - f(5, 10) # -32.5
The following works but allocates and is ~100x slower than the naive code above.
function mapop(f, ops, itrs...)
i::Int = 0
op(x, y) = (i += 1; ops[i](x, y))
mapreduce(f, op, itrs...)
end
mapop(f, red, x, y)
#> -32.5
@code_warntype mapop(f, red, x, y)
shows type instability. mapreduce
seems to allocate regardless, @btime mapreduce($*, $+, $1:3, $5:7)
.
Question
Is there a fast, preferably non-allocating (alternative / way) to mapreduce
using multiple functions?
The following loop alternative avoids 30 allocations, and runs faster than mapop
, however, I fail in getting rid of all allocations.
function mapop2(x, y, f, ops)
val = ntuple(i -> Float64(f(x[i], y[i])), length(x))
res = first(val)
for j in eachindex(ops)
res = ops[j](res, val[j+1])
end
res
end
答案1
得分: 2
另一种方法可能是:
x = (1, 2, 3, 4, 5)
y = (2, 4.0, 6, 8.0, 10)
ered = (max, /, *, +, -)
f(x, y::Int) = float(x + y)
f(x, y::AbstractFloat) = x - y
foldeval(f, r, x, y) =
foldl((r, (x, y, op)) -> op(r, f(x, y)), zip(x, y, ered); init = -Inf)
得到的结果是:
julia> foldeval(f, ered, x, y)
-32.5
julia> @btime foldeval($f, $ered, $x, $y)
646.577 ns (15 allocations: 432 bytes)
-32.5
(请注意将 red
更改为 ered
的额外操作)
英文:
Another method could be:
x = (1, 2, 3, 4, 5)
y = (2, 4.0, 6, 8.0, 10)
ered = (max, /, *, +, -)
f(x, y::Int) = float(x + y)
f(x, y::AbstractFloat) = x - y
foldeval(f, r, x, y) =
foldl((r,(x,y,op))->op(r,f(x,y)),zip(x,y,red); init=-Inf)
giving:
julia> foldeval(f, ered, x, y)
-32.5
julia> @btime foldeval($f, $ered, $x, $y)
646.577 ns (15 allocations: 432 bytes)
-32.5
(note the extra operation making red
into ered
)
答案2
得分: 1
将 i
作为参数传递给 op
修复了警告。
此外,一些减速可能来自于使用展开操作符 ...
,它使 mapop
的形式不确定。在我的机器上,以下代码比原始版本运行速度稍快一些:
function mapop(f, ops, itr1, itr2)
i::Int = 0
op(x, y) = (i += 1; ops[i](x, y))
mapreduce(f, op, itr1, itr2)
end
英文:
Passing i
as an argument to op
fixes the warning.
Additionally, some of the slowdown might come from using the splat operator ...
, which makes the form of mapop
uncertain. On my machine, the following runs a bit more quickly than the original:
function mapop(f, ops, itr1, itr2)
i::Int = 0
op(x, y) = (i += 1; ops[i](x, y))
mapreduce(f, op, itr1, itr2)
end
答案3
得分: 0
::Function
是抽象的,传递一个元组可能会变得缓慢或难以理解。- 混合使用
Int
和AbstractFloat
是一种代码异味。
我最终使用一个元组 red = (4, 3, 1, 2)
来表示 (/, *, +, -)
,并使用循环代替。
red = (4, 3, 1, 2)
function p(x, y, f, ops)
res = float(f(x[1], y[1]))
its = ntuple(i -> float(f(x[i + 1], y[i + 1])), length(ops))
for i in eachindex(its)
if ops[i] == 1
res += its[i]
elseif ops[i] == 2
res -= its[i]
elseif ops[i] == 3
res *= its[i]
elseif ops[i] == 4
res /= its[i]
end
end
res
end
@btime p($x, $y, $f, $red)
#> 7.200 ns (0 allocations: 0 bytes)
英文:
For posterity,
::Function
is abstract, passing a tuple might be slow or hard to reason with about,- mixing
Int
andAbstractFloat
is a code smell
I ended up using a tuple red = (4, 3, 1, 2)
to represent (/, *, +, -)
and a loop instead.
red = (4, 3, 1, 2)
function p(x, y, f, ops)
res = float(f(x[1], y[1]))
its = ntuple(i -> float(f(x[i + 1], y[i + 1])), length(ops))
for i in eachindex(its)
if ops[i] == 1
res += its[i]
elseif ops[i] == 2
res -= its[i]
elseif ops[i] == 3
res *= its[i]
elseif ops[i] == 4
res /= its[i]
end
end
res
end
@btime p($x, $y, $f, $red)
#> 7.200 ns (0 allocations: 0 bytes)
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论