英文:
Fast mapreduce using multiple ops
问题
对于集合 x 和 y,我想调用函数 f 并使用来自集合 red 的运算符来减少结果。
x = (1, 2, 3, 4, 5)
y = (2, 4.0, 6, 8.0, 10)
red = (/, *, +, -)
f(x, y::Int) = float(x + y)
f(x, y::AbstractFloat) = x - y
也就是说,我想要以下结果:
f(1, 2) / f(2, 4.0) * f(3, 6) + f(4, 8.0) - f(5, 10) # -32.5
以下代码有效,但会分配内存,比上面的原始代码慢大约100倍。
function mapop(f, ops, itrs...)
i::Int = 0
op(x, y) = (i += 1; ops[i](x, y))
mapreduce(f, op, itrs...)
end
mapop(f, red, x, y)
# > -32.5
@code_warntype mapop(f, red, x, y) 显示了类型不稳定性。无论如何,mapreduce 似乎会分配内存,@btime mapreduce($*, $+, $1:3, $5:7)。
问题
有没有一种快速的,最好是不分配内存的(替代/方法),使用多个函数进行 mapreduce?
以下的循环替代方法避免了30次分配,并比 mapop 运行得更快,但我无法完全消除所有的分配。
function mapop2(x, y, f, ops)
val = ntuple(i -> Float64(f(x[i], y[i)), length(x))
res = first(val)
for j in eachindex(ops)
res = ops[j](res, val[j+1])
end
res
}
英文:
For collections x and y I want to call a function f, and reduce the results using operators from a collection red.
x = (1, 2, 3, 4, 5)
y = (2, 4.0, 6, 8.0, 10)
red = (/, *, +, -)
f(x, y::Int) = float(x + y)
f(x, y::AbstractFloat) = x - y
That is, I want the result of
f(1, 2) / f(2, 4.0) * f(3, 6) + f(4, 8.0) - f(5, 10) # -32.5
The following works but allocates and is ~100x slower than the naive code above.
function mapop(f, ops, itrs...)
i::Int = 0
op(x, y) = (i += 1; ops[i](x, y))
mapreduce(f, op, itrs...)
end
mapop(f, red, x, y)
#> -32.5
@code_warntype mapop(f, red, x, y) shows type instability. mapreduce seems to allocate regardless, @btime mapreduce($*, $+, $1:3, $5:7).
Question
Is there a fast, preferably non-allocating (alternative / way) to mapreduce using multiple functions?
The following loop alternative avoids 30 allocations, and runs faster than mapop, however, I fail in getting rid of all allocations.
function mapop2(x, y, f, ops)
val = ntuple(i -> Float64(f(x[i], y[i])), length(x))
res = first(val)
for j in eachindex(ops)
res = ops[j](res, val[j+1])
end
res
end
答案1
得分: 2
另一种方法可能是:
x = (1, 2, 3, 4, 5)
y = (2, 4.0, 6, 8.0, 10)
ered = (max, /, *, +, -)
f(x, y::Int) = float(x + y)
f(x, y::AbstractFloat) = x - y
foldeval(f, r, x, y) =
foldl((r, (x, y, op)) -> op(r, f(x, y)), zip(x, y, ered); init = -Inf)
得到的结果是:
julia> foldeval(f, ered, x, y)
-32.5
julia> @btime foldeval($f, $ered, $x, $y)
646.577 ns (15 allocations: 432 bytes)
-32.5
(请注意将 red 更改为 ered 的额外操作)
英文:
Another method could be:
x = (1, 2, 3, 4, 5)
y = (2, 4.0, 6, 8.0, 10)
ered = (max, /, *, +, -)
f(x, y::Int) = float(x + y)
f(x, y::AbstractFloat) = x - y
foldeval(f, r, x, y) =
foldl((r,(x,y,op))->op(r,f(x,y)),zip(x,y,red); init=-Inf)
giving:
julia> foldeval(f, ered, x, y)
-32.5
julia> @btime foldeval($f, $ered, $x, $y)
646.577 ns (15 allocations: 432 bytes)
-32.5
(note the extra operation making red into ered)
答案2
得分: 1
将 i 作为参数传递给 op 修复了警告。
此外,一些减速可能来自于使用展开操作符 ...,它使 mapop 的形式不确定。在我的机器上,以下代码比原始版本运行速度稍快一些:
function mapop(f, ops, itr1, itr2)
i::Int = 0
op(x, y) = (i += 1; ops[i](x, y))
mapreduce(f, op, itr1, itr2)
end
英文:
Passing i as an argument to op fixes the warning.
Additionally, some of the slowdown might come from using the splat operator ..., which makes the form of mapop uncertain. On my machine, the following runs a bit more quickly than the original:
function mapop(f, ops, itr1, itr2)
i::Int = 0
op(x, y) = (i += 1; ops[i](x, y))
mapreduce(f, op, itr1, itr2)
end
答案3
得分: 0
::Function是抽象的,传递一个元组可能会变得缓慢或难以理解。- 混合使用
Int和AbstractFloat是一种代码异味。
我最终使用一个元组 red = (4, 3, 1, 2) 来表示 (/, *, +, -),并使用循环代替。
red = (4, 3, 1, 2)
function p(x, y, f, ops)
res = float(f(x[1], y[1]))
its = ntuple(i -> float(f(x[i + 1], y[i + 1])), length(ops))
for i in eachindex(its)
if ops[i] == 1
res += its[i]
elseif ops[i] == 2
res -= its[i]
elseif ops[i] == 3
res *= its[i]
elseif ops[i] == 4
res /= its[i]
end
end
res
end
@btime p($x, $y, $f, $red)
#> 7.200 ns (0 allocations: 0 bytes)
英文:
For posterity,
::Functionis abstract, passing a tuple might be slow or hard to reason with about,- mixing
IntandAbstractFloatis a code smell
I ended up using a tuple red = (4, 3, 1, 2) to represent (/, *, +, -) and a loop instead.
red = (4, 3, 1, 2)
function p(x, y, f, ops)
res = float(f(x[1], y[1]))
its = ntuple(i -> float(f(x[i + 1], y[i + 1])), length(ops))
for i in eachindex(its)
if ops[i] == 1
res += its[i]
elseif ops[i] == 2
res -= its[i]
elseif ops[i] == 3
res *= its[i]
elseif ops[i] == 4
res /= its[i]
end
end
res
end
@btime p($x, $y, $f, $red)
#> 7.200 ns (0 allocations: 0 bytes)
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。


评论