快速使用多个操作的MapReduce

huangapple go评论62阅读模式
英文:

Fast mapreduce using multiple ops

问题

对于集合 xy,我想调用函数 f 并使用来自集合 red 的运算符来减少结果。

x = (1, 2, 3, 4, 5)
y = (2, 4.0, 6, 8.0, 10)
red = (/, *, +, -)

f(x, y::Int) = float(x + y)
f(x, y::AbstractFloat) = x - y

也就是说,我想要以下结果:

f(1, 2) / f(2, 4.0) * f(3, 6) + f(4, 8.0) - f(5, 10)   # -32.5

以下代码有效,但会分配内存,比上面的原始代码慢大约100倍。

function mapop(f, ops, itrs...)
    i::Int = 0
    op(x, y) = (i += 1; ops[i](x, y))
    mapreduce(f, op, itrs...)
end

mapop(f, red, x, y)
# > -32.5

@code_warntype mapop(f, red, x, y) 显示了类型不稳定性。无论如何,mapreduce 似乎会分配内存,@btime mapreduce($*, $+, $1:3, $5:7)

问题
有没有一种快速的,最好是不分配内存的(替代/方法),使用多个函数进行 mapreduce

以下的循环替代方法避免了30次分配,并比 mapop 运行得更快,但我无法完全消除所有的分配。

function mapop2(x, y, f, ops)
    val = ntuple(i -> Float64(f(x[i], y[i)), length(x))
    res = first(val)
    for j in eachindex(ops)
        res = ops[j](res, val[j+1])
    end
    res
}
英文:

For collections x and y I want to call a function f, and reduce the results using operators from a collection red.

x = (1, 2, 3, 4, 5)
y = (2, 4.0, 6, 8.0, 10)
red = (/, *, +, -)

f(x, y::Int) = float(x + y)
f(x, y::AbstractFloat) = x - y

That is, I want the result of

f(1, 2) / f(2, 4.0) * f(3, 6) + f(4, 8.0) - f(5, 10)   # -32.5

The following works but allocates and is ~100x slower than the naive code above.

function mapop(f, ops, itrs...)
    i::Int = 0
    op(x, y) = (i += 1; ops[i](x, y))
    mapreduce(f, op, itrs...)
end

mapop(f, red, x, y)
#> -32.5

@code_warntype mapop(f, red, x, y) shows type instability. mapreduce seems to allocate regardless, @btime mapreduce($*, $+, $1:3, $5:7).

Question
Is there a fast, preferably non-allocating (alternative / way) to mapreduce using multiple functions?

The following loop alternative avoids 30 allocations, and runs faster than mapop, however, I fail in getting rid of all allocations.

function mapop2(x, y, f, ops)
    val = ntuple(i -> Float64(f(x[i], y[i])), length(x))
    res = first(val)
    for j in eachindex(ops)
        res = ops[j](res, val[j+1])
    end
    res
end

答案1

得分: 2

另一种方法可能是:

x = (1, 2, 3, 4, 5)
y = (2, 4.0, 6, 8.0, 10)
ered = (max, /, *, +, -)

f(x, y::Int) = float(x + y)
f(x, y::AbstractFloat) = x - y

foldeval(f, r, x, y) = 
  foldl((r, (x, y, op)) -> op(r, f(x, y)), zip(x, y, ered); init = -Inf)

得到的结果是:

julia> foldeval(f, ered, x, y)
-32.5

julia> @btime foldeval($f, $ered, $x, $y)
  646.577 ns (15 allocations: 432 bytes)
-32.5

(请注意将 red 更改为 ered 的额外操作)

英文:

Another method could be:

x = (1, 2, 3, 4, 5)
y = (2, 4.0, 6, 8.0, 10)
ered = (max, /, *, +, -)

f(x, y::Int) = float(x + y)
f(x, y::AbstractFloat) = x - y

foldeval(f, r, x, y) = 
  foldl((r,(x,y,op))->op(r,f(x,y)),zip(x,y,red); init=-Inf)

giving:

julia> foldeval(f, ered, x, y)
-32.5

julia> @btime foldeval($f, $ered, $x, $y)
  646.577 ns (15 allocations: 432 bytes)
-32.5

(note the extra operation making red into ered)

答案2

得分: 1

i 作为参数传递给 op 修复了警告。

此外,一些减速可能来自于使用展开操作符 ...,它使 mapop 的形式不确定。在我的机器上,以下代码比原始版本运行速度稍快一些:

function mapop(f, ops, itr1, itr2)
    i::Int = 0
    op(x, y) = (i += 1; ops[i](x, y))
    mapreduce(f, op, itr1, itr2)
end
英文:

Passing i as an argument to op fixes the warning.

Additionally, some of the slowdown might come from using the splat operator ..., which makes the form of mapop uncertain. On my machine, the following runs a bit more quickly than the original:

function mapop(f, ops, itr1, itr2)
           i::Int = 0
           op(x, y) = (i += 1; ops[i](x, y))
           mapreduce(f, op, itr1, itr2)
       end

答案3

得分: 0

  • ::Function 是抽象的,传递一个元组可能会变得缓慢或难以理解。
  • 混合使用 IntAbstractFloat 是一种代码异味。

我最终使用一个元组 red = (4, 3, 1, 2) 来表示 (/, *, +, -),并使用循环代替。

red = (4, 3, 1, 2)
function p(x, y, f, ops)
    res = float(f(x[1], y[1]))
    its = ntuple(i -> float(f(x[i + 1], y[i + 1])), length(ops))
    for i in eachindex(its)
        if ops[i] == 1
            res += its[i]
        elseif ops[i] == 2
            res -= its[i]
        elseif ops[i] == 3
            res *= its[i]
        elseif ops[i] == 4
            res /= its[i]
        end
    end
    res
end

@btime p($x, $y, $f, $red)
#> 7.200 ns (0 allocations: 0 bytes)
英文:

For posterity,

  • ::Function is abstract, passing a tuple might be slow or hard to reason with about,
  • mixing Int and AbstractFloat is a code smell

I ended up using a tuple red = (4, 3, 1, 2) to represent (/, *, +, -) and a loop instead.

red = (4, 3, 1, 2)
function p(x, y, f, ops)
    res = float(f(x[1], y[1]))
    its = ntuple(i -> float(f(x[i + 1], y[i + 1])), length(ops))
    for i in eachindex(its)
        if ops[i] == 1
            res += its[i]
        elseif ops[i] == 2
            res -= its[i]
        elseif ops[i] == 3
            res *= its[i]
        elseif ops[i] == 4
            res /= its[i]
        end
    end
    res
end

@btime p($x, $y, $f, $red)
#> 7.200 ns (0 allocations: 0 bytes)

huangapple
  • 本文由 发表于 2023年2月23日 23:41:12
  • 转载请务必保留本文链接:https://go.coder-hub.com/75547080.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定