2023年2月23日 23:41:12go评论70阅读模式

英文:

Fast mapreduce using multiple ops

问题

对于集合 x 和 y，我想调用函数 f 并使用来自集合 red 的运算符来减少结果。

x = (1, 2, 3, 4, 5)
y = (2, 4.0, 6, 8.0, 10)
red = (/, *, +, -)

f(x, y::Int) = float(x + y)
f(x, y::AbstractFloat) = x - y

也就是说，我想要以下结果：

f(1, 2) / f(2, 4.0) * f(3, 6) + f(4, 8.0) - f(5, 10)   # -32.5

以下代码有效，但会分配内存，比上面的原始代码慢大约100倍。

function mapop(f, ops, itrs...)
    i::Int = 0
    op(x, y) = (i += 1; ops[i](x, y))
    mapreduce(f, op, itrs...)
end

mapop(f, red, x, y)
# > -32.5

@code_warntype mapop(f, red, x, y) 显示了类型不稳定性。无论如何，mapreduce 似乎会分配内存，@btime mapreduce($*, $+, $1:3, $5:7)。

问题
有没有一种快速的，最好是不分配内存的（替代/方法），使用多个函数进行 mapreduce？

以下的循环替代方法避免了30次分配，并比 mapop 运行得更快，但我无法完全消除所有的分配。

function mapop2(x, y, f, ops)
    val = ntuple(i -> Float64(f(x[i], y[i)), length(x))
    res = first(val)
    for j in eachindex(ops)
        res = ops[j](res, val[j+1])
    end
    res
}

英文:

For collections x and y I want to call a function f, and reduce the results using operators from a collection red.

x = (1, 2, 3, 4, 5)
y = (2, 4.0, 6, 8.0, 10)
red = (/, *, +, -)

f(x, y::Int) = float(x + y)
f(x, y::AbstractFloat) = x - y

That is, I want the result of

f(1, 2) / f(2, 4.0) * f(3, 6) + f(4, 8.0) - f(5, 10)   # -32.5

The following works but allocates and is ~100x slower than the naive code above.

function mapop(f, ops, itrs...)
    i::Int = 0
    op(x, y) = (i += 1; ops[i](x, y))
    mapreduce(f, op, itrs...)
end

mapop(f, red, x, y)
#&gt; -32.5

@code_warntype mapop(f, red, x, y) shows type instability. mapreduce seems to allocate regardless, @btime mapreduce($*, $+, $1:3, $5:7).

Question
Is there a fast, preferably non-allocating (alternative / way) to mapreduce using multiple functions?

The following loop alternative avoids 30 allocations, and runs faster than mapop, however, I fail in getting rid of all allocations.

function mapop2(x, y, f, ops)
    val = ntuple(i -&gt; Float64(f(x[i], y[i])), length(x))
    res = first(val)
    for j in eachindex(ops)
        res = ops[j](res, val[j+1])
    end
    res
end

答案1

得分: 2

另一种方法可能是：

x = (1, 2, 3, 4, 5)
y = (2, 4.0, 6, 8.0, 10)
ered = (max, /, *, +, -)

f(x, y::Int) = float(x + y)
f(x, y::AbstractFloat) = x - y

foldeval(f, r, x, y) = 
  foldl((r, (x, y, op)) -> op(r, f(x, y)), zip(x, y, ered); init = -Inf)

得到的结果是：

julia> foldeval(f, ered, x, y)
-32.5

julia> @btime foldeval($f, $ered, $x, $y)
  646.577 ns (15 allocations: 432 bytes)
-32.5

（请注意将 red 更改为 ered 的额外操作）

英文:

Another method could be:

x = (1, 2, 3, 4, 5)
y = (2, 4.0, 6, 8.0, 10)
ered = (max, /, *, +, -)

f(x, y::Int) = float(x + y)
f(x, y::AbstractFloat) = x - y

foldeval(f, r, x, y) = 
  foldl((r,(x,y,op))-&gt;op(r,f(x,y)),zip(x,y,red); init=-Inf)

giving:

julia&gt; foldeval(f, ered, x, y)
-32.5

julia&gt; @btime foldeval($f, $ered, $x, $y)
  646.577 ns (15 allocations: 432 bytes)
-32.5

(note the extra operation making red into ered)

答案2

得分: 1

将 i 作为参数传递给 op 修复了警告。

此外，一些减速可能来自于使用展开操作符 ...，它使 mapop 的形式不确定。在我的机器上，以下代码比原始版本运行速度稍快一些：

function mapop(f, ops, itr1, itr2)
    i::Int = 0
    op(x, y) = (i += 1; ops[i](x, y))
    mapreduce(f, op, itr1, itr2)
end

英文:

Passing i as an argument to op fixes the warning.

Additionally, some of the slowdown might come from using the splat operator ..., which makes the form of mapop uncertain. On my machine, the following runs a bit more quickly than the original:

function mapop(f, ops, itr1, itr2)
           i::Int = 0
           op(x, y) = (i += 1; ops[i](x, y))
           mapreduce(f, op, itr1, itr2)
       end

答案3

得分: 0

::Function 是抽象的，传递一个元组可能会变得缓慢或难以理解。
混合使用 Int 和 AbstractFloat 是一种代码异味。

我最终使用一个元组 red = (4, 3, 1, 2) 来表示 (/, *, +, -)，并使用循环代替。

red = (4, 3, 1, 2)
function p(x, y, f, ops)
    res = float(f(x[1], y[1]))
    its = ntuple(i -> float(f(x[i + 1], y[i + 1])), length(ops))
    for i in eachindex(its)
        if ops[i] == 1
            res += its[i]
        elseif ops[i] == 2
            res -= its[i]
        elseif ops[i] == 3
            res *= its[i]
        elseif ops[i] == 4
            res /= its[i]
        end
    end
    res
end

@btime p($x, $y, $f, $red)
#> 7.200 ns (0 allocations: 0 bytes)

英文:

For posterity,

::Function is abstract, passing a tuple might be slow or hard to reason with about,
mixing Int and AbstractFloat is a code smell

I ended up using a tuple red = (4, 3, 1, 2) to represent (/, *, +, -) and a loop instead.

red = (4, 3, 1, 2)
function p(x, y, f, ops)
    res = float(f(x[1], y[1]))
    its = ntuple(i -&gt; float(f(x[i + 1], y[i + 1])), length(ops))
    for i in eachindex(its)
        if ops[i] == 1
            res += its[i]
        elseif ops[i] == 2
            res -= its[i]
        elseif ops[i] == 3
            res *= its[i]
        elseif ops[i] == 4
            res /= its[i]
        end
    end
    res
end

@btime p($x, $y, $f, $red)
#&gt; 7.200 ns (0 allocations: 0 bytes)

通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库，让每个人都能够通过互相帮助和分享经验来进步。

快速使用多个操作的MapReduce

问题

答案1

答案2

答案3

阻止滚动时JScrollPane的大量重绘

Clang能够被说服优化这个几乎是叶子函数吗？

Parallel.For一直只使用CPU的10%。

最佳的数据结构用于存储具有字符串键和布尔辅助值的对象是什么？

What's the correct way to type hint an empty list as a literal in python?

如何在Highcharts Gantt中更改本地化的星期名称

如何在同一个流中使用多个过滤器和映射函数？

如何使用Map/Set来将代码优化到O(n)？

.NET MAUI Android在GitHub Actions上构建失败，错误代码为1。

如何在Playwright视觉比较中屏蔽多个定位器？

在C++中，可以使用可变模板参数来检索类型的内部类型。

selenium.common.exceptions.StaleElementReferenceException: Message: stale element reference: stale element not found

Creating and opening a URL to log in to Website via Basic Auth with Robot Framework/Selenium (Python)

AG Grid 在上下文菜单中以大文本形式打开

发表评论