Does Go runtime evaluate the for loop condition every iteration?

Question

Here is a code snippet from the book "The Go Programming Language":

for t := 0.0; t < cycles*2*math.Pi; t += res {
    ...
}

It appears that the expression in the for loop condition, t < cycles*2*math.Pi, must be evaluated before every iteration of the loop. Or does the compiler optimize this by precomputing the result of the expression (assuming none of the variables change during the loop)? Does this style of coding affect performance?

Answer 1

Score: 5

This depends on the Go version, but go1.7 on windows/amd64 (as reported by go version) does appear to calculate the value only once.

Go code:

var cycles = 10.0
var res = 1000.0
for t := 0.0; t < cycles*2*math.Pi; t += res {
}

Asm code:

movsd   [rsp+58h+var_20], xmm0
mov     [rsp+58h+var_18], 0
mov     [rsp+58h+var_10], 0
lea     rax, qword_494500
mov     [rsp+58h+var_58], rax
lea     rax, [rsp+58h+var_20]
mov     [rsp+58h+var_50], rax
mov     [rsp+58h+var_48], 0
call    runtime_convT2E
mov     rax, [rsp+58h+var_40]
mov     rcx, [rsp+58h+a] ; a
mov     [rsp+58h+var_18], rax
mov     [rsp+58h+var_10], rcx
lea     rax, [rsp+58h+var_18]
mov     [rsp+58h+var_58], rax
mov     [rsp+58h+var_50], 1
mov     [rsp+58h+var_48], 1
call    fmt_Println
movsd   xmm0, cs:$f64_408f400000000000
movsd   xmm1, [rsp+58h+t]
addsd   xmm0, xmm1
movsd   [rsp+58h+t], xmm0
movsd   xmm1, cs:$f64_404f6a7a2955385e
ucomisd xmm1, xmm0
ja      loc_401083

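f64_404f6a7a2955385e is a precalculated double value equal to 10 * 2 * math.Pi, or approximately 62.8318530718. The loop compares t against this constant directly instead of redoing the multiplication.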

The Go compiler recently switched to an SSA-based back end, so these kinds of optimizations should keep improving, since they benefit greatly from it. For now, SSA is only available on amd64:

> ### Compiler Toolchain
>
> This release includes a new code generation back end for 64-bit x86
> systems, following a proposal from 2015 that has been under development
> since then. The new back end, based on SSA, generates more compact, more
> efficient code and provides a better platform for optimizations such as
> bounds check elimination.

Go 1.8 should have it for all supported architectures:

> ### Compiler Toolchain
>
> Go 1.7 introduced a new compiler back end for 64-bit x86 systems. In Go
> 1.8, that back end has been developed further and is now used for all
> architectures.

Answer 2

Score: 1

It does not seem to be optimized, and it is not listed in "Compiler And Runtime Optimizations".

As mentioned in this older discussion,

> The gc compiler does not do any loop optimizations. One of the major goals of that compiler is to compile quickly. While improved optimization is always useful, it has to fit within that goal

But that could have changed. A good technique to see what is going on is illustrated in "Recursion And Tail Calls In Go", where you can look at the assembly code produced by a go program.
See also the more recent 2016 "Reversing GO binaries like a pro" article.
And "go.godbolt.org" can help too: see the assembly code here.

You can see that the "t < cycles*2*math.Pi" part is always evaluated:

.L2:
        movsd   xmm0, QWORD PTR [rbp-24]
        addsd   xmm0, xmm0
        movsd   xmm1, QWORD PTR .LC2[rip]
        mulsd   xmm0, xmm1
        ucomisd xmm0, QWORD PTR [rbp-8]
        seta    al
        test    al, al

Answer 3

Score: 1

The current Go compiler does not move loop invariant computations outside loops.

The list of compiler passes can be seen here: https://github.com/golang/go/blob/master/src/cmd/compile/internal/ssa/compile.go#L329

In the example by @creker, the compiler did constant folding, not loop invariant code motion.
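To make the distinction concrete, here is a hedged sketch. The function names are hypothetical and this is not code from either answer. In @creker's example cycles is a local variable with a known value, so the whole bound folds to a constant. When cycles is a function parameter there is nothing to fold, and hoisting the bound out of the loop would require loop invariant code motion, which can also be done by hand.

```go
package main

import (
	"fmt"
	"math"
)

// loopRecomputed keeps the bound expression in the condition. cycles is a
// parameter, so (unless the call is inlined with constant arguments) the
// bound cannot be folded at compile time.
func loopRecomputed(cycles, res float64) int {
	n := 0
	for t := 0.0; t < cycles*2*math.Pi; t += res {
		n++
	}
	return n
}

// loopHoisted does by hand what a LICM pass would do: the invariant bound
// is computed once, before the loop.
func loopHoisted(cycles, res float64) int {
	n := 0
	limit := cycles * 2 * math.Pi
	for t := 0.0; t < limit; t += res {
		n++
	}
	return n
}

func main() {
	// Both versions perform the same number of iterations.
	fmt.Println(loopRecomputed(5, 0.001), loopHoisted(5, 0.001))
}
```

Whether the hand-hoisted version is measurably faster depends on the loop body and the compiler version; for a body that does real work, one extra floating-point multiplication per iteration is unlikely to dominate.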

As a side note, a few months ago I wrote a LICM pass for the Go compiler (https://github.com/golang/go/compare/master...momchil-velikov:dev.chill.licm), which does not improve performance very much on the typically used Go benchmarks. (I blame the atrocious register allocation :P)

Posted on 2016-12-26 15:39:44
Link to this article: https://go.coder-hub.com/41327984.html