Does Go runtime evaluate the for loop condition every iteration?
Question
Here is a code snippet from the book "The Go Programming Language":
```go
for t := 0.0; t < cycles*2*math.Pi; t += res {
    ...
}
```
It appears that the expression in the for loop condition, `t < cycles*2*math.Pi`, must be evaluated before every iteration of the loop. Or does the compiler optimize this by precalculating the result of the expression (assuming none of the variables change during the iteration)? Does the above style of coding affect performance?
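At the language level, the Go specification says the condition of a for clause is evaluated before each iteration; whether the generated machine code actually repeats the arithmetic is left to the compiler. The minimal sketch below makes the evaluation count observable by putting a side-effecting call in the condition. The `check` helper and the `evals` counter are hypothetical, for illustration only.

```go
package main

import "fmt"

// evals counts how many times the loop condition has been evaluated.
var evals int

// check is a hypothetical condition helper with a side effect, so the
// compiler must call it exactly as often as the spec requires.
func check(i, n int) bool {
	evals++
	return i < n
}

// countConditionEvals runs a loop of n iterations and reports how many
// times its condition was evaluated.
func countConditionEvals(n int) int {
	evals = 0
	for i := 0; check(i, n); i++ {
	}
	return evals
}

func main() {
	// The condition runs once before each of the 5 iterations, plus once
	// more to discover that the loop should stop, so this prints 6.
	fmt.Println(countConditionEvals(5))
}
```

A side-effect-free condition such as `t < cycles*2*math.Pi` is different: no program can observe how often `cycles*2*math.Pi` is recomputed, so the compiler is free to fold or hoist it.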
Answer 1
Score: 5
This really depends on the Go version, but `go version go1.7 windows/amd64` does appear to calculate the value only once.
Go code:
```go
// Inside a function body (e.g. func main), with "math" imported:
var cycles = 10.0
var res = 1000.0
for t := 0.0; t < cycles*2*math.Pi; t += res {
}
```
Asm code:
```asm
movsd   [rsp+58h+var_20], xmm0
mov     [rsp+58h+var_18], 0
mov     [rsp+58h+var_10], 0
lea     rax, qword_494500
mov     [rsp+58h+var_58], rax
lea     rax, [rsp+58h+var_20]
mov     [rsp+58h+var_50], rax
mov     [rsp+58h+var_48], 0
call    runtime_convT2E
mov     rax, [rsp+58h+var_40]
mov     rcx, [rsp+58h+a] ; a
mov     [rsp+58h+var_18], rax
mov     [rsp+58h+var_10], rcx
lea     rax, [rsp+58h+var_18]
mov     [rsp+58h+var_58], rax
mov     [rsp+58h+var_50], 1
mov     [rsp+58h+var_48], 1
call    fmt_Println
movsd   xmm0, cs:$f64_408f400000000000
movsd   xmm1, [rsp+58h+t]
addsd   xmm0, xmm1
movsd   [rsp+58h+t], xmm0
movsd   xmm1, cs:$f64_404f6a7a2955385e
ucomisd xmm1, xmm0
ja      loc_401083
```
`$f64_404f6a7a2955385e` is a precalculated double value equal to `10 * 2 * math.Pi`, or 62.8318530718, while `$f64_408f400000000000` is the step `res` (1000.0).
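Both constants in the listing can be checked by decoding their IEEE 754 bit patterns with `math.Float64bits`. The small sketch below assumes that the symbol name is the constant's raw bit pattern in hex, which appears to be how the Go toolchain names its `$f64.` constants. The function names are hypothetical.

```go
package main

import (
	"fmt"
	"math"
)

// loopBoundBits returns the IEEE 754 bits of cycles*2*math.Pi for the
// values used in the example above.
func loopBoundBits() uint64 {
	cycles := 10.0
	return math.Float64bits(cycles * 2 * math.Pi)
}

// stepBits returns the IEEE 754 bits of the loop step res.
func stepBits() uint64 {
	res := 1000.0
	return math.Float64bits(res)
}

func main() {
	// Prints 0x404f6a7a2955385e
	fmt.Printf("%#x\n", loopBoundBits())
	// Prints 0x408f400000000000
	fmt.Printf("%#x\n", stepBits())
	// Prints 62.8318530718
	fmt.Printf("%.10f\n", math.Float64frombits(0x404f6a7a2955385e))
}
```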
The Go compiler recently switched to an SSA back end, so these kinds of optimizations should keep improving, since they benefit greatly from it. For now, SSA is only available on amd64:
> ### Compiler Toolchain
>
> This release includes a new code generation back end for 64-bit x86
> systems, following a proposal from 2015 that has been under development
> since then. The new back end, based on SSA, generates more compact, more
> efficient code and provides a better platform for optimizations such as
> bounds check elimination.
Go 1.8 should use it for all supported architectures:
> ### Compiler Toolchain
>
> Go 1.7 introduced a new compiler back end for 64-bit x86 systems. In Go
> 1.8, that back end has been developed further and is now used for all
> architectures.
Answer 2
Score: 1
It does not seem to be optimized, and it is not listed in "Compiler And Runtime Optimizations".
As mentioned in this older discussion,
> The gc compiler does not do any loop optimizations. One of the major goals of that compiler is to compile quickly. While improved optimization is always useful, it has to fit within that goal.

But that could have changed. A good technique to see what is going on is illustrated in "Recursion And Tail Calls In Go", where you can look at the assembly code produced by a Go program.
See also the more recent 2016 "Reversing GO binaries like a pro" article.
And "go.godbolt.org" can help too: see the assembly code here.
You can see that the `t < cycles*2*math.Pi` part is always evaluated:
```asm
.L2:
        movsd   xmm0, QWORD PTR [rbp-24]
        addsd   xmm0, xmm0
        movsd   xmm1, QWORD PTR .LC2[rip]
        mulsd   xmm0, xmm1
        ucomisd xmm0, QWORD PTR [rbp-8]
        seta    al
        test    al, al
```
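Whichever compiler you use, you can make the single evaluation explicit by hoisting the loop-invariant bound into a local variable before the loop. The minimal sketch below uses hypothetical function names. Both versions run the same iterations, because the bound is computed from the same operands in the same order.

```go
package main

import (
	"fmt"
	"math"
)

// stepsInline keeps the bound in the loop condition, as in the book.
func stepsInline(cycles, res float64) int {
	n := 0
	for t := 0.0; t < cycles*2*math.Pi; t += res {
		n++
	}
	return n
}

// stepsHoisted computes the loop-invariant bound once before the loop,
// which is the source-level equivalent of loop-invariant code motion.
func stepsHoisted(cycles, res float64) int {
	n := 0
	limit := cycles * 2 * math.Pi
	for t := 0.0; t < limit; t += res {
		n++
	}
	return n
}

func main() {
	fmt.Println(stepsInline(5, 0.001), stepsHoisted(5, 0.001))
}
```

Whether the hoisted form is measurably faster depends on the compiler version and on the loop body. Two floating-point operations per iteration are usually cheap compared with real work, so measure with a benchmark before rewriting code for this reason alone.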
Answer 3
Score: 1
The current Go compiler does not move loop invariant computations outside loops.
The list of compiler passes can be seen here: https://github.com/golang/go/blob/master/src/cmd/compile/internal/ssa/compile.go#L329
In the example by @creker, the compiler did constant folding, not loop invariant code motion.
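The distinction matters in practice. When the bound is built only from constants, Go's rules for constant expressions guarantee that it is computed at compile time. A bound built from a function parameter is merely loop-invariant, and only a LICM pass would move it out of the loop. The sketch below shows the two cases, with hypothetical names. Comparing the assembly printed by `go build -gcflags=-S` for each function is one way to check what a given compiler version does.

```go
package main

import (
	"fmt"
	"math"
)

// cycles is an untyped constant, so cycles*2*math.Pi is a constant
// expression that the spec requires to be evaluated at compile time.
const cycles = 10.0

// constBound loops up to a compile-time constant bound.
func constBound(res float64) int {
	n := 0
	for t := 0.0; t < cycles*2*math.Pi; t += res {
		n++
	}
	return n
}

// paramBound loops up to a bound that is invariant in the loop but only
// known at run time; moving it out of the loop would require LICM.
func paramBound(c, res float64) int {
	n := 0
	for t := 0.0; t < c*2*math.Pi; t += res {
		n++
	}
	return n
}

func main() {
	fmt.Println(constBound(1), paramBound(cycles, 1))
}
```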
As a side note, a few months ago I made a LICM (loop-invariant code motion) pass for the Go compiler, available at https://github.com/golang/go/compare/master...momchil-velikov:dev.chill.licm
It does not improve performance very much on the typically used Go benchmarks. (I blame the atrocious register allocation :P)