Stimulate code inlining

Question

Unlike C++, where you can explicitly mark a function inline, the Go compiler decides on its own which functions are candidates for inlining (C++ compilers do this too, but Go offers only the automatic route). There is also a debug option that reports which calls get inlined, yet very little is documented online about the exact logic the Go compiler(s) use to decide.

Let's say I need to rerun some big loop over a set of data every n-period:

func Encrypt(password []byte) ([]byte, error) {
    return bcrypt.GenerateFromPassword(password, 13)
}

for id, data := range someDataSet {
    newPassword, _ := Encrypt([]byte("generatedSomething"))
    data["password"] = newPassword
    someSaveCall(id, data)
}

Suppose, for example, that I want Encrypt to be inlined properly: what compiler logic do I need to take into account?

I know from C++ that passing by reference increases the likelihood of automatic inlining without the explicit inline keyword, but it is not easy to understand how the Go compiler decides whether to inline. Scripting languages like PHP, for example, suffer immensely if a loop repeatedly calls addSomething($a, $b): benchmarked over a billion iterations, the cost compared with writing $a + $b inline is almost ridiculous.

Answer 1

Score: 14

Until you have performance problems, you shouldn't care. Inlined or not, the code does the same thing.

If performance does matter and inlining makes a noticeable, significant difference, then don't rely on current (or past) inlining conditions; "inline" it yourself (do not put the code in a separate function).

The rules can be found in the $GOROOT/src/cmd/compile/internal/inline/inl.go file. You may control its aggressiveness with the 'l' debug flag.

// The inlining facility makes 2 passes: first caninl determines which
// functions are suitable for inlining, and for those that are it
// saves a copy of the body. Then InlineCalls walks each function body to
// expand calls to inlinable functions.
//
// The Debug.l flag controls the aggressiveness. Note that main() swaps level 0 and 1,
// making 1 the default and -l disable. Additional levels (beyond -l) may be buggy and
// are not supported.
//      0: disabled
//      1: 80-nodes leaf functions, oneliners, panic, lazy typechecking (default)
//      2: (unassigned)
//      3: (unassigned)
//      4: allow non-leaf functions
//
// At some point this may get another default and become switch-offable with -N.
//
// The -d typcheckinl flag enables early typechecking of all imported bodies,
// which is useful to flush out bugs.
//
// The Debug.m flag enables diagnostic output.  a single -m is useful for verifying
// which calls get inlined or not, more is for debugging, and may go away at any point.
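To see the diagnostic output mentioned in the comment above, a small program can be built with the -m flag passed to the compiler through -gcflags. The sketch below is only an illustration: the file name and the functions add and sumTo are made up for this example, and the exact wording of the compiler's messages varies between Go versions.

```go
// inline_demo.go (hypothetical file name)
//
// Build with inlining diagnostics:
//
//	go build -gcflags=-m inline_demo.go
//
// The compiler then reports its decisions, typically with messages along
// the lines of "can inline add" and "inlining call to add".
// Building with -gcflags=-l instead disables inlining.
package main

import "fmt"

// add is a small leaf function, a typical inlining candidate.
func add(a, b int) int {
	return a + b
}

// sumTo calls add in a loop; with -m the compiler reports whether
// the call to add was expanded in place.
func sumTo(n int) int {
	total := 0
	for i := 1; i <= n; i++ {
		total = add(total, i)
	}
	return total
}

func main() {
	fmt.Println(sumTo(100)) // prints 5050
}
```

Comparing the -m output before and after a change is a quick way to check whether a given function still qualifies, without reading inl.go.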

Also check out the blog post Dave Cheney - Five things that make Go fast (2014-06-07), which discusses inlining (it is a long post; the relevant part is around the middle, search for the word "inline").

There is also an interesting discussion about inlining improvements (maybe Go 1.9?): cmd/compile: improve inlining cost model #17566

Answer 2

Score: 3

Better still, don’t guess, measure!
You should trust the compiler and avoid trying to guess its inner workings as it will change from one version to the next.
There are far too many tricks the compiler, the CPU or the cache can play to be able to predict performance from source code.

What if inlining makes your code bigger to the point that it doesn’t fit in the cache line anymore, making it much slower than the non-inlined version? Cache locality can have a much bigger impact on performance than branching.
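One way to measure rather than guess is to benchmark the same function with and without inlining. The sketch below assumes a trivial function (the names addInlinable and addNotInlined are made up for this example) and uses the //go:noinline directive to keep the second copy out of line; real results depend on the Go version, the CPU, and the surrounding code, so no numbers are claimed here.

```go
package main

import (
	"fmt"
	"testing"
)

// addInlinable is small enough that the compiler is free to inline it.
func addInlinable(a, b int) int { return a + b }

// addNotInlined has the same body, but the directive below tells the
// compiler not to inline it.
//
//go:noinline
func addNotInlined(a, b int) int { return a + b }

// sink keeps the loop results alive so the work is not optimized away.
var sink int

func main() {
	inlined := testing.Benchmark(func(b *testing.B) {
		total := 0
		for i := 0; i < b.N; i++ {
			total = addInlinable(total, i)
		}
		sink = total
	})

	notInlined := testing.Benchmark(func(b *testing.B) {
		total := 0
		for i := 0; i < b.N; i++ {
			total = addNotInlined(total, i)
		}
		sink = total
	})

	fmt.Println("inlinable:  ", inlined)
	fmt.Println("not inlined:", notInlined)
}
```

For a call as expensive as the bcrypt hashing in the question, the call overhead measured this way is likely to be negligible next to the work done inside the function, which is the kind of fact only a measurement will show.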

Answer 3

Score: 0

You are fighting an uphill battle.
Go is not made for what you are trying to do.
Go is not tweakable and it is made for having medium performance.
It values simplicity over performance, therefore people should not use it where you need more precise behavior like inlining.
Languages that value performance more have APIs for inlining.
Check out Rust, C++, C#.

Posted by huangapple on 2016-12-13 19:15:46
When reposting, please keep the link to this article: https://go.coder-hub.com/41119734.html