In Go, do non-capturing closures harm performance?
Question
For instance, github.com/yhat/scrape suggests using a closure like this:
func someFunc() {
    ...
    matcher := func(n *html.Node) bool {
        return n.DataAtom == atom.Body
    }
    body, ok := scrape.Find(root, matcher)
    ...
}
Since matcher doesn't actually capture any local variables, this could equivalently be written as:
func someFunc() {
    ...
    body, ok := scrape.Find(root, matcher)
    ...
}

func matcher(n *html.Node) bool {
    return n.DataAtom == atom.Body
}
The first form looks better, because the matcher function is quite specific to that place in the code. But does it perform worse at runtime (assuming someFunc may be called often)?
I guess there must be some overhead to creating a closure, but this kind of closure could be optimized into a regular function by the compiler?
(Obviously the language spec doesn’t require this; I’m interested in what gc actually does.)
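Since the question is ultimately about what gc actually does, one way to answer it empirically is a micro-benchmark. The sketch below is hypothetical and not part of the original question: it swaps the html.Node matcher for a trivial int predicate so it is self-contained, and the helper name callPredicate is made up. Save it as a _test.go file and run go test -bench=. If both benchmarks report essentially the same ns/op, that suggests the non-capturing closure costs nothing extra.

package closurebench

import "testing"

// callPredicate is marked noinline so the indirect call through fn
// actually happens instead of being optimized away.
//go:noinline
func callPredicate(fn func(int) bool) bool {
    return fn(42)
}

func topLevelMatcher(n int) bool { return n == 42 }

// Passes a named top-level function as the predicate.
func BenchmarkTopLevelFunc(b *testing.B) {
    for i := 0; i < b.N; i++ {
        callPredicate(topLevelMatcher)
    }
}

// Defines a non-capturing closure on every iteration and passes it instead.
func BenchmarkNonCapturingClosure(b *testing.B) {
    for i := 0; i < b.N; i++ {
        matcher := func(n int) bool { return n == 42 }
        callPredicate(matcher)
    }
}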
Answer 1
Score: 9
It seems like there is no difference. We can check in the generated machine code.
Here is a toy program:
package main

import "fmt"

func topLevelFunction(x int) int {
    return x + 4
}

func useFunction(fn func(int) int) {
    fmt.Println(fn(10))
}

func invoke() {
    innerFunction := func(x int) int {
        return x + 8
    }
    useFunction(topLevelFunction)
    useFunction(innerFunction)
}

func main() {
    invoke()
}
And here is its disassembly:
$ go version
go version go1.8.5 linux/amd64
$ go tool objdump -s 'main\.(invoke|topLevel)' bin/toy
TEXT main.topLevelFunction(SB) /home/vasiliy/cur/work/learn-go/src/my/toy/toy.go
toy.go:6 0x47b7a0 488b442408 MOVQ 0x8(SP), AX
toy.go:6 0x47b7a5 4883c004 ADDQ $0x4, AX
toy.go:6 0x47b7a9 4889442410 MOVQ AX, 0x10(SP)
toy.go:6 0x47b7ae c3 RET
TEXT main.invoke(SB) /home/vasiliy/cur/work/learn-go/src/my/toy/toy.go
toy.go:13 0x47b870 64488b0c25f8ffffff FS MOVQ FS:0xfffffff8, CX
toy.go:13 0x47b879 483b6110 CMPQ 0x10(CX), SP
toy.go:13 0x47b87d 7638 JBE 0x47b8b7
toy.go:13 0x47b87f 4883ec10 SUBQ $0x10, SP
toy.go:13 0x47b883 48896c2408 MOVQ BP, 0x8(SP)
toy.go:13 0x47b888 488d6c2408 LEAQ 0x8(SP), BP
toy.go:17 0x47b88d 488d052cfb0200 LEAQ 0x2fb2c(IP), AX
toy.go:17 0x47b894 48890424 MOVQ AX, 0(SP)
toy.go:17 0x47b898 e813ffffff CALL main.useFunction(SB)
toy.go:14 0x47b89d 488d0514fb0200 LEAQ 0x2fb14(IP), AX
toy.go:18 0x47b8a4 48890424 MOVQ AX, 0(SP)
toy.go:18 0x47b8a8 e803ffffff CALL main.useFunction(SB)
toy.go:19 0x47b8ad 488b6c2408 MOVQ 0x8(SP), BP
toy.go:19 0x47b8b2 4883c410 ADDQ $0x10, SP
toy.go:19 0x47b8b6 c3 RET
toy.go:13 0x47b8b7 e874f7fcff CALL runtime.morestack_noctxt(SB)
toy.go:13 0x47b8bc ebb2 JMP main.invoke(SB)
TEXT main.invoke.func1(SB) /home/vasiliy/cur/work/learn-go/src/my/toy/toy.go
toy.go:15 0x47b8f0 488b442408 MOVQ 0x8(SP), AX
toy.go:15 0x47b8f5 4883c008 ADDQ $0x8, AX
toy.go:15 0x47b8f9 4889442410 MOVQ AX, 0x10(SP)
toy.go:15 0x47b8fe c3 RET
As we can see, at least in this simple case, there is no structural difference in how topLevelFunction and innerFunction (invoke.func1), and their passing to useFunction, are translated to machine code.
(It is instructive to compare this to the case where innerFunction does capture a local variable; and to the case where, moreover, innerFunction is passed via a global variable rather than a function argument — but these are left as an exercise to the reader.)
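For readers who want to try that exercise, a minimal capturing variant of the toy program (my own sketch, not part of the original answer) could look like the following; disassembling it with the same go tool objdump command should show the capturing case building a closure value that carries the captured variable, instead of merely loading a static function address as above.

package main

import "fmt"

func useFunction(fn func(int) int) {
    fmt.Println(fn(10))
}

// invokeCapturing closes over a local variable, so the compiler has to
// materialize a closure value holding the captured state at runtime.
func invokeCapturing(offset int) {
    capturing := func(x int) int {
        return x + offset // captures offset from the enclosing scope
    }
    useFunction(capturing)
}

func main() {
    invokeCapturing(8)
}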
Answer 2
Score: 1
It generally should. And probably even more so with compiler optimization taken into account (reasoning about a function is generally easier than reasoning about a closure, so I would expect a compiler to optimize a function more often than an equivalent closure). But it is not exactly black and white, as many factors may affect the final code produced, including your platform and the version of the compiler itself. More importantly, your other code will typically affect performance much more than the speed of making a call (both algorithm-wise and in lines of code), which seems to be the point JimB made.
For example, I wrote the following sample code and then benchmarked it.
var (
    test int64
)

const (
    testThreshold = int64(1000000000)
)

func someFunc() {
    test += 1
}

func funcTest(threshold int64) int64 {
    test = 0
    for i := int64(0); i < threshold; i++ {
        someFunc()
    }
    return test
}

func closureTest(threshold int64) int64 {
    someClosure := func() {
        test += 1
    }
    test = 0
    for i := int64(0); i < threshold; i++ {
        someClosure()
    }
    return test
}

func closureTestLocal(threshold int64) int64 {
    var localTest int64
    localClosure := func() {
        localTest += 1
    }
    localTest = 0
    for i := int64(0); i < threshold; i++ {
        localClosure()
    }
    return localTest
}
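The harness that produced the numbers below is not shown; a plausible one (my assumption, not the author's actual code) is a set of standard testing benchmarks placed in a _test.go file in the same package as the snippet above and run with go test -bench=.

package bench // assumed package name; must match the snippet above

import "testing"

// Each benchmark drives b.N iterations of the inner loop, so the reported
// ns/op corresponds to "ns per iteration" in the figures below.
func BenchmarkFuncTest(b *testing.B)         { funcTest(int64(b.N)) }
func BenchmarkClosureTest(b *testing.B)      { closureTest(int64(b.N)) }
func BenchmarkClosureTestLocal(b *testing.B) { closureTestLocal(int64(b.N)) }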
On my laptop, funcTest takes 2.0 ns per iteration, closureTest takes 2.2 ns, and closureTestLocal takes 1.9 ns. Here, closureTest vs funcTest appears to confirm your (and my) assumption that a closure call is slower than a function call. But note that these test functions were intentionally kept simple and small to make the difference in call speed stand out, and it is still only a 10% difference. In fact, checking the compiler output shows that in the funcTest case the compiler actually inlined funcTest instead of calling it, so I would expect the difference to be even smaller if it hadn't. More importantly, I'd like to point out that closureTestLocal is 5% faster than the (inlined) function even though it is actually a capturing closure. Note that neither of the closures was inlined or optimized out; both closure tests faithfully make all the calls. The only difference I see is that the compiled code for the local-closure case operates completely on the stack, while the other two functions access a global variable (somewhere in memory) by its address. But while I can easily reason about the difference by looking at the compiled code, my point is: it's not exactly black and white, even in the simplest cases.
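One way to check an inlining decision like the one mentioned above (a standard gc diagnostic, though the author does not say this is how they checked) is to ask the compiler to print its optimization decisions:

$ go build -gcflags='-m' .

The -m flag makes gc report its inlining and escape-analysis decisions for each function, which complements reading the disassembly with go tool objdump.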
So, if speed is really that important in your case, I would suggest benchmarking it instead (and with your actual code). You could also use go tool objdump to analyze the actual code produced and get a clue where the difference comes from. But as a rule of thumb, I would suggest focusing on writing better code (whatever that means for you) and ignoring the speed of the actual calls (as in "avoid premature optimization").
Answer 3
Score: 0
I don't think the scope of the function declaration can harm performance. Also, it's common to inline the lambda in the call. I'd write it as:
body, ok := scrape.Find(root, func(n *html.Node) bool { return n.DataAtom == atom.Body })