In Go, do non-capturing closures harm performance?

Question

For instance, github.com/yhat/scrape suggests using a closure like this:

func someFunc() {
	...
	matcher := func(n *html.Node) bool {
		return n.DataAtom == atom.Body
	}
	body, ok := scrape.Find(root, matcher)
	...
}

Since matcher doesn’t actually capture any local variables, this could equivalently be written as:

func someFunc() {
	...
	body, ok := scrape.Find(root, matcher)
	...
}

func matcher(n *html.Node) bool {
	return n.DataAtom == atom.Body
}

The first form looks better, because the matcher function is specific to that one place in the code. But does it perform worse at runtime (assuming someFunc may be called often)?

I guess there must be some overhead to creating a closure, but could this kind of closure be optimized into a regular function by the compiler?

(Obviously the language spec doesn’t require this; I’m interested in what gc actually does.)

Answer 1

Score: 9

It seems there is no difference. We can check this in the generated machine code.

Here is a toy program:

package main

import "fmt"

func topLevelFunction(x int) int {
    return x + 4
}

func useFunction(fn func(int) int) {
    fmt.Println(fn(10))
}

func invoke() {
    innerFunction := func(x int) int {
        return x + 8
    }
    useFunction(topLevelFunction)
    useFunction(innerFunction)
}

func main() {
    invoke()
}

And here is its disassembly:

$ go version
go version go1.8.5 linux/amd64

$ go tool objdump -s 'main\.(invoke|topLevel)' bin/toy 
TEXT main.topLevelFunction(SB) /home/vasiliy/cur/work/learn-go/src/my/toy/toy.go
    toy.go:6	0x47b7a0	488b442408	MOVQ 0x8(SP), AX	
    toy.go:6	0x47b7a5	4883c004	ADDQ $0x4, AX		
    toy.go:6	0x47b7a9	4889442410	MOVQ AX, 0x10(SP)	
    toy.go:6	0x47b7ae	c3		RET			

TEXT main.invoke(SB) /home/vasiliy/cur/work/learn-go/src/my/toy/toy.go
    toy.go:13	0x47b870	64488b0c25f8ffffff	FS MOVQ FS:0xfffffff8, CX		
    toy.go:13	0x47b879	483b6110		CMPQ 0x10(CX), SP			
    toy.go:13	0x47b87d	7638			JBE 0x47b8b7				
    toy.go:13	0x47b87f	4883ec10		SUBQ $0x10, SP				
    toy.go:13	0x47b883	48896c2408		MOVQ BP, 0x8(SP)			
    toy.go:13	0x47b888	488d6c2408		LEAQ 0x8(SP), BP			
    toy.go:17	0x47b88d	488d052cfb0200		LEAQ 0x2fb2c(IP), AX			
    toy.go:17	0x47b894	48890424		MOVQ AX, 0(SP)				
    toy.go:17	0x47b898	e813ffffff		CALL main.useFunction(SB)		
    toy.go:14	0x47b89d	488d0514fb0200		LEAQ 0x2fb14(IP), AX			
    toy.go:18	0x47b8a4	48890424		MOVQ AX, 0(SP)				
    toy.go:18	0x47b8a8	e803ffffff		CALL main.useFunction(SB)		
    toy.go:19	0x47b8ad	488b6c2408		MOVQ 0x8(SP), BP			
    toy.go:19	0x47b8b2	4883c410		ADDQ $0x10, SP				
    toy.go:19	0x47b8b6	c3			RET					
    toy.go:13	0x47b8b7	e874f7fcff		CALL runtime.morestack_noctxt(SB)	
    toy.go:13	0x47b8bc	ebb2			JMP main.invoke(SB)			

TEXT main.invoke.func1(SB) /home/vasiliy/cur/work/learn-go/src/my/toy/toy.go
    toy.go:15	0x47b8f0	488b442408	MOVQ 0x8(SP), AX	
    toy.go:15	0x47b8f5	4883c008	ADDQ $0x8, AX		
    toy.go:15	0x47b8f9	4889442410	MOVQ AX, 0x10(SP)	
    toy.go:15	0x47b8fe	c3		RET			

As we can see, at least in this simple case, there is no structural difference in how topLevelFunction and innerFunction (compiled as main.invoke.func1) are translated to machine code, nor in how each is passed to useFunction.

(It is instructive to compare this to the case where innerFunction does capture a local variable, and to the case where innerFunction is additionally passed via a global variable rather than a function argument; these are left as an exercise for the reader.)
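As a starting point for that comparison, here is a minimal sketch that puts a top-level function, a non-capturing closure, and a capturing closure side by side. The names apply, topLevel, and run are made up for illustration and are not part of the toy program above. Building it with go build -gcflags=-m asks the gc compiler to print its inlining and escape-analysis decisions, and go tool objdump can then be run on the binary as shown earlier. What you see will likely depend on the compiler version.

```go
package main

import "fmt"

// apply calls fn through a function value, like useFunction in the toy program.
func apply(fn func(int) int, x int) int {
	return fn(x)
}

// topLevel is an ordinary package-level function.
func topLevel(x int) int {
	return x + 4
}

// run passes three kinds of function values to apply.
func run(offset int) (int, int, int) {
	// nonCapturing refers to no variables from the enclosing scope.
	nonCapturing := func(x int) int {
		return x + 8
	}
	// capturing reads offset, so it needs a closure context.
	capturing := func(x int) int {
		return x + offset
	}
	return apply(topLevel, 10), apply(nonCapturing, 10), apply(capturing, 10)
}

func main() {
	a, b, c := run(16)
	fmt.Println(a, b, c) // prints "14 18 26"
}
```

In the disassembly, the interesting part is how the function value for capturing is built in run compared with the other two, which are passed as plain addresses in the go1.8.5 output above.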

Answer 2

Score: 1

In general it should perform somewhat worse, and probably even more so once compiler optimization is taken into account: reasoning about a function is generally easier than reasoning about a closure, so I would expect a compiler to optimize a function more often than an equivalent closure. But it is not exactly black and white, as many factors may affect the final code produced, including your platform and the version of the compiler itself. More importantly, the rest of your code will typically affect performance much more than the speed of making a call (both in terms of algorithms and in terms of lines of code), which seems to be the point JimB made.

For example, I wrote the following sample code and benchmarked it.

var (
	test int64
)

const (
	testThreshold = int64(1000000000)
)

func someFunc() {
	test += 1
}

func funcTest(threshold int64) int64 {
	test = 0
	for i := int64(0); i < threshold; i++ {
		someFunc()
	}
	return test
}

func closureTest(threshold int64) int64 {
	someClosure := func() {
		test += 1
	}

	test = 0
	for i := int64(0); i < threshold; i++ {
		someClosure()
	}
	return test
}

func closureTestLocal(threshold int64) int64 {
	var localTest int64
	localClosure := func() {
		localTest += 1
	}

	localTest = 0
	for i := int64(0); i < threshold; i++ {
		localClosure()
	}
	return localTest
}

On my laptop, funcTest takes 2.0 ns per iteration, closureTest takes 2.2 ns, and closureTestLocal takes 1.9 ns. Comparing closureTest with funcTest appears to confirm your (and my) assumption that a closure call is slower than a function call. But note that these test functions were intentionally made small and simple so that the difference in call speed stands out, and it is still only a 10% difference. In fact, the compiler output shows that in the funcTest case the compiler inlined someFunc instead of calling it, so I would expect the difference to be even smaller without inlining.

More importantly, closureTestLocal is 5% faster than the (inlined) function, even though it is a capturing closure. Neither closure was inlined or optimized away: both closure tests faithfully make all the calls. The only difference I see in the compiled code is that the local closure case operates entirely on the stack, while the other two functions access a global variable (somewhere in memory) by its address. So although I can easily explain the difference by looking at the compiled code, my point is that it is not exactly black and white even in the simplest cases.

So, if speed really is that important in your case, I would suggest benchmarking it (with your actual code). You could also use go tool objdump to examine the generated code and get a clue about where the difference comes from. But as a rule of thumb, I would suggest focusing on writing better code (whatever that means for you) and ignoring the speed of the calls themselves (as in "avoid premature optimization").
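The answer does not show how the timings were taken, so the following is only one possible harness, not the author's method. It is a minimal sketch that drives three variants through testing.Benchmark from an ordinary main package. The names counter, addGlobal, viaFunc, viaClosure, and viaCapturingClosure are hypothetical stand-ins for test, someFunc, funcTest, closureTest, and closureTestLocal. Numbers from a current compiler will probably differ from those quoted above.

```go
package main

import (
	"fmt"
	"testing"
)

// counter is a package-level variable, mirroring `test` in the answer.
var counter int64

// addGlobal is the plain top-level function variant.
func addGlobal() { counter++ }

// viaFunc calls the top-level function n times.
func viaFunc(n int64) int64 {
	counter = 0
	for i := int64(0); i < n; i++ {
		addGlobal()
	}
	return counter
}

// viaClosure calls a closure that captures nothing (it only touches a global).
func viaClosure(n int64) int64 {
	add := func() { counter++ }
	counter = 0
	for i := int64(0); i < n; i++ {
		add()
	}
	return counter
}

// viaCapturingClosure calls a closure that captures a local variable.
func viaCapturingClosure(n int64) int64 {
	var local int64
	add := func() { local++ }
	for i := int64(0); i < n; i++ {
		add()
	}
	return local
}

func main() {
	// Init registers the testing flags; it is needed when calling
	// testing.Benchmark outside of "go test".
	testing.Init()

	cases := []struct {
		name string
		fn   func(int64) int64
	}{
		{"func", viaFunc},
		{"closure", viaClosure},
		{"capturing closure", viaCapturingClosure},
	}
	for _, c := range cases {
		fn := c.fn
		// b.N is chosen by the testing package; each call runs the loop b.N times.
		res := testing.Benchmark(func(b *testing.B) {
			fn(int64(b.N))
		})
		fmt.Printf("%-18s %s\n", c.name, res)
	}
}
```

Running the same three functions as ordinary BenchmarkXxx functions in a _test.go file with go test -bench=. would be the more conventional setup; the standalone form is used here only so the sketch fits in one file.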

Answer 3

Score: 0

I don't think the scope of a function declaration can harm performance. It is also common to write the function literal inline in the call. I'd write it like this:

body, ok := scrape.Find(root, func(n *html.Node) bool { return n.DataAtom == atom.Body })

huangapple
  • Posted on August 29, 2017, 19:25:31
  • Please keep this link when reposting: https://go.coder-hub.com/45937924.html